Annotation of Italian Dialogues following the Penn Discourse Treebank paradigm

Sara Tonelli, Fondazione Bruno Kessler, Trento

PDTB Workshop, 04/30/2012


Activity Goal

•  LUNA project: spoken Language UNderstanding in multilinguAl communication systems
•  LUNA corpus: collection of 1000 Human-Human (HH) and Human-Machine (HM) dialogues of spontaneous speech, recorded at the help-desk facility of a consortium for software/hardware assistance
•  All conversations held in Italian
•  Goal: investigate to what extent the PDTB annotation scheme can be applied to dialogues in Italian
•  Research focused on HH conversations, which are more complex dialogues without a predefined structure


Human-Human Conversation IT Help Desk

U Hi, good morning.
O Hi, how may I help you?
U I am Roberta Sicconi calling from Cultural Affairs at City Hall.
U I had made a request for a password change yesterday.
O OK, do you have the request track ID?
U Uhm, no, I cannot find it.
O OK, do you have the date of the request?
U Well, that was yesterday.
O ...OK, I think I can find it... I got it.
O It’s for a password reset.
U Right. The problem is that I changed the password when I first logged in...
O You were supposed to change it the first time you logged in. Now let’s try together to log in.
O Can you tell me the RVS of your computer?
U Well, let me see. This is a new PC to me. Where can I find it?
O Usually the tag is right next to the base of the chassis, next to the power switch. It reads “inventario settore informatico” (IT department inventory).
U Inventario... Settore... Informatico. Got it: 123456.
O Yes, that is right. Now, you see, I’m writing the old login... now you type in the new login. It should be at least 6 characters...
U OK, let me write that down, one moment.

…

Dialogue phases marked on the slide: Personal Identification; Problem Statement; Problem Resolution (Part I and Part II).


Argument Selection Strategy

Similar to PDTB:
•  Select Arg1 and Arg2 following a minimality principle
•  Annotate both implicit and explicit relations

Different from PDTB:
•  Implicit relations are annotated not only between adjacent sentences

… recorded in a separate file and were not visible in the raw dialogs. This means that overlapping turns were simply displayed in sequence, and the annotator had to reconstruct the turn span following his intuition and the content of the text segments.

4.1. Argument selection strategy

The basic intuition for argument selection remained the same as in the PDTB: for each discourse relation, we identified in the LUNA corpus two arguments, Arg1 and Arg2, assuming that each relation can have two and only two arguments. Text span selection followed the “minimality principle”, i.e. only the text string minimally necessary to interpret the relation was selected for each argument.

A relevant adjustment we had to introduce in argument selection was that we could not limit the annotation of implicit relations to adjacent sentences or turns as in the PDTB, because discourse in dialogs is much more fragmented than in prose, with many interruptions, disfluencies, etc. Keeping the adjacency criterion would have meant missing many implicit relations, so we simply asked that all implicit relations be identified in the text. An example of an implicit relation between two non-contiguous arguments is reported in (d). We mark with indices 1 and 2 the utterances produced by Speaker 1 and Speaker 2, respectively.

(d) Implicit: “Anche questo non è attivo”1 “quindi possiamo contattarla al”2 “PERÒ sto aspettando che me l’attivino”1
“This is not active either”1 “So we can contact you at”2 “BUT I’m waiting for it to be activated”1

The two arguments are part of the same turn, even though they are not adjacent, while the sentence “So we can contact you at” is clearly an interruption inserted in the dialog by a different speaker.
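To make the adjacency point concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which each transcribed stretch of speech is a `Segment` with a position, a speaker index and its text; `Relation`, `args_adjacent` and `accept_implicit` are illustrative names, not part of any PDTB or LUNA tool. The sketch encodes example (d) and shows that a PDTB-style adjacency filter would drop the relation, while the LUNA guideline keeps it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    """A stretch of transcribed speech; `index` is its position in the dialog."""
    index: int
    speaker: int
    text: str


@dataclass
class Relation:
    """A discourse relation with exactly two arguments, Arg1 and Arg2."""
    rel_type: str                     # e.g. "Implicit", "Explicit"
    arg1: List[Segment]               # an argument may cover several segments
    arg2: List[Segment]
    connective: Optional[str] = None  # connective inserted by the annotator (Implicit)


def args_adjacent(rel: Relation) -> bool:
    """True if one argument ends in the segment right before the other begins."""
    a1 = sorted(s.index for s in rel.arg1)
    a2 = sorted(s.index for s in rel.arg2)
    return a1[-1] + 1 == a2[0] or a2[-1] + 1 == a1[0]


def accept_implicit(rel: Relation, require_adjacency: bool) -> bool:
    """PDTB-style check when require_adjacency=True; LUNA-style when False."""
    if rel.rel_type != "Implicit":
        return True
    return args_adjacent(rel) or not require_adjacency


# Example (d): Speaker 2 interrupts Speaker 1 between the two arguments.
# "BUT" is the connective inserted by the annotator, so it is not part of Arg2.
d = [
    Segment(0, 1, "This is not active either"),
    Segment(1, 2, "So we can contact you at"),
    Segment(2, 1, "I'm waiting for it to be activated"),
]
rel_d = Relation("Implicit", arg1=[d[0]], arg2=[d[2]], connective="but")

print(args_adjacent(rel_d))                             # False
print(accept_implicit(rel_d, require_adjacency=True))   # False: lost under adjacency
print(accept_implicit(rel_d, require_adjacency=False))  # True
```

Keeping speaker indices on each segment also makes it easy to confirm that both arguments of (d) come from Speaker 1.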

4.2. Relation types

As in the PDTB, we annotated four relation types in the LUNA corpus: Explicit discourse connectives, Implicit relations, AltLex and EntRel. In addition, we introduced the Interruption label for cases in which the speaker was interrupted while uttering a sentence and therefore could express only one complete argument.

Explicit connectives are considered to form a closed class, drawn from three grammatical classes: (i) subordinating conjunctions, (ii) coordinating conjunctions, and (iii) ADVP and PP adverbials.

Not all tokens of words and phrases that can serve as Explicit connectives actually do so. In some cases, which are very frequent in spontaneous speech, they do not denote relations between two abstract objects (AOs), so they have not been annotated as discourse connectives. In particular, there is a group of words, including adverbials and connectives, that are commonly defined as discourse markers (Schiffrin, 1987) or phatic connectives (Bazzanella, 1990), such as “cioè” (well), “allora” (so), etc. These words have not been annotated when they are used to signal the organizational or focus structure of the discourse and to underline the interactive structure of the conversation, rather than to relate AOs. Note that most of these words also appear in the dialogs as proper connectives. For a comparison between the use of “allora” (so) as a discourse marker and as a connective, see the examples below. In example (e), “Allora” is clearly a turn-taking device, while in example (f) it connects two turns and introduces a causal inference drawn by Speaker 2.

(e) “Allora vediamo un po’ ecco qua”1
“So let’s see here it is”1

(f) “In questo momento il palazzo non è collegato”1 “Allora è meglio collegarlo”2
“At the moment the building is not connected”1 “So we’d better connect it”2

There are also other cases in which words and phrases that can serve as explicit connectives serve other functions, such as relating non-AO entities, and are therefore not annotated as discourse connectives. This is the case, for example, of “e” (and) conjoining two noun phrases, or “quindi” (so/namely) modifying a noun phrase.

As for implicit connectives, the identification and annotation of a discourse relation is the same as in the PDTB: in order to capture relations between abstract objects that are not explicitly realized in the text, annotators first identify the arguments involved in the relation and then insert the connective expression that best expresses the inferred relation. Insertable connectives are drawn primarily from the set of explicit connectives, but annotators are free to select alternative expressions as well. Combinations of connectives are also allowed. An example of an implicit relation is reported in example (d), with the connective “però” (but) manually specified by the annotator.

As for alternative lexicalizations, or AltLex, several examples are present in the LUNA corpus. One of them is reported below:

(g) “Forse lei prima tentava di accedere con le iniziali del nome e del cognome”1 “Ecco perché non riuscivamo”2
“Maybe you were trying to log in using the initials of your name and surname”1 “That’s why we couldn’t”2

Example (g) is a typical case of an alternative lexicalization, because the relation between the arguments is conveyed by a non-connective expression (“Ecco perché”) with two parts, one referring to the relation (“perché”) and the other referring anaphorically to the previous argument (“Ecco”). Similar cases are “Per questo motivo” (For this reason), “Nonostante questo” (Despite this), “Dopo questo evento” (After this event), and so on. As example (g) shows, such relations cannot be classified as implicit, because inserting a connective between the arguments would be redundant. To make the causal relation more explicit, the two turns in (g) could be reformulated as: “We couldn’t access”2 “because maybe you were trying to log in using the initials of your name and surname”1.
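The constraints on relation types stated in this section can be summarized as a consistency check over a single annotation. The sketch below is an illustration under stated assumptions, not the LUNA annotation tool: the `Annotation` record, its field names and the `check` function are hypothetical, and the EntRel rule simply follows the PDTB convention that EntRel carries no connective.

```python
from dataclasses import dataclass
from typing import List, Optional

RELATION_TYPES = {"Explicit", "Implicit", "AltLex", "EntRel", "Interruption"}


@dataclass
class Annotation:
    rel_type: str
    arg1: Optional[str] = None
    arg2: Optional[str] = None
    lexical: Optional[str] = None   # connective or AltLex expression found in the text
    inserted: Optional[str] = None  # connective supplied by the annotator


def check(a: Annotation) -> List[str]:
    """Return the consistency problems found (an empty list means none)."""
    if a.rel_type not in RELATION_TYPES:
        return [f"unknown relation type: {a.rel_type}"]
    problems = []
    n_args = (a.arg1 is not None) + (a.arg2 is not None)
    if a.rel_type == "Interruption":
        # The speaker was cut off, so only one complete argument exists.
        if n_args != 1:
            problems.append("Interruption takes exactly one argument")
    elif n_args != 2:
        problems.append(f"{a.rel_type} takes exactly two arguments")
    if a.rel_type in ("Explicit", "AltLex") and not a.lexical:
        problems.append(f"{a.rel_type} needs the expression found in the text")
    if a.rel_type == "EntRel" and a.lexical:
        problems.append("EntRel has no connective")
    if a.rel_type == "Implicit" and not a.inserted:
        problems.append("Implicit needs an annotator-inserted connective")
    if a.rel_type != "Implicit" and a.inserted:
        # e.g. for AltLex an inserted connective would be redundant
        problems.append("only Implicit relations get an inserted connective")
    return problems


# Example (g) as an AltLex relation (argument order here is illustrative).
g = Annotation(
    "AltLex",
    arg1="we couldn't [log in]",
    arg2="Maybe you were trying to log in using the initials of your name and surname",
    lexical="Ecco perché",
)
print(check(g))  # []
print(check(Annotation("AltLex", "x", "y", lexical="Ecco perché", inserted="because")))
# ['only Implicit relations get an inserted connective']
print(check(Annotation("Interruption", arg1="<the one complete argument>")))  # []
```

Returning a list of messages rather than raising on the first problem lets a reviewer see every issue with an annotation at once.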

4.3. Sense Hierarchy

As in the PDTB, every discourse relation found in the LUNA corpus was classified with a sense label describing the semantics of the relation. Also, the LUNA senses …

Relation Selection Strategy

Similar to PDTB:
•  Annotate four relation types: Explicit, Implicit, AltLex and EntRel

Different from PDTB:
•  Interruption label in case the speaker can utter only one argument but not the other

Note: a high number of discourse markers are used for turn taking. They signal the organizational focus of the discourse but are not connectives in the PDTB sense (and are not annotated).


Sense hierarchy

Similar to PDTB:
•  Assign sense labels following a three-layered sense hierarchy
•  Top-level classes: Temporal, Contingency, Comparison and Expansion

Different from PDTB:
•  Second-level connective types:
   –  Added Goal to the Contingency class (alongside Cause and Condition)
   –  In the Expansion class, the List type was discarded, since no examples were found
•  Third-level connective subtypes:
   –  Modifications partly inspired by the classification proposed in the Hindi Discourse Relation Bank (Oza et al., 2009)
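The two second-level changes listed above can be read as a transformation of the PDTB type inventory. The sketch below assumes the semantic second-level types of PDTB 2.0 as a starting point and deliberately leaves out PDTB's type-level pragmatic senses, since LUNA moves pragmatics to the subtype level; it is not the full LUNA hierarchy, and `luna_second_level` and `top_class` are hypothetical helpers.

```python
from typing import Dict, List

# Semantic second-level types of PDTB 2.0, by top-level class.
# PDTB's type-level pragmatic senses (e.g. Pragmatic Cause) are left out on purpose.
PDTB_TYPES: Dict[str, List[str]] = {
    "Temporal": ["Synchronous", "Asynchronous"],
    "Contingency": ["Cause", "Condition"],
    "Comparison": ["Contrast", "Concession"],
    "Expansion": ["Conjunction", "Instantiation", "Restatement",
                  "Alternative", "Exception", "List"],
}


def luna_second_level(pdtb: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply the two second-level changes from the slide to a copy of `pdtb`."""
    types = {cls: list(ts) for cls, ts in pdtb.items()}
    types["Contingency"].append("Goal")  # new type alongside Cause and Condition
    types["Expansion"].remove("List")    # discarded: no examples found
    return types


def top_class(sense_type: str, hierarchy: Dict[str, List[str]]) -> str:
    """Return the top-level class that contains a second-level type."""
    for cls, types in hierarchy.items():
        if sense_type in types:
            return cls
    raise KeyError(sense_type)


LUNA_TYPES = luna_second_level(PDTB_TYPES)
print(LUNA_TYPES["Contingency"])      # ['Cause', 'Condition', 'Goal']
print(top_class("Goal", LUNA_TYPES))  # Contingency
```

Copying the lists before modifying them keeps the PDTB inventory intact, so both schemes can be compared side by side.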


Sense hierarchy

Different from PDTB:
•  Third-level connective subtypes:
   –  The pragmatic dimension is more relevant in dialogues than in prose.
   –  The PDTB distinction at type level between semantic and pragmatic relations was modified so as to address the speaker’s intentions and inferences
   –  At subtype level, we distinguish between semantic, epistemic and speech-act types, e.g.


An example of Epistemic and Speech-act type

to the content or semantic domain (Sweetser, 1990). Furthermore, in contrast to the PDTB, where the pragmatic senses are specified at the type level, we introduce them at the subtype level, distinguishing them from the semantic senses, as shown in Fig. 1. Whenever pragmatic senses are available for a relation, the corresponding type-level sense is split at the subtype level into its semantic and pragmatic senses. In general, we admit two kinds of pragmatic senses, i.e. epistemic and speech-act. For Concessions only, we introduced two more subtypes, i.e. the proper pragmatic one and the propositional one, because the available ones could not capture all examples of concession found in the corpus³.

The epistemic label is assigned when the speaker’s opinion, belief or interpretation is involved in the relation, while the speech-act subtype applies when the relation concerns the speech-act level and not strictly the meaning of the argument(s) (Berretta, 1984). Two examples of epistemic and speech-act causal relations are reported in (l) and (m), respectively:

(l) “Ho il PC che presumibilmente non funziona da”1 “sì”2 “stamattina perché ho acceso dà un segnale sul video tipo televisore senza antenna”1.
“My PC presumably hasn’t been working since”1 “yes”2 “this morning because I switched it on it shows a signal on the video like a TV without antenna”1.

(m) “Avrei bisogno di sapere qualcosa al riguardo della richiesta numero centosessantaquattro diciassette perché avevate mandato la mail”1.
“I would like to know something about my request number one hundred sixty-four seventeen because you had sent me an e-mail”1.

In (l), the fact expressed in Arg2 (in bold) causes the fact that the speaker believes the content of Arg1 (in italics). In other words, we classify this relationship as epistemic because Arg1 expresses a speaker’s belief or conclusion that is based on an observation or justification displayed in Arg2.

In (m), Arg2 explains why the speaker is asking the indirect question in Arg1. The causal relation thus does not involve the semantics of the two events described in the two arguments, but rather the speech-act level of Arg1 and the reason motivating the speech act (expressed in Arg2).

While we introduced new labels at the subtype level, we eliminated some other subtype labels of the PDTB, many of which merely expressed a variation in the order of the arguments. For example, in the PDTB the Cause type is divided into the reason and result subtypes. In the former case, the situation described in Arg2 is the cause of the situation in Arg1, while the opposite holds for the result subtype. In all cases, the naming convention for Arg1 and Arg2 is syntactically driven, in that Arg2 always corresponds to the argument with which the connective is syntactically associated, while the other argument is expressed in Arg1.

³ The distinctions introduced for concessions are still under discussion and will probably undergo further revision.

In the LUNA corpus, instead, argument identification is semantically driven, i.e. every argument bears a sense-specific semantic role regardless of its position in the relation. In this way, we could merge the reason and result subtypes under the Cause type, assigning the Arg2 label to the situation that causes the event expressed in Arg1. According to this classification, both examples (n) and (o) report a relation classified as (semantic) cause. Arg1 (in italics) precedes Arg2 (in bold) in the first example, while the order is inverted in the second example.

(n) “Hanno di nuovo chiamato perché c’erano ancora dei problemi”.
“They called again because there were still problems”.

(o) “La fotocopiatrice si inceppa sempre quindi abbiamo dovuto togliere i fogli.”
“The photocopier always jams so we had to extract the paper”.
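The merge of reason and result can be read as a relabelling step. The sketch below assumes PDTB-style Cause annotations stored with syntactically assigned arguments and a reason/result subtype, as described above; `PdtbCause` and `to_luna` are hypothetical names, only the Cause type is handled, and choosing the semantic, epistemic or speech-act subtype is left to the annotator. Examples (n) and (o) show that only the result case needs its arguments swapped.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PdtbCause:
    arg1: str     # argument not syntactically bound to the connective
    arg2: str     # argument the connective is syntactically attached to
    subtype: str  # "reason": Arg2 causes Arg1; "result": Arg1 causes Arg2


def to_luna(rel: PdtbCause) -> Dict[str, str]:
    """Relabel a PDTB Cause relation so that Arg2 is always the cause."""
    if rel.subtype == "reason":
        effect, cause = rel.arg1, rel.arg2
    elif rel.subtype == "result":
        effect, cause = rel.arg2, rel.arg1
    else:
        raise ValueError(f"not a Cause subtype: {rel.subtype}")
    return {"sense": "Cause", "arg1": effect, "arg2": cause}


# (n): "because" attaches to the cause, so PDTB already has Arg2 = cause.
n = PdtbCause("They called again", "there were still problems", "reason")
# (o): "so" attaches to the effect, so PDTB has Arg2 = effect.
o = PdtbCause("The photocopier always jams", "we had to extract the paper", "result")

print(to_luna(n)["arg2"])  # there were still problems
print(to_luna(o)["arg2"])  # The photocopier always jams
```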

In dialogs, a clause, a sentence or a turn is often the exact repetition of a previous utterance or part of it, due to the interactive nature of spontaneous conversation. We decided to annotate such cases by introducing the Repetition label, because repetitions in LUNA were very frequently used by the speakers as a device to connect different turns. We consider Repetitions a particular kind of implicit relation, which, however, does not require any connective to be specified.

We report an example of Repetition in (q), where Speaker 2 repeats part of the utterance by Speaker 1:

(q) “Allora ho tolto le eccezioni funziona”1 “hai tolto”2 “riprova”1 “le eccezioni”2
“So I disabled the exceptions it works”1 “You disabled”2 “Try it again”1 “the exceptions”2

The above example also shows that it is not always easy to understand who is speaking and to identify the relations between utterances in a dialog. In this case, Arg2 (in bold) is discontinuous because “Try it again” overlaps part of it. Moreover, “You disabled” was itself uttered to interrupt the first turn. The example also shows that arguments, for instance Arg1 (in italics), do not necessarily coincide with turns; rather, they mostly include only part of them.
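One way to store arguments that do not coincide with turns is as character spans over the transcribed segments. The minimal Python sketch below assumes that representation (`Segment`, `Span`, `arg_text` and `interleaved` are hypothetical names) and encodes example (q) with its English gloss. Because the bold and italic marking of the original is not reproduced here, the Arg1 span “I disabled the exceptions” is an assumption.

```python
from typing import List, Tuple

Segment = Tuple[int, str]    # (speaker, transcribed text)
Span = Tuple[int, int, int]  # (segment index, start char, end char)


def arg_text(spans: List[Span], segments: List[Segment]) -> str:
    """Join the pieces of a possibly discontinuous argument in dialog order."""
    return " ".join(segments[i][1][start:end] for i, start, end in sorted(spans))


def interleaved(spans: List[Span], segments: List[Segment]) -> List[Segment]:
    """Segments that fall between the first and last piece of an argument."""
    used = {i for i, _, _ in spans}
    return [segments[i] for i in range(min(used) + 1, max(used)) if i not in used]


# Example (q), English gloss, one entry per stretch of speech.
q = [
    (1, "So I disabled the exceptions it works"),
    (2, "You disabled"),
    (1, "Try it again"),
    (2, "the exceptions"),
]
arg1 = [(0, 3, 28)]              # assumed span: "I disabled the exceptions"
arg2 = [(1, 0, 12), (3, 0, 14)]  # "You disabled" + "the exceptions"
# A Repetition is treated as an implicit relation with no connective.
repetition = {"type": "Repetition", "arg1": arg1, "arg2": arg2, "connective": None}

print(arg_text(arg1, q))     # I disabled the exceptions
print(arg_text(arg2, q))     # You disabled the exceptions
print(interleaved(arg2, q))  # [(1, 'Try it again')]
```

Spans rather than whole turns let Arg1 cover only part of the first turn, and `interleaved` surfaces the overlapping “Try it again” that makes Arg2 discontinuous.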

5. Corpus analysis

We report in Table 1 some statistics on the human-human dialogs annotated so far. For the sake of simplicity, Implicit relations also include Repetitions.

In this corpus, the number of annotated relations is less than half the number of turns, while in the PDTB only 0.6% of all sentences do not show any relation to other sentences in the text. The LUNA corpus, indeed, contains many disfluencies and semantically empty turns, for example discourse markers, which do not belong to any discourse relation (see Section 4.2). Besides, a single argument often includes two or more turns when it is expressed discontinuously.

As for the different relation types, the percentage of Explicit relations in the LUNA corpus is much higher than in the PDTB (65.5% vs. 45.75%), while all other types …

Page 22: Annotation of Italian Dialogues following the Penn Discourse …pdtb2012/assets/... · 2012. 5. 24. · “This is not active either” 1 “So we can contact you at” 2 “BUT I’m

22 Sara Tonelli

An example of Epistemic and Speech-act type

to the content or semantic domain (Sweetser, 1990).Furthermore, in contrast to the PDTB, where the pragmaticsenses are specified at the type level, we introduce them atthe subtype level, distinguishing them from the semanticsenses, as shown in Fig. 1. Whenever the pragmatic sensesare available for the relation, the corresponding type levelsense is distinguished at the subtype level into its semanticand pragmatic senses. In general, we admit two kinds ofpragmatic senses, i.e. epistemic and speech-act. Only forConcessions, we introduced two more subtypes, i.e. theproper pragmatic and the propositional one, because theavailable ones could not capture all examples of concessionfound in the corpus3.The epistemic label is assigned when the speaker’s opin-ion, belief, interpretation is involved in the relation, whilethe speech-act subtype applies when the relation concernsthe speech-act level and not strictly the meaning of the ar-gument(s) (Berretta, 1984). Two examples of epistemic andspeech-act type of causal relation are reported resp. in (l)and (m):

(l) “Ho il PC che presumibilmente non funziona da”1 “sı”2“stamattina perche ho acceso da un segnale sul videotipo televisore senza antenna”1.“My PC hasn’t presumably been working since”1 “yes”2“this morning because I switched it on it shows a signal onthe video like a TV without antenna”1.

(m) “Avrei bisogno di sapere qualcosa al riguardo dellarichiesta numero centosessantaquattro diciassetteperche avevate mandato la mail”1.“I would like to know something about my requestnumber one hundred sixty-four seventeen because you hadsent me an e-mail”1.

In (l), the fact expressed in Arg2 (in bold) causes the factthat the speaker believes the content of Arg1 (in italics).In other words, we classify this relationship as epistemicbecause Arg1 expresses a speaker’s belief or conclusionthat is based on an observation or justification displayed inArg2.In (m), Arg2 explains why the speaker is asking the indi-rect question in Arg1. We can say that the causal relationdoes not involve the semantics of the two events describedin the two arguments but rather the speech-act level ofArg1 and the reason motivating the speech-act (expressedin Arg2).While we introduced new labels at subtype level, we elim-inated some other subtype labels of the PTBD, many ofwhich were just expressing a variation in the order of thearguments. For example, in the PDTB the Cause type isdivided into the reason and result subtypes. In the formercase, the situation described in Arg2 is the cause of the sit-uation in Arg1, while it is the contrary for the result sub-type. In all cases, the naming convention for Arg1 andArg2 is syntactically driven, in that Arg2 always corre-sponds to the argument with which the connective was syn-tactically associated while the other argument is expressedin Arg1.

3The distinctions introduced for concessions are still underdiscussion and will probably undergo a further revision.

In the LUNA corpus, instead, argument identification is semantically driven, i.e. every argument bears a sense-specific semantic role regardless of its position in the relation. This allowed us to merge the reason and result subtypes under the Cause type, assigning the Arg2 label to the situation that causes the event expressed in Arg1. According to this classification, examples (n) and (o) both report a relation classified as (semantic) cause. Arg1 (in italics) precedes Arg2 (in bold) in the first example, while the order is inverted in the second.

(n) “Hanno di nuovo chiamato perché c’erano ancora dei problemi”.
“They called again because there were still problems”.

(o) “La fotocopiatrice si inceppa sempre quindi abbiamo dovuto togliere i fogli.”
“The photocopier always jams so we had to extract the paper”.
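The merge of the PDTB reason and result subtypes into a single Cause label can be sketched as a conversion between the two argument-naming conventions. This is a minimal illustration under our own assumptions, not the tooling used for LUNA: the `Relation` record, its field names and the label strings `"Cause.reason"`, `"Cause.result"` and `"Cause.semantic"` are hypothetical. Since the PDTB Arg2 is the argument attached to the connective, a result relation has its cause in Arg1, so mapping it to the LUNA convention (Arg2 = cause) amounts to swapping the two arguments.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Relation:
    """Hypothetical minimal record for one discourse relation."""
    sense: str
    arg1: str
    arg2: str
    connective: Optional[str] = None


def pdtb_cause_to_luna(rel: Relation) -> Relation:
    """Map PDTB-style Cause.reason / Cause.result onto one semantic Cause.

    PDTB convention: Arg2 is the argument the connective attaches to.
      Cause.reason -> Arg2 is the cause of Arg1 (already matches LUNA).
      Cause.result -> Arg1 is the cause of Arg2 (arguments must be swapped).
    LUNA convention: Arg2 is always the cause, whatever the surface order.
    """
    if rel.sense == "Cause.reason":
        return replace(rel, sense="Cause.semantic")
    if rel.sense == "Cause.result":
        return replace(rel, sense="Cause.semantic", arg1=rel.arg2, arg2=rel.arg1)
    return rel  # other senses are left untouched


# Example (o), read with the PDTB convention: "so" attaches to the second clause.
o_pdtb = Relation(
    sense="Cause.result",
    arg1="The photocopier always jams",
    arg2="we had to extract the paper",
    connective="so",
)
o_luna = pdtb_cause_to_luna(o_pdtb)
print(o_luna.arg2)  # The photocopier always jams
```

Example (n) would go through the `Cause.reason` branch unchanged, since its cause ("there were still problems") is already the argument attached to "because".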

In dialogs, a clause, a sentence or a turn is often the exact repetition of a previous utterance or part of it, due to the interactive nature of spontaneous conversations. We decided to annotate such cases by introducing the Repetition label, because in LUNA speakers very frequently used repetitions as a device to connect different turns. We consider Repetitions a particular kind of implicit relation, which however does not require any connective to be specified. We report an example of Repetition in (q), where Speaker 2 repeats part of the utterance by Speaker 1:

(q) “Allora ho tolto le eccezioni funziona”1 “hai tolto”2 “riprova”1 “le eccezioni”2
“So I disabled the exceptions it works”1 “You disabled”2 “Try it again”1 “the exceptions”2

The above example also shows that it is not always easy to understand who is speaking and to identify the relations between utterances in a dialog. In this case, Arg2 (in bold) is discontinuous because “Try it again” overlaps part of it, while “You disabled” was itself uttered to interrupt the first turn. The example also shows that arguments, for instance Arg1 (in italics), do not necessarily coincide with turns; rather, they mostly include only part of them.
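The combination of discontinuous arguments and partial turns in (q) is easier to see if each argument is stored as a list of character spans over the turns rather than as a single string. The sketch below only illustrates that idea: the classes, field names and the exact span boundaries chosen for Arg1 and Arg2 are our assumptions, and the English gloss of (q) is used for readability.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A span is (turn_index, start_char, end_char) within that turn's text.
Span = Tuple[int, int, int]


@dataclass
class Argument:
    spans: List[Span]  # more than one span means a discontinuous argument

    def text(self, turns: List[str]) -> str:
        # Join the pieces in dialog order, skipping whatever lies between them.
        return " ".join(turns[t][s:e] for t, s, e in sorted(self.spans))


@dataclass
class Relation:
    rel_type: str                     # e.g. "Explicit", "Implicit", "Repetition"
    arg1: Argument
    arg2: Argument
    connective: Optional[str] = None  # Repetitions carry no connective


turns = [
    "So I disabled the exceptions it works",  # Speaker 1
    "You disabled",                           # Speaker 2
    "Try it again",                           # Speaker 1, overlapping Arg2
    "the exceptions",                         # Speaker 2
]

q = Relation(
    rel_type="Repetition",
    arg1=Argument([(0, 3, 28)]),              # part of turn 0 only (assumed boundary)
    arg2=Argument([(1, 0, 12), (3, 0, 14)]),  # split across turns 1 and 3
)
print(q.arg1.text(turns))  # I disabled the exceptions
print(q.arg2.text(turns))  # You disabled the exceptions
```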

5. Corpus analysis

We report in Table 1 some statistics over the Human-Human dialogs annotated so far. For simplicity, Implicit relations also include Repetitions. In this corpus, the number of annotated relations is less than half the number of turns, while in the PDTB only 0.6% of all sentences show no relation to other sentences in the text. Indeed, the LUNA corpus contains many disfluencies and semantically empty turns, for example discourse markers, which do not belong to any discourse relation (see Section 4.2.). Moreover, a single argument often spans two or more turns when it is expressed discontinuously. As for the different relation types, the percentage of Explicit relations in the LUNA corpus is much higher than in the PDTB (65.5% vs. 45.75%), while all other types

CAUSAL, EPISTEMIC: The fact expressed in Arg2 (in bold) causes the fact that the speaker believes the content of Arg1

An example of Epistemic and Speech-act type

CAUSAL, SPEECH-ACT: Arg2 (in bold) expresses the reason why the speaker makes a request in Arg1
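The placement of pragmatic senses at the subtype level can be sketched as a two-level lookup in which each type lists its semantic and pragmatic subtypes side by side. This is a hypothetical fragment covering only the Cause type from (l) and (m): the full hierarchy is the one in Fig. 1, the extra Concession subtypes are not modeled, and the `Type.subtype` label format and function names are our own.

```python
from typing import Dict, Set

# Hypothetical fragment of the sense hierarchy: only the Cause type is shown.
# Pragmatic senses sit at the subtype level, next to the semantic one.
SENSE_HIERARCHY: Dict[str, Set[str]] = {
    "Cause": {"semantic", "epistemic", "speech-act"},
}

# The two general kinds of pragmatic sense admitted in the scheme.
PRAGMATIC_SUBTYPES = {"epistemic", "speech-act"}


def make_sense(rel_type: str, subtype: str) -> str:
    """Build a 'Type.subtype' label, rejecting pairs not in the fragment above."""
    if subtype not in SENSE_HIERARCHY.get(rel_type, set()):
        raise ValueError(f"unknown sense: {rel_type}.{subtype}")
    return f"{rel_type}.{subtype}"


def is_pragmatic(sense: str) -> bool:
    """True if the subtype part of a 'Type.subtype' label is a pragmatic sense."""
    _type, _, subtype = sense.partition(".")
    return subtype in PRAGMATIC_SUBTYPES


l_sense = make_sense("Cause", "epistemic")   # (l): Arg2 justifies the belief in Arg1
m_sense = make_sense("Cause", "speech-act")  # (m): Arg2 motivates the request in Arg1
print(l_sense, is_pragmatic(l_sense))        # Cause.epistemic True
print(make_sense("Cause", "semantic"), is_pragmatic("Cause.semantic"))  # Cause.semantic False
```

In this fragment a PDTB-style call such as `make_sense("Cause", "reason")` raises `ValueError`, mirroring the removal of the reason and result subtypes.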

The Problem of Repetitions

•  In dialogues, clauses, sentences or turns are often the exact repetition of a previous utterance or part of it

•  Repetitions are usually used by a speaker to connect different turns

•  Are repetitions a special kind of discourse relation?

•  We annotated them as an implicit relation, but without specifying a possible connective
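The definition of a Repetition as the exact repetition of a previous utterance or part of it could be operationalized, for example to pre-select candidates for annotators, by checking whether a turn occurs as a contiguous token sequence inside an earlier turn. This heuristic is our assumption and not part of the annotation procedure described here, and its tokenization is deliberately naive. Example (q) also shows its limits: only “le eccezioni” is an exact match, because “hai tolto” changes the person of “ho tolto”.

```python
import re
from typing import List, Optional


def tokens(utterance: str) -> List[str]:
    # Lowercase and keep runs of word characters (apostrophes split words).
    return re.findall(r"\w+", utterance.lower())


def contains_sequence(haystack: List[str], needle: List[str]) -> bool:
    # True if needle occurs as a contiguous, non-empty sub-list of haystack.
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))


def find_repeated_turn(turns: List[str], k: int) -> Optional[int]:
    """Index of the most recent earlier turn that contains turn k verbatim
    (as a contiguous token sequence), or None if there is none."""
    target = tokens(turns[k])
    for j in range(k - 1, -1, -1):
        if contains_sequence(tokens(turns[j]), target):
            return j
    return None


q = ["Allora ho tolto le eccezioni funziona", "hai tolto", "riprova", "le eccezioni"]
print([find_repeated_turn(q, k) for k in range(len(q))])  # [None, None, None, 0]
```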

What was annotated?

Most frequent Connectives

Explicit relations in LUNA are 65.6% vs. 45.75% in PDTB

Connectives are counted independently of the sense label

In PDTB:
   Most frequent Explicit: BUT, AND, BECAUSE
   Most frequent Implicit: BECAUSE, AND, SPECIFICALLY

Unique connectives:
   In LUNA: 85 Explicit and 23 Implicit
   In PDTB: 100 Explicit and 102 Implicit
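Counting connectives independently of the sense label amounts to grouping annotations by relation type and connective string while ignoring the sense field. The sketch below assumes a hypothetical flat record format `(relation_type, connective, sense)` and uses made-up toy records, not LUNA data; Repetitions carry no connective, so they add to the relation total but not to the connective counts.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical flat record: (relation_type, connective, sense).
Record = Tuple[str, Optional[str], str]


def connective_stats(records: List[Record]) -> Tuple[Dict[str, Counter], float]:
    """Count connectives per relation type, ignoring the sense label,
    and return the percentage of Explicit relations among all relations."""
    by_type: Dict[str, Counter] = {"Explicit": Counter(), "Implicit": Counter()}
    for rel_type, connective, _sense in records:  # the sense is deliberately unused
        if rel_type in by_type and connective:
            by_type[rel_type][connective.lower()] += 1
    if not records:
        return by_type, 0.0
    n_explicit = sum(1 for rel_type, _, _ in records if rel_type == "Explicit")
    return by_type, 100 * n_explicit / len(records)


# Made-up toy records, for illustration only.
toy = [
    ("Explicit", "perché", "Cause.semantic"),
    ("Explicit", "perché", "Cause.speech-act"),
    ("Explicit", "quindi", "Cause.semantic"),
    ("Implicit", "poi", "Temporal"),
    ("Repetition", None, "Repetition"),
]
counts, explicit_pct = connective_stats(toy)
print(counts["Explicit"].most_common(1))               # [('perché', 2)]
print(len(counts["Explicit"]), "unique Explicit connectives")
print(f"{explicit_pct:.1f}% Explicit")                 # 60.0% Explicit
```

Because the sense is ignored, the two occurrences of “perché” are counted together even though one is semantic and one is a speech-act cause.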

Most frequent Senses

In PDTB:
   Explicit: Semantic conjunction, Semantic contrast, Reason (subtype of Semantic Cause)
   Implicit: Semantic conjunction, Specification, Reason

Does this difference depend on the domain?

Conclusions

•  Pilot annotation of 60 Human-Human dialogues of spontaneous speech in Italian

•  Some adaptations w.r.t. PDTB:
   –  Allow implicit relations between non-adjacent sentences
   –  Modify the sense hierarchy to take into account the pragmatic dimension of dialogues

•  No particular adaptation driven by the language

•  The specific domain and genre of the LUNA corpus affect the relation types: temporal and causal relations are predominant, and pragmatic relations are very frequent

•  Open issue: agreement

References

S. Tonelli, G. Riccardi, R. Prasad, A. Joshi, "Annotation of Discourse Relations for Conversational Spoken Dialogs", in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Paris: ELRA, 2010, pp. 2084-2090. ISBN: 2951740867.

M. Dinarelli, S. Quarteroni, S. Tonelli, A. Moschitti, G. Riccardi, "Annotating Spoken Dialogs: From Speech Segments to Dialog Acts and Frame Semantics", in Proceedings of SRSL 2009, the 2nd Workshop on Semantic Representation of Spoken Language, Athens, Greece: Association for Computational Linguistics, 2009, pp. 34-41.