Measuring the Influence of Errors Induced by the Presence of Dialogs in Reference Clustering of Narrative Text

Alaukik Aggarwal, Department of Computer Science and Engineering, MAIT
Pablo Gervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid
Raquel Hervás, Instituto de Tecnología del Conocimiento, Universidad Complutense de Madrid


Page 1:


Raquel Hervás Ballesteros
I have added in the notes of this slide general considerations for the presentation...
Page 2:

Outline of the Problem

•Coreference Resolution = Anaphoric + Non-anaphoric

•Different genres of text studied:
▫Text without dialogues (like news articles)
▫Text consisting only of dialogues (conversations)


Alaukik Aggarwal
In general, specific genres of text such as news articles, questions and informational dialogs are studied. These lack direct conversation!
Raquel Hervás Ballesteros
Maybe you can change the title of this slide so it reflects more that we are going to do something related with dialogs. What about "Dialogs in Coreference Resolution"?
Page 3:

An Example

•Sachin Tendulkar has been honoured with Padma Vibhushan Award. India’s world number one batsman secured 17,000 runs on home soil. Tendulkar has put India in a strong position against Australia in the One-Day Series. The Indian responded to his critics who believed that his career was sliding with his 40th century.

Generally the kind of text found in News Articles.


Alaukik Aggarwal
Example of a news article. Lacks rich direct conversation.
Sachin Tendulkar -> World Number one -> Tendulkar -> Indian -> his -> his
India's -> home soil -> India
Alaukik Aggarwal
For comparison between text without dialogue and text with dialogue, I would like to present an example:
Page 4:

Problems in Dialogue - Why?

•Pronominal Reference within quoted fragments

•Change in referential value of demonstratives
▫“You take these bags and I’ll take those”

•Non-NP antecedents or no antecedents at all


Alaukik Aggarwal
These are the kinds of problems usually faced in dialogues. Examples of the first and third are shown in the example on the next slide.
Page 5:

Coreference in Narrative

•Contain many characters and objects

•Rich in dialogues and coreferences

•Cover different styles of writing from different authors and time periods


Page 6:

Another Example

•The two elder sons did not delay but set off at once, and the third and youngest son began pleading. "No, my son, you mustn't leave me, an old man, all alone," said the king. "Please let me go, Father! I do so want to travel over the world and find my mother." The king reasoned with him, but, seeing that he could not stop him from going, said: "Oh, all right then, I suppose it can't be helped. Go and God be with you!"

An excerpt from Three Kingdoms (by Alexander Afanasiev)


Alaukik Aggarwal
The first dialog illustrates the problem of pronominal reference inside dialogues: it may be possible to chain my -> me -> an old man to 'king', but it is not possible to identify that 'you' points to 'youngest son' (you -> youngest son).
In the last line, 'it' is an example of an anaphor with no antecedent at all.
Raquel Hervás Ballesteros
It is not completely true that we have chosen tales because of that... It is more that they are a kind of narrative text that is rich in the kind of dialogs we want to study.
Alaukik Aggarwal
The reason for choosing tales is that they involve various characters and are rich in coreferences, including dialogues.
Page 7:

Quantitatively Analyzing the Presence of Dialogs in Narrative Texts


Page 8:

Resolving Coreference in NPs

•Knowledge-rich and Knowledge-poor

•Different approaches considered by us:
▫Decision trees
▫C4.5 Machine Learning algorithm
▫Clustering
▫Hybrid


Alaukik Aggarwal
1. Explain that two different kinds of approaches exist, knowledge-rich and knowledge-poor, and describe them.
2. Present the approaches we studied (decision trees, the C4.5 learning algorithm and clustering) and explain why we followed this particular approach.
Page 9:

Corpus of narrative texts

•Thirty folk tales in English
•Different styles, authors and time periods
•Rich in dialogs between characters

•Process:
▫Identify references
▫Enrich references with semantic information
▫Coreference resolution using a clustering approach


Page 10:

Step 1: Identifying References

•GATE (General Architecture for Text Engineering)
▫ANNIE Sentence Splitter
▫ANNIE English Tokeniser
▫ANNIE POS Tagger
▫CREOLE plugin

•Output in XML format


Raquel Hervás Ballesteros
Explain in this slide that we have considered all the nominal phrases as references, and that we have used all these components for identifying them.
Maybe you can also say something about the information in the XML output, although I would not put an example here because they are not very clear.
Page 11:

Step 2: Feature Extraction

•Position
•Part of Speech (POS)
•Article
•Number
•Semantic Class
▫WordNet (synsets)
•Gender
▫A resource of Gender data
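
As a rough illustration of what an enriched reference might look like after this step, the sketch below stores the listed features in one record. The class name, field names and example values are hypothetical and are not taken from the annotated corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reference:
    """One nominal phrase with the features listed above (hypothetical field names)."""
    text: str
    position: int                   # order of the phrase within the tale
    pos: str                        # part-of-speech tag of the head word
    article: Optional[str]          # "definite", "indefinite" or None
    number: Optional[str]           # "singular" or "plural"
    semantic_class: Optional[str]   # e.g. a label derived from a WordNet synset
    gender: Optional[str]           # "masculine", "feminine", "neuter" or None


# Illustrative values only; they are not copied from the real annotation.
king = Reference(
    text="the king",
    position=7,
    pos="NN",
    article="definite",
    number="singular",
    semantic_class="person",
    gender="masculine",
)
```

Keeping unknown values as `None` lets a later comparison step treat a missing feature as "no evidence" rather than as a mismatch.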


Page 12:

Annotated Data


Page 13:

Step 3: Algorithm and Working

•Based on the clustering algorithm by (Cardie and Wagstaff, 1999)

•$\mathrm{dist}(NP_i, NP_j) = \sum_{f \in F} w_f \cdot \mathrm{incompatibility}_f(NP_i, NP_j)$

•Feature (f) - Position, Pronoun, Article, Word-substring, Number, Semantic Class, Gender
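
The weighted distance and the radius-based merging can be sketched as follows. This is a simplified illustration in the spirit of the cited clustering approach, not the implementation used for the tales: only three of the listed features are modelled, the weights are hypothetical, and position incompatibility is assumed to be the raw difference in position.

```python
import math


def mismatch(key):
    """Build an incompatibility function: 1.0 if both NPs carry the feature and differ, else 0.0."""
    def incompatibility(a, b):
        if a.get(key) is None or b.get(key) is None:
            return 0.0
        return 0.0 if a[key] == b[key] else 1.0
    return incompatibility


# Hypothetical weights: an infinite weight turns a mismatch into a hard constraint.
FEATURES = {
    "position": (1.0, lambda a, b: abs(a["position"] - b["position"])),
    "number": (math.inf, mismatch("number")),
    "gender": (math.inf, mismatch("gender")),
}


def dist(a, b):
    """Weighted sum of feature incompatibilities between two noun phrases."""
    total = 0.0
    for weight, incompatibility in FEATURES.values():
        value = incompatibility(a, b)
        if value:  # skip zero terms so that inf * 0 never yields NaN
            total += weight * value
    return total


def cluster(nps, radius):
    """Walk the NPs from last to first and merge clusters that are close and compatible."""
    cluster_of = {i: {i} for i in range(len(nps))}
    for j in range(len(nps) - 1, -1, -1):
        for i in range(j - 1, -1, -1):
            ci, cj = cluster_of[i], cluster_of[j]
            if ci is cj:
                continue
            compatible = all(
                dist(nps[a], nps[b]) != math.inf for a in ci for b in cj
            )
            if dist(nps[i], nps[j]) < radius and compatible:
                merged = ci | cj
                for k in merged:
                    cluster_of[k] = merged
    return sorted({tuple(sorted(c)) for c in cluster_of.values()})


nps = [
    {"text": "the king", "position": 0, "number": "singular", "gender": "masculine"},
    {"text": "he", "position": 1, "number": "singular", "gender": "masculine"},
    {"text": "the sons", "position": 2, "number": "plural", "gender": "masculine"},
]
clusters = cluster(nps, radius=3)
```

With these toy inputs, "the king" and "he" end up in one cluster, while "the sons" stays apart because the number mismatch makes the distance infinite. A larger radius allows more distant phrases to merge, which is why the radius is the parameter varied in the evaluation.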


Page 14:

Evaluation and Results


Raquel Hervás Ballesteros
I have added this slide as an equivalent to the other one that marks the beginning of our work.
Page 15:

Evaluation

•Clustering algorithm over the tales twice
▫With dialogs
▫Without dialogs

•Hand correction of the obtained coreferences for comparison
▫Precision and recall
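
One way to compare the system output against the hand-corrected clusters is to score pairwise coreference links. The slides do not say which scoring scheme was used, so the pairwise scheme below is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def links(clusters):
    """All unordered pairs of references that share a cluster."""
    return {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(sorted(cluster), 2)
    }


def precision_recall(predicted, gold):
    """Pairwise precision and recall of predicted clusters against gold clusters."""
    predicted_links, gold_links = links(predicted), links(gold)
    correct = len(predicted_links & gold_links)
    precision = correct / len(predicted_links) if predicted_links else 0.0
    recall = correct / len(gold_links) if gold_links else 0.0
    return precision, recall


# Toy example: references are identified by integer ids.
predicted = [[0, 1, 2], [3]]
gold = [[0, 1], [2, 3]]
precision, recall = precision_recall(predicted, gold)
```

In the toy example only the link between references 0 and 1 is correct, so one of the three predicted links and one of the two gold links are matched.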


Page 16:

Results

•Precision and Recall Results with and without dialogues:

                    Precision   Recall
With Dialogue         61.10     56.57
Without Dialogue      70.49     63.15

          With Dialogues          Without Dialogues
Radius    Recall    Precision     Recall    Precision
10        36.81     50.93         41.95     62.69
20        53.77     59.26         57.01     66.77
31        56.57     61.10         63.15     70.49
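
The gap between the two settings at radius 31 can be checked with a few lines of arithmetic. The values are copied from the results above; the variable names are hypothetical.

```python
# (precision, recall) at radius 31, as reported in the results.
with_dialogs = (61.10, 56.57)
without_dialogs = (70.49, 63.15)

# Difference in percentage points between the two settings.
precision_drop = round(without_dialogs[0] - with_dialogs[0], 2)  # 9.39
recall_drop = round(without_dialogs[1] - with_dialogs[1], 2)     # 6.58
```

These differences are absolute percentage points, not relative changes.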


Alaukik Aggarwal
Explain what the 'without dialog' corpus consisted of, and also that the optimal values were obtained at r=31.
Page 17:

Conclusions

•Nested dialogues decrease performance by about 9 percentage points in Precision and 7 in Recall

•But information is lost if dialogues are removed
▫Dialogs need to be treated separately

•In addition, constructed a corpus of tales annotated with coreference information for nominal phrases


Alaukik Aggarwal
Remind people that our goal was to check whether coreference algorithms must deal with nested dialogs separately, and that our results show that they do.
Page 18:

Future work

•Dialogs could be extracted from the tale, and considered as a separate text
▫Information about the characters involved is required

•Possible improvements in different problems
▫Word Sense Disambiguation
▫Named Entity Recognition
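
The idea of extracting dialogs from a tale could be prototyped as below. This is only a sketch: it assumes that dialogs are delimited by straight double quotes, it does not handle nested or curly quotes, and it does not attach a speaker to each fragment, which the point above notes would be required. The function name is hypothetical.

```python
import re

# Matches one quoted fragment delimited by straight double quotes.
QUOTE = re.compile(r'"([^"]*)"')


def split_dialogs(text):
    """Return (narration, dialogs): the text with quoted fragments removed, and the fragments."""
    dialogs = QUOTE.findall(text)
    narration = re.sub(r"\s+", " ", QUOTE.sub(" ", text)).strip()
    return narration, dialogs


tale = 'The king said: "Go and God be with you!" The son left.'
narration, dialogs = split_dialogs(tale)
```

Each extracted fragment could then be processed as its own short text, while the narration keeps the surrounding references.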


Page 19:

Thank You.
