Page 1:

Unsupervised Learning of Narrative Event Chains

Original paper by: Nate Chambers and Dan Jurafsky, in ACL 2008

This presentation for discussion created by: Peter Clark (Jan 2009)

Disclaimer: these slides are by Peter Clark, not the original authors, and thus represent a (possibly flawed) interpretation of the original work!

Page 2:

Why Scripts?

Essential for making sense of text. We typically match a narrative with expected “scripts”:
- to make sense of what’s happening
- to fill in the gaps, fill in goals and purpose

On November 26, the Japanese attack fleet of 33 warships and auxiliary craft, including 6 aircraft carriers, sailed from northern Japan for the Hawaiian Islands. It followed a route that took it far to the north of the normal shipping lanes. By early morning, December 7, 1941, the ships had reached their launch position, 230 miles north of Oahu.

depart → travel → arrive

Page 3:

Scripts: important/essential for NLP, but expensive to build.

Can we learn them from text?

“John entered the restaurant. He sat down, and ordered a meal. He ate…”

enter → sit → order → eat → ?

Page 4:

Our own (brief) attempt

Look at the next events in a 1 GB corpus:

"shoot“ is followed by:("say" 121)("be" 110)("shoot" 103)("wound" 58)("kill" 30)("die" 27)("have" 23)("tell" 23)("fire" 15)("refuse" 15)("go" 13)("think" 13)("carry" 12)("take" 12)("come" 11)("help" 10)("run" 10)("be arrested" 9)("find" 9)

"drive" is followed by:("drive" 364)("be" 354)("say" 343)("have" 71)("continue" 47)("see" 40)("take" 32)("make" 29)("expect" 27)("go" 24)("show" 22)("try" 19)("tell" 18)("think" 18)("allow" 16)("want" 15)("come" 13)("look" 13)("close" 12)

"fly“ is followed by:("fly" 362)("say" 223)("be" 179)("have" 60)("expect" 48)("allow" 40)("tell" 33)("see" 30)("go" 27)("take" 27)("make" 26)("plan" 24)("derive" 21)("want" 19)("schedule" 17)("report" 16)("declare" 15)("give" 15)("leave on" 15)

Some glimmers of hope, but not great…

Page 5:

Andrew Gordon (2007)

From [email protected] Thu Sep 27 09:33:04 2007

…Recently I tried to apply language modeling techniques over event sequences in a billion words of narrative text extracted from Internet weblogs, and barely exceeded chance performance on some event-ordering evaluations….

Page 6:

Chambers and Jurafsky

Main insight: don’t look at all verbs, just look at those mentioning the “key player” – the protagonist – in the sequence.

Also capture some role relationships: not just “push” → “fall”, but “push X” → “X fall”.

[Figure: “An automatically learned Prosecution Chain. Arrows indicate the before relation.”]

Page 7:

Approach

Stage 1: find the likelihood that one event+protagonist goes with another (or more) event+protagonist. NOTE: no ordering info.

e.g., given “X pleaded”, what other event+protagonist pairs occur with unusually high frequency?
→ “sentenced X”, “fined X”, “fired X”

Stage 2: order the set of event+protagonist pairs.

Page 8:

The Training Data

Articles in the GigaWord corpus. For each article:
- find all pairs of events (verbs) which have a shared argument
- shared argument found by OpenNLP coreference; includes transitivity (X = Y, Y = Z → X = Z)
- add each pair to the database

“John entered the restaurant. The waiter came over. John sat down, and the waiter greeted him….”

Events about John: {X enter, X sat, greet X}
Events about the waiter: {X come, X greet}

Pairs co-occurring in the article, added to the database:
{X enter, X sat}, {X enter, greet X}, {X sat, greet X}, {X come, X greet}
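To make this concrete, here is a minimal Python sketch (mine, not the authors’): it assumes the coreference chains for one article are already available as lists of (verb, role) events, and simply counts co-occurring pairs.

```python
from collections import Counter
from itertools import combinations

def extract_pairs(coref_chains):
    """Collect all unordered pairs of events (verb + role of the shared
    argument) whose arguments corefer, i.e. mention the same entity."""
    pairs = Counter()
    for chain in coref_chains:
        # sorted() fixes a canonical order so each pair has one key
        for e1, e2 in combinations(sorted(set(chain)), 2):
            pairs[(e1, e2)] += 1
    return pairs

# One article's coreference chains, as (verb, role-of-entity) events:
article = [
    [("enter", "subj"), ("sit", "subj"), ("greet", "obj")],  # John
    [("come", "subj"), ("greet", "subj")],                   # the waiter
]
pair_db = extract_pairs(article)  # accumulate this over the whole corpus
```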

Page 9:

Stage 1

Given two events with a shared protagonist, do they occur “unusually often” in a corpus?

Example: “push X” & “X fall”

    Prob(“X event1” AND “X event2”) = Number(“X event1” AND “X event2”) / Sum_ij Number(“X eventi” AND “X eventj”)

i.e., the probability of seeing “push” and “fall” with particular coreferring arguments = the number of times “push” and “fall” have been seen with these coreferring arguments, divided by the number of times any pair of verbs has been seen with any coreferring arguments.

More generally, PMI (“surprisingness”):

    PMI(“X event1”, “X event2”) = log [ Prob(“X event1” AND “X event2”) / ( Prob(“X event1”) × Prob(“X event2”) ) ]

= the “surprisingness” that the args of event1 and event2 are coreferential.
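A minimal sketch of this scoring, assuming the corpus-wide pair counts from the extraction step above; estimating the marginal probabilities from the same pair counts is my assumption about the exact form.

```python
import math
from collections import Counter

def pmi_scores(pair_counts):
    """PMI for each event pair: log P(e1,e2) / (P(e1) * P(e2)),
    with probabilities estimated from the corpus-wide pair counts."""
    total = sum(pair_counts.values())
    marginal = Counter()  # how often each event occurs in any pair
    for (e1, e2), c in pair_counts.items():
        marginal[e1] += c
        marginal[e2] += c
    denom = sum(marginal.values())
    return {
        (e1, e2): math.log((c / total) /
                           ((marginal[e1] / denom) * (marginal[e2] / denom)))
        for (e1, e2), c in pair_counts.items()
    }
```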

Page 10:

Can generalize:

PMI: given an event (+ arg), how “unusual” is it to see another event (+ same arg)?

Generalization: given N events e1…eN (+ same arg), how “unusual” is it to see another event (+ same arg)? Thus, the best next event for the set is the candidate f maximizing the summed PMI with the events seen so far:

    max_f Sum_{i=1..N} PMI(“X ei”, “X f”)
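The generalization as code (again a sketch; the candidate vocabulary and helper names are mine):

```python
def chain_score(chain, candidate, pmi):
    """Summed PMI between a candidate event and every event in the chain."""
    # pairs are stored under a canonical sorted key, as in extract_pairs()
    return sum(pmi.get(tuple(sorted((e, candidate))), 0.0) for e in chain)

def predict_next(chain, pmi, vocabulary):
    """Return the candidate event most strongly associated with the chain."""
    return max(vocabulary, key=lambda f: chain_score(chain, f, pmi))
```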

Page 11:

Evaluation: Cloze test

Fill in the blank…

McCann threw two interceptions early. Toledo pulled McCann aside and told him he’d start. McCann quickly completed his first two passes.

Events with a shared protagonist: {X throw, pull X, tell X, X start, X complete}

Remove one: {?, pull X, tell X, X start, X complete} (note: a set, not a list)

Cloze task: predict “?”
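A sketch of the cloze loop under the same assumptions (the held-out event must be in the candidate vocabulary):

```python
def cloze_rank(chain, held_out, pmi, vocabulary):
    """Narrative cloze: hide one event, rank every candidate by its summed
    PMI with the remaining events, and report the true event's rank."""
    def score(f):
        return sum(pmi.get(tuple(sorted((e, f))), 0.0) for e in chain)
    ranked = sorted(vocabulary, key=score, reverse=True)
    return ranked.index(held_out) + 1  # rank 1 = perfect prediction
```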

Page 12:

Results:

- 69 articles, each with >= 5 protagonist+event pairs
- System produces ~9000 guesses at each “?”

Page 13:

Learning temporal ordering

Stage 1: add labels to corpus
- Given: verb features (neighboring POS tags, neighboring auxiliaries and modals, WordNet synsets, etc.)
- Assign: tense, grammatical aspect, aspectual class [Aside: couldn’t a parser assign this directly?]
- Using: SVM, trained on labeled data (TimeBank corpus)

Stage 2: learn before() classifier
- Given: 2 events in a document sharing an argument
- Assign: before() relation
- Using: SVM, trained on labeled data (TimeBank expanded with the transitivity rule “X before Y and Y before Z → X before Z”)
- A variety of features used, including whether e1 grammatically occurs before e2 in the text
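The TimeBank transitivity expansion can be sketched as a simple fixed-point closure (an illustrative reconstruction, not the authors’ code):

```python
def transitive_closure(before_pairs):
    """Expand labeled before(x, y) pairs with the transitivity rule:
    x before y and y before z  =>  x before z."""
    closure = set(before_pairs)
    changed = True
    while changed:  # repeat until no new pair can be derived
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure
```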

Page 14:

Learning temporal ordering (cont)

Stage 3: for all event pairs with a shared arg in the main corpus (e.g., “push X”, “X fall”), count the number of before(e1,e2) vs. before(e2,e1) classifications, to get an overall ordering confidence.
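Schematically, using the confidence formula given on the evaluation slide (page 15); the input format, a stream of per-instance before(e1, e2) decisions, is my assumption:

```python
import math
from collections import Counter

def ordering_confidence(decisions):
    """From pairwise before(e1, e2) classifier decisions over the corpus,
    compute a per-pair ordering confidence as the log of the margin between
    the two directions: log(#before(e1,e2) - #before(e2,e1))."""
    counts = Counter(decisions)  # decisions: iterable of ordered (e1, e2)
    conf = {}
    for (e1, e2), n in counts.items():
        margin = n - counts[(e2, e1)]
        if margin > 0:  # only keep the winning direction
            conf[(e1, e2)] = math.log(margin)
    return conf
```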

Page 15:

Evaluation

Test set: use the same 69 documents, minus 6 which had no ordered events.

Task: for each document,
a. manually label the before() relations
b. generate a random ordering

Can the system distinguish real from random order?

“Coherence” ≈ sum of confidences of before() labels on all event pairs in document

Confidence(e1→e2) = log(#before(e1,e2) − #before(e2,e1))
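As a one-line sketch of the coherence score, using the per-pair confidences computed earlier (any normalization by document length is omitted, which is my simplification):

```python
def coherence(doc_pairs, conf):
    """Coherence of one document's ordering: the sum of ordering confidences
    over all before-labeled event pairs (with shared args) in the document."""
    return sum(conf.get(pair, 0.0) for pair in doc_pairs)
```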

[Results table, broken down by # event+shared arg in doc.]

Not that impressive (?)

Page 16:

Agglomeration and scripts

How do we get scripts?
- Could take a verb+arg, e.g., “arrest X”
- Then look for the most likely 2nd verb+arg, e.g., “charge X”
- Then the next most likely verb+arg given these 2, e.g., “indict X”; etc.
- Then: use the ordering algorithm to produce an ordering

{arrest X}
  ↓
{arrest X, charge X}
  ↓
{arrest X, charge X, indict X}
  ↓
  …
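A greedy agglomeration loop under these assumptions (stopping criteria follow the “Prosecution” example on the next slide; all names are illustrative):

```python
def grow_script(seed, pmi, vocabulary, max_events=10, cutoff=0.0):
    """Greedily grow a script from a seed event: repeatedly add the candidate
    with the highest summed PMI against the current set, until max_events is
    reached or the best score falls below the inclusion cutoff."""
    script = [seed]
    while len(script) < max_events:
        candidates = [f for f in vocabulary if f not in script]
        if not candidates:
            break
        def score(f):
            return sum(pmi.get(tuple(sorted((e, f))), 0.0) for e in script)
        best = max(candidates, key=score)
        if score(best) < cutoff:
            break
        script.append(best)
    return script  # then order with the learned before() confidences
```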

Page 17:

“Good” examples…

[Figure: learned “Prosecution” chain. This was the initial seed; agglomeration was stopped arbitrarily after 10 events, or when a cutoff for node inclusion was reached (whichever was first).]

Page 18:

Good examples…

[Figure: learned “Employment” chain. Dotted lines are incorrect “before” relations.]

Page 19:

Nate Chambers’ suggested mode of use: given a set of events in a news article, predict/fill in the missing events.

→ Do we really need scripts?

Page 20:

Many ways of referring to the same entity…

More common style: Nagumo's fleet assembled in the remote anchorage of Tankan Bay in the Kurile Islands and departed in strictest secrecy for Hawaii on 26 November 1941. The ships' route crossed the North Pacific and avoided normal shipping lanes. At dawn 7 December 1941, the Japanese task force had approached undetected to a point slightly more than 200 miles north of Oahu.

Less common style: John went to a restaurant. John sat down. John ate. He paid…

Generally, there are a lot of entities doing a lot of things!

From [email protected] Tue Dec 16 12:48:58 2008

…Even with the protagonist idea, it is still difficult to name the protagonist himself as many different terms are used. Naming the other non-protagonist roles is even more sparse. I'm experiencing the same difficulties. My personal thought is that we should not aim to fill the role with one term, but a set of weighted terms. This may be a set of related nouns, or even a set of unrelated nouns with their own preference weights.

Page 21:

Also: many ways of describing the same event!

Different levels of detail, different viewpoints:
- The planes destroyed the ships
- The planes dropped bombs, which destroyed the ships
- The bombs exploded, destroying the ships
- The Japanese destroyed the ships

Different granularities:
- Planes attacked
- Two waves of planes attacked
- 353 dive-bombers and torpedo planes attacked

Page 22:

Summary

Exciting work!
- simple but brilliant insight of the “protagonist”

But it is really only a first step towards scripts:
- mainly learns verb+arg co-associations in a text; temporal ordering and agglomeration are a post-processing step
- quality of learned results still questionable
  - Cloze: needs >1000 guesses before hitting a mentioned, co-associated verb+arg
  - nice “Prosecution” script: a special case, as most verbs in the script are necessarily specific to prosecution?
- fluidity of language use (multiple ways of viewing the same scene, multiple ways of referring to the same entity) still a challenge
- maybe we don’t need to reify scripts (?): fill in missing (implied) events on the fly, in a context-sensitive way