Fall 2005
Lecture Notes #7
EECS 595 / LING 541 / SI 661
Natural Language Processing
Natural Language Generation
What is NLG?
• Mapping meaning to text
• Stages:
– Content selection
– Lexical selection
– Sentence structure: aggregation, referring expressions
– Discourse structure
Systemic grammars
• Language is viewed as a resource for expressing meaning in context (Halliday, 1985)
• Layers: mood, transitivity, theme
Layer          The system   will     save         the document
Mood           subject      finite   predicator   object
Transitivity   actor        process (will save)   goal
Theme          theme        rheme (will save the document)
Example
(:process save-1
 :actor system-1
 :goal document-1
 :speechact assertion
 :tense future)
• Input is underspecified
The Functional Unification Formalism (FUF)
• Based on Kay’s (1983) formalism
• partial information, declarative, uniform, compact
• same framework used for all stages: syntactic realization, lexicalization, and text planning
Functional analysis
• Functional vs. structured analysis
• “John eats an apple”
• actor (John), affected (apple), process (eat)
• NP VP NP
• suitable for generation
Partial vs. complete specification
• Voice: An apple is eaten by John
• Tense: John ate an apple
• Mode: Did John eat an apple?
• Modality: John must eat an apple
• prolog: p(X,b,c)
action = eat
actor = John
object = apple
Unification
• Target sentence
• input FD
• grammar
• unification process
• linearization process
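The unification step can be sketched in Python. This is a minimal sketch, assuming FDs are nested dicts with atomic values; it ignores paths, alt, and the special treatment of pattern, and the names unify and UnificationFailure are illustrative only, not part of FUF.

```python
class UnificationFailure(Exception):
    """Raised when two descriptions give different atoms for one feature."""


def unify(fd1, fd2):
    """Return a new FD that contains every feature of fd1 and fd2."""
    if isinstance(fd1, dict) and isinstance(fd2, dict):
        result = dict(fd1)
        for feature, value in fd2.items():
            if feature in result:
                # Both sides constrain this feature: unify recursively.
                result[feature] = unify(result[feature], value)
            else:
                # Only one side mentions it: partial information is simply added.
                result[feature] = value
        return result
    if fd1 == fd2:
        return fd1
    raise UnificationFailure(f"{fd1!r} does not unify with {fd2!r}")


# Hypothetical fragments modelled on the sample input and the (cat s) branch.
input_fd = {"cat": "s", "prot": {"n": {"lex": "john"}}}
grammar_branch = {"cat": "s", "prot": {"cat": "np"},
                  "pattern": ["prot", "verb", "goal"]}
enriched = unify(input_fd, grammar_branch)
```

The result keeps the lexical item from the input and gains the category and ordering constraints from the grammar; unifying (cat s) with (cat np) would raise UnificationFailure.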
Sample input
((cat s)
 (prot ((n ((lex john)))))
 (verb ((v ((lex like)))))
 (goal ((n ((lex mary))))))
Sample grammar
((alt top
  (((cat s)
    (prot ((cat np)))
    (goal ((cat np)))
    (verb ((cat vp) (number {prot number})))
    (pattern (prot verb goal)))
   ((cat np)
    (n ((cat noun) (number {^ ^ number})))
    (alt (((proper yes) (pattern (n)))
          ((proper no)
           (pattern (det n))
           (det ((cat article) (lex "the")))))))
   ((cat vp)
    (pattern (v))
    (v ((cat verb))))
   ((cat noun))
   ((cat verb))
   ((cat article)))))
Sample output
((cat s)
 (goal ((cat np)
        (n ((cat noun) (lex mary) (number {goal number})))
        (pattern (n))
        (proper yes)))
 (pattern (prot verb goal))
 (prot ((cat np)
        (n ((cat noun) (lex john) (number {verb number})))
        (number {verb number})
        (pattern (n))
        (proper yes)))
 (verb ((cat vp)
        (pattern (v))
        (v ((cat verb) (lex like))))))
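The linearization step reads the pattern features of the unified FD from the top down. The sketch below assumes the sample output has been transcribed into nested Python dicts (the number paths are left out for brevity) and that no morphology is applied; linearize is an illustrative name, not a FUF function.

```python
def linearize(fd):
    """Collect the words of an FD by following its pattern features."""
    if "pattern" not in fd:
        # A leaf constituent contributes its lexical item, if it has one.
        return [fd["lex"]] if "lex" in fd else []
    words = []
    for constituent in fd["pattern"]:
        words.extend(linearize(fd[constituent]))
    return words


# Hand transcription of the sample output FD (number features omitted).
sample_output = {
    "cat": "s",
    "goal": {"cat": "np", "n": {"cat": "noun", "lex": "mary"},
             "pattern": ["n"], "proper": "yes"},
    "pattern": ["prot", "verb", "goal"],
    "prot": {"cat": "np", "n": {"cat": "noun", "lex": "john"},
             "pattern": ["n"], "proper": "yes"},
    "verb": {"cat": "vp", "pattern": ["v"],
             "v": {"cat": "verb", "lex": "like"}},
}
sentence = " ".join(linearize(sample_output))
```

Here sentence is "john like mary": the word order comes from the patterns, while agreement ("likes") would be the job of a separate morphology step.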
Comparison with Prolog
• Similarities:
– both have unification at the core
– Prolog program = FUF grammar
– Prolog query = FUF input
• Differences:
– Prolog: first order term unification
– FUF: arbitrarily rooted directed graphs are unified
The SURGE grammar
• Syntactic realization front-end
• variable level of abstraction
• 5600 branches and 1600 alts
[Figure: generation pipeline. Lexical chooser → Lexicalized FD → SURGE → Syntactic FD → Linearizer → Morphology → Text]
Systems developed using FUF/SURGE
• COMET
• MAGIC
• ZEDDOC
• PLANDOC
• FLOWDOC
• SUMMONS
CFUF
• Fast implementation by Mark Kharitonov (C++)
• Up to 100 times faster than Lisp/FUF
• Speedup higher for larger inputs
References
• Cole, Mariani, Uszkoreit, Zaenen, Zue (eds.) Survey of the State of the Art in Human Language Technology, 1995
• Elhadad, Using Argumentation to Control Lexical Choice: A Functional Unification Implementation, 1993
• Elhadad, FUF: the Universal Unifier, User Manual, 1993
• Elhadad and Robin, SURGE: a Comprehensive Plug-in Syntactic Realization Component for Text Generation, 1999
• Kharitonov, CFUF: A Fast Interpreter for the Functional Unification Formalism, 1999
• Radev, Language Reuse and Regeneration: Generating Natural Language Summaries from Multiple On-Line Sources, Department of Computer Science, Columbia University, October 1998
Path notation
• You can view an FD as a tree
• To specify features, you can use a path
– {feature feature … feature} value
– e.g. {prot number}
• You can also use relative paths
– {^ number} value => the feature number for the current node
– {^ ^ number} value => the feature number for the node above the current node
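Path resolution can be sketched in Python. This assumes that each ^ climbs one level starting from the feature that holds the path, which is consistent with the sample output, where the n under goal ends up pointing to {goal number}. The function names and the value "singular" are hypothetical.

```python
def resolve(path, location):
    """Turn a path into an absolute list of feature names.

    path     -- e.g. ["prot", "number"] or ["^", "^", "number"]
    location -- absolute position of the feature whose value is the path
    """
    # A relative path starts at its own location; an absolute one at the root.
    absolute = list(location) if path and path[0] == "^" else []
    for step in path:
        if step == "^":
            absolute.pop()          # go up one level
        else:
            absolute.append(step)   # go down into a feature
    return absolute


def lookup(fd, absolute):
    """Follow an absolute path through an FD stored as nested dicts."""
    for feature in absolute:
        fd = fd[feature]
    return fd


# The path (number {^ ^ number}) as it appears inside the n of the goal NP.
target = resolve(["^", "^", "number"], ["goal", "n", "number"])
fd = {"goal": {"number": "singular", "n": {"cat": "noun"}}}
value = lookup(fd, target)
```

With these inputs target is ["goal", "number"], so the noun shares the number of the noun phrase that contains it.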
Sample grammar
((alt top
  (((cat s)
    (prot ((cat np)))
    (goal ((cat np)))
    (verb ((cat vp) (number {prot number})))
    (pattern (prot verb goal)))
   ((cat np)
    (n ((cat noun) (number {^ ^ number})))
    (alt (((proper yes) (pattern (n)))
          ((proper no)
           (pattern (det n))
           (det ((cat article) (lex "the")))))))
   ((cat vp)
    (pattern (v))
    (v ((cat verb))))
   ((cat noun))
   ((cat verb))
   ((cat article)))))
Unification Example
Unify Prot
Unify Goal
Unify vp
Unify verb
Finish
Discourse Analysis
The problem
• Discourse
• Monologue and Dialogue (dialog)
• Human-computer interaction
• Example: John went to Bill’s car dealership to check out an Acura Integra. He looked at it for about half an hour.
• Example: I’d like to get from Boston to San Francisco, on either December 5th or December 6th. It’s okay if it stops in another city along the way.
Information extraction and discourse analysis
• Example: First Union Corp. is continuing to wrestle with severe problems unleashed by a botched merger and a troubled business strategy. According to industry insiders at Paine Webber, their president, John R. Georgius, is planning to retire by the end of the year.
• Problems with summarization and generation
Reference resolution
• The process of reference (associating “John” with “he”).
• Referring expressions and referents.
• Needed: discourse models
• Problem: many types of reference!
Example (from Webber 91)
• According to John, Bob bought Sue an Integra, and Sue bought Fred a Legend.
• But that turned out to be a lie. - referent is a speech act.
• But that was false. - proposition
• That struck me as a funny way to describe the situation. - manner of description
• That caused Sue to become rather poor. - event
• That caused them both to become rather poor. - combination of several events.
Reference phenomena
• Indefinite noun phrases: I saw an Acura Integra today.
• Definite noun phrases: The Integra was white.
• Pronouns: It was white.
• Demonstratives: this Acura.
• Inferrables: I almost bought an Acura Integra today, but a door had a dent and the engine seemed noisy.
• Mix the flour, butter, and water. Knead the dough until smooth and shiny.
Constraints on coreference
• Number agreement: John has an Acura. It is red.
• Person and case agreement: (*) John and Mary have Acuras. We love them (where We=John and Mary)
• Gender agreement: John has an Acura. He/it/she is attractive.
• Syntactic constraints:
– John bought himself a new Acura.
– John bought him a new Acura.
– John told Bill to buy him a new Acura.
– John told Bill to buy himself a new Acura.
– He told Bill to buy John a new Acura.
Preferences in pronoun interpretation
• Recency: John has an Integra. Bill has a Legend. Mary likes to drive it.
• Grammatical role: John went to the Acura dealership with Bill. He bought an Integra.
• (?) John and Bill went to the Acura dealership. He bought an Integra.
• Repeated mention: John needed a car to go to his new job. He decided that he wanted something sporty. Bill went to the Acura dealership with him. He bought an Integra.
Preferences in pronoun interpretation
• Parallelism: Mary went with Sue to the Acura dealership. Sally went with her to the Mazda dealership.
• ??? Mary went with Sue to the Acura dealership. Sally told her not to buy anything.
• Verb semantics: John telephoned Bill. He lost his pamphlet on Acuras. John criticized Bill. He lost his pamphlet on Acuras.
An algorithm for pronoun resolution
• Two steps: discourse model update and pronoun resolution.
• Salience values are introduced when a noun phrase that evokes a new entity is encountered.
• Salience factors: set empirically.
Salience weights in Lappin and Leass
Sentence recency 100
Subject emphasis 80
Existential emphasis 70
Accusative emphasis 50
Indirect object and oblique complement emphasis 40
Non-adverbial emphasis 50
Head noun emphasis 80
Lappin and Leass (cont’d)
• Recency: weights are cut in half after each sentence is processed.
• Examples:
– An Acura Integra is parked in the lot. (subject)
– There is an Acura Integra parked in the lot. (existential predicate nominal)
– John parked an Acura Integra in the lot. (object)
– John gave Susan an Acura Integra. (indirect object)
– In his Acura Integra, John showed Susan his new CD player. (demarcated adverbial PP)
Algorithm
1. Collect the potential referents (up to four sentences back).
2. Remove potential referents that do not agree in number or gender with the pronoun.
3. Remove potential referents that do not pass intrasentential syntactic coreference constraints.
4. Compute the total salience value of the referent by adding any applicable values for role parallelism (+35) or cataphora (-175).
5. Select the referent with the highest salience value. In case of a tie, select the closest referent in terms of string position.
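Steps 2 to 5 can be sketched in Python. This is a sketch under several assumptions: step 1 has already produced the referent list, the syntactic filter of step 3 is supplied by the caller as a set of blocked names, and the Referent and Pronoun records, their fields, and the string positions are all hypothetical. The salience values are the ones in force before "He bought it." in the example that follows.

```python
from dataclasses import dataclass

ROLE_PARALLELISM = 35
CATAPHORA = -175


@dataclass
class Referent:
    name: str
    number: str
    gender: str
    role: str        # grammatical role of the most recent mention
    position: int    # string position of the most recent mention
    salience: float


@dataclass
class Pronoun:
    number: str
    gender: str
    role: str
    position: int


def resolve_pronoun(pronoun, referents, blocked=()):
    """Pick the antecedent with the highest adjusted salience."""
    # Steps 2 and 3: agreement filter, then caller-supplied syntactic filter.
    candidates = [r for r in referents
                  if r.number == pronoun.number
                  and r.gender == pronoun.gender
                  and r.name not in blocked]
    if not candidates:
        return None

    def score(r):
        # Step 4: add role parallelism and cataphora adjustments.
        total = r.salience
        if r.role == pronoun.role:
            total += ROLE_PARALLELISM
        if r.position > pronoun.position:
            total += CATAPHORA
        return total

    # Step 5: highest score wins; ties go to the closest referent.
    return max(candidates,
               key=lambda r: (score(r), -abs(pronoun.position - r.position)))


referents = [
    Referent("John", "sg", "masc", "subject", 12, 232.5),
    Referent("Integra", "sg", "neut", "object", 14, 210.0),
    Referent("Bill", "sg", "masc", "indirect_object", 16, 135.0),
    Referent("dealership", "sg", "neut", "oblique", 8, 57.5),
]
he = resolve_pronoun(Pronoun("sg", "masc", "subject", 17), referents)
it = resolve_pronoun(Pronoun("sg", "neut", "object", 19), referents)
```

Under these assumptions "He" resolves to John (232.5 + 35 against 135 for Bill) and "it" resolves to the Integra (210 + 35 against 57.5 for the dealership).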
Example
• John saw a beautiful Acura Integra at the dealership last week. He showed it to Bill. He bought it.
Referent     Rec   Subj   Exist   Obj   IndObj   NonAdv   HeadN   Total
John         100   80                            50       80      310
Integra      100                  50             50       80      280
dealership   100                                 50       80      230
Example (cont’d)
Referent Phrases Value
John {John} 155
Integra {a beautiful Acura Integra} 140
dealership {the dealership} 115
Example (cont’d)
Referent Phrases Value
John {John, he1} 465
Integra {a beautiful Acura Integra} 140
dealership {the dealership} 115
Example (cont’d)
Referent Phrases Value
John {John, he1} 465
Integra {a beautiful Acura Integra, it} 420
dealership {the dealership} 115
Example (cont’d)
Referent Phrases Value
John {John, he1} 465
Integra {a beautiful Acura Integra, it} 420
Bill {Bill} 270
dealership {the dealership} 115
Example (cont’d)
Referent Phrases Value
John {John, he1} 232.5
Integra {a beautiful Acura Integra, it1} 210
Bill {Bill} 135
dealership {the dealership} 57.5
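The values in these tables can be reproduced with a short Python sketch. It assumes the pronouns in the second sentence have already been resolved (he = John, it = Integra), and that Bill counts as an indirect object or oblique complement; the factor names and the function process_sentence are illustrative.

```python
WEIGHTS = {
    "recency": 100,
    "subject": 80,
    "existential": 70,
    "object": 50,
    "indirect_object": 40,
    "non_adverbial": 50,
    "head_noun": 80,
}


def process_sentence(model, mentions):
    """Add each mention's weights to its referent, then halve every value.

    Returns the values as they stand before the halving.
    """
    for referent, factors in mentions:
        score = sum(WEIGHTS[factor] for factor in factors)
        model[referent] = model.get(referent, 0) + score
    before_decay = dict(model)
    for referent in model:
        model[referent] /= 2   # recency: weights are cut in half
    return before_decay


model = {}
# "John saw a beautiful Acura Integra at the dealership last week."
first = process_sentence(model, [
    ("John", ["recency", "subject", "non_adverbial", "head_noun"]),
    ("Integra", ["recency", "object", "non_adverbial", "head_noun"]),
    ("dealership", ["recency", "non_adverbial", "head_noun"]),
])
after_first = dict(model)
# "He showed it to Bill."
second = process_sentence(model, [
    ("John", ["recency", "subject", "non_adverbial", "head_noun"]),
    ("Integra", ["recency", "object", "non_adverbial", "head_noun"]),
    ("Bill", ["recency", "indirect_object", "non_adverbial", "head_noun"]),
])
after_second = dict(model)
```

The snapshot first holds 310, 280, and 230; after_first holds 155, 140, and 115; second holds 465, 420, 270, and 115; and after_second holds 232.5, 210, 135, and 57.5, matching the tables.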
Observations
• Lappin & Leass - tested on computer manuals - 86% accuracy on unseen data.
• Centering (Grosz, Joshi, Weinstein): additional concept of a “center” – at any time in discourse, an entity is centered.
• Backward-looking center; forward-looking centers (a set).
• Centering has not been automatically tested on actual data.
Discourse structure
• (*) Bill went to see his mother. The trunk is what makes the bonsai, it gives it both its grace and power.
• Coherence principle:
– John hid Bill’s car keys. He was drunk.
– ?? John hid Bill’s car keys. He likes spinach.
• Rhetorical Structure Theory (Mann, Matthiessen, and Thompson)
Sample rhetorical relations
Relation      Nucleus                                         Satellite
Antithesis    ideas favored by the author                     ideas disfavored by the author
Background    text whose understanding is being facilitated   text for facilitating understanding
Concession    situation affirmed by author                    situation which is apparently inconsistent but also affirmed by author
Elaboration   basic information                               additional information
Purpose       an intended situation                           the intent behind the situation
Restatement   a situation                                     a reexpression of the situation
Summary       text                                            a short summary of that text
Example (from MMT)
1) Title: Bouquets in a basket - with living flowers
2) There is a gardening revolution going on.
3) People are planting flower baskets with living plants,
4) mixing many types in one container for a full summer of floral beauty.
5) To create your own "Victorian" bouquet of flowers,
6) choose varying shapes, sizes and forms, besides a variety of complementary colors.
7) Plants that grow tall should be surrounded by smaller ones and filled with others that tumble over the side of a hanging basket.
8) Leaf textures and colors will also be important.
9) There is the silver-white foliage of dusty miller, the feathery threads of lotus vine floating down from above, the deep greens, or chartreuse, even the widely varied foliage colors of the coleus.
Christian Science Monitor, April, 1983
Example (cont’d)
Cross-document structure
Number  Relationship type           Level  Description
1       Identity                    Any    The same text appears in more than one location
2       Equivalence (paraphrasing)  S, D   Two text spans have the same information content
3       Translation                 P, S   Same information content in different languages
4       Subsumption                 S, D   One sentence contains more information than another
5       Contradiction               S, D   Conflicting information
6       Historical background       S      Information that puts current information in context
7       Cross-reference             P      The same entity is mentioned
8       Citation                    S, D   One sentence cites another document
9       Modality                    S      Qualified version of a sentence
10      Attribution                 S      One sentence repeats the information of another while adding an attribution
11      Summary                     S, D   Similar to Summary in RST: one sentence summarizes another
12      Follow-up                   S      Additional information which reflects facts that have happened since the previous account
13      Elaboration                 S      Additional information that wasn’t included in the last account
14      Indirect speech             S      Shift from direct to indirect speech or vice versa
15      Refinement                  S      Additional information that is
16      Agreement                   S      One source expresses agreement with another
17      Judgement                   S      A qualified account of a fact
18      Fulfilment                  S      A prediction turned true
19      Description                 S      Insertion of a description
20      Reader profile              S      Style and background-specific change
21      Contrast                    S      Contrasting two accounts of facts
22      Parallel                    S      Comparing two accounts of facts
23      Generalization              S      Generalization
24      Change of perspective       S, D   The same source presents a fact in a different light