Automatic Summarization: A Tutorial Presented at RANLP’2003
Inderjeet Mani
Georgetown University
Tuesday, September 9, 2003, 2-5:30 pm
@georgetown.edu
complingone.georgetown.edu/~linguist/inderjeet.html
RANLP’2003Page 2
Copyright © 2003 Inderjeet Mani. All rights reserved.
AGENDA
14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)
14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)
15:30 Break
16:00 III. Abstraction (Template and Concept-Based)
16:30 IV. Evaluation
17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)
17:30 Conclusion
Human Summarization is all around us
- Headlines: newspapers, Headline News
- Table of contents: of a book, magazine, etc.
- Preview: of a movie
- Digest: TV or cinema guide
- Highlights: meeting dialogue, email traffic
- Abstract: summary of a scientific paper
- Bulletin: weather forecast, stock market, ...
- Biography: resume, obituary, tombstone
- Abridgment: Shakespeare for kids
- Review: of a book, a CD, play, etc.
- Scale-downs: maps, thumbnails
- Sound bite/video clip: from speech, conversation, trial
Current Applications
Multimedia news summaries: watch the news and tell me what happened while I was away
Physicians' aids: summarize and compare the recommended treatments for this patient
Meeting summarization: find out what happened at that teleconference I missed
Search engine hits: summarize the information in hit lists retrieved by search engines
Intelligence gathering: create a 500-word biography of Osama bin Laden
Hand-held devices: create a screen-sized summary of a book
Aids for the Handicapped: compact the text and read it out for a blind person
Example BIOGEN Biographies
Vernon Jordan is a presidential friend and a Clinton adviser. He is 63 years old. He helped Ms. Lewinsky find a job. He testified that Ms. Monica Lewinsky said that she had conversations with the president, that she talked to the president. He has numerous acquaintances, including Susan Collins, Betty Currie, Pete Domenici, Bob Graham, James Jeffords and Linda Tripp.
Henry Hyde is a Republican chairman of House Judiciary Committee and a prosecutor in Senate impeachment trial. He will lead the Judiciary Committee's impeachment review. Hyde urged his colleagues to heed their consciences , “the voice that whispers in our ear , ‘duty, duty, duty.’”.
Victor Polay is the Tupac Amaru rebels' top leader, founder and the organization's commander-and-chief. He was arrested again in 1992 and is serving a life sentence. His associates include Alberto Fujimori, Tupac Amaru Revolutionary, and Nestor Cerpa.
Columbia University’s Newsblaster
www.cs.columbia.edu/nlp/newsblaster/summaries/11_03_02_5.html
Michigan’s MEAD
Terms and Definitions
Text Summarization
- The process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).
Extract vs. Abstract
- An extract is a summary consisting entirely of material copied from the input
- An abstract is a summary at least some of whose material is not present in the input, e.g., subject categories, paraphrase of content, etc.
Illustration of Extracts and Abstracts
25 Percent Extract of Gettysburg Address (sents 1, 2, 6)
Fourscore and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. The brave men, living and dead, who struggled here, have consecrated it far above our poor power to add or detract.
10 Percent Extract (sent 2)
Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure.
15 Percent Abstract
This speech by Abraham Lincoln commemorates soldiers who laid down their lives in the Battle of Gettysburg. It offers an eloquent reminder to the troops that it is the future of freedom in America that they are fighting for.
Illustration of the power of human abstracts
President Calvin Coolidge, Grace Coolidge, and dog, Rob Roy, c.1925. Plymouth Notch, Vermont.
Mrs. Coolidge: What did the preacher discuss in his sermon?President Coolidge: Sin. Mrs. Coolidge: What did he say?President Coolidge: He said he was against it.
- Bartlett’s Quotations (via Graeme Hirst)
Summary Function
Indicative summaries
- An indicative abstract provides a reference function for selecting documents for more in-depth reading.

Informative summaries
- An informative abstract covers all the salient information in the source at some level of detail.

Evaluative summaries
- A critical abstract evaluates the subject matter of the source, expressing the abstractor's views on the quality of the work of the author.

The indicative/informative distinction is a prescriptive distinction, intended to guide professional abstractors (e.g., ANSI 1996).
User-Oriented Summary Types
Generic summaries
- aimed at a particular - usually broad - readership community

Tailored summaries (aka user-focused, topic-focused, query-focused summaries)
- tailored to the requirements of a particular user or group of users
- User’s interests: full-blown user models, profiles recording subject-area terms, or a specific query
- A user-focused summary needs, of course, to take into account the influence of the user as well as the content of the document. A user-focused summarizer usually includes a parameter to influence this weighting.
Summarization Architecture
[Architecture diagram: summaries are characterized by audience, function, type (extract vs. abstract), characteristics (span, coherence, compression), and source (genre, media, language); processing proceeds through Analysis, Transformation, and Synthesis stages.]
Characteristics of Summaries
Reduction of information content
- Compression Rate (also known as condensation rate or reduction rate), measured by summary length / source length (0 < c < 100, as a percentage)
- Target Length

Informativeness
- Fidelity to Source
- Relevance to User’s Interests

Well-formedness/Coherence
- Syntactic and discourse-level. Extracts need to avoid gaps, dangling anaphors, ravaged tables, lists, etc.; abstracts need to produce grammatical, plausible output.
Relation of Summarization to Other Tasks
Document Retrieval & Filtering
- Similarities: relevance; extraction as passage retrieval
- Differences: condensation rate isn't a parameter (although output may avail of summarization)

Text Mining
- Similarities: discovery procedure (multi-source summarization)
- Differences: condensation rate isn't a parameter

Information Extraction
- Similarities: only if the document is mainly about the extracted info
- Differences: condensation rate isn't a parameter (though it could be one); condensation rate isn't applicable if the document isn't about the template

Text Compression
- Similarities: leverages redundancy in a message to condense it
- Differences: aimed at efficient storage and transmission of information, not at human consumption
One Text, Many Summaries(Evaluation preview)
25 Percent Leading Text Extract (first 3 sentences) - seems OK, too!
Four score and seven years ago our fathers brought forth upon this continent a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or any nation so conceived and so dedicated, can long endure. We are met here on a great battlefield of that war.
15 Percent Synopsis by human (critical summary) - seems even better!
This speech by Abraham Lincoln commemorates soldiers who laid down their lives in the Battle of Gettysburg. It offers an eloquent reminder to the troops that it is the future of freedom in America that they are fighting for.
11 Percent Extract (by human, out of context) - is bad! (sents 5, 8)

It is altogether fitting and proper that we should do this. The world will little note, nor long remember, what we say here, but it can never forget what they did here.

We can usually tell when a summary is incoherent, but how do we evaluate summaries in general?
Studies of human summaries
Cremmins (1996) prescribed that abstractors
- use surface features: headings, key phrases, position
- use discourse features: overall text structure
- revise and edit abstracts

Liddy (1991)
- studied 276 abstracts structured in terms of background, purpose, methodology, results and conclusions

Endres-Niggemeyer et al. (1995, 1998) found abstractors
- use a top-down strategy exploiting discourse structure
- build topic sentences, use beginnings/ends as relevant, prefer top-level segments, examine passages/paragraphs before individual sentences, exploit outlines, formatting ...
Endres-Niggemeyer et al. (1995, 1998)
Abstractors never attempt to read the document from start to finish.
Instead, they use the structural organization of the document, including formatting and layout (the scheme) to skim the document for relevant passages, which are fitted together into a discourse-level representation (the theme).
This representation uses discourse-level rhetorical relations to link relevant text elements capturing what the document is about.
They use a top-down strategy, exploiting document structure, and examining paragraphs and passages before individual sentences.
The skimming for relevant passages exploits specific shallow features such as:
- cue phrases (especially in-text summaries)
- location of information in particular structural positions (beginning of the document, beginning and end of paragraphs)
- information from the title and headings.
Stages of Abstracting: Cremmins (1996)
Cremmins recommends 12-20 mins to abstract an average scientific paper - much less time than it takes to really understand one.
Cremmins (1996) described two kinds of editing operations that abstractors carry out:
- Local Revision - revises content within a sentence
- Global Revision - revises content across sentences

Abstractors’ Editing Operations: Local Revision
- drop vague or redundant terms
- reference adjustment
- wording prescriptions
- contextual lexical choice
AGENDA
14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)
14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)
15:30 Break
16:00 III. Abstraction (Template and Concept-Based)
16:30 IV. Evaluation
17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)
17:30 Conclusion
Summarization Approaches
Shallower approaches - result in sentence extraction
- sentences may/will be extracted out of context
- synthesis here involves smoothing
  » include window of previous sentences
  » adjust references
- can be trained using a corpus

Deeper approaches - result in abstracts
- synthesis involves NL generation
- can be partly trained using a corpus
- requires some coding for a domain
Some Features used in Sentence Extraction Summaries
Location: position of term in document, position in paragraph/section, section depth, particular sections (e.g., title, introduction, conclusion)
Thematic: presence of statistically salient terms (tf.idf)- these are document-specific
Fixed phrases: in-text summary cue phrases (“in summary”, “our investigation shows”, “the purpose of this article is”,..), emphasizers (“important”, “in particular”,...)
- these are genre-specific

Cohesion: connectivity of text units based on proximity, repetition and synonymy, coreference, vocabulary overlap

Discourse Structure: rhetorical structure, topic structure, document format
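The thematic feature can be illustrated with a small tf.idf computation; the toy document and background collection below are invented for the example:

```python
import math
from collections import Counter

def tfidf_scores(doc_tokens, background_docs):
    """Score each term in a document by tf.idf against a background corpus."""
    tf = Counter(doc_tokens)
    n_docs = len(background_docs)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for d in background_docs if term in d)  # document frequency
        idf = math.log((1 + n_docs) / (1 + df))            # smoothed idf
        scores[term] = freq * idf
    return scores

# Toy example: "mars" is frequent here but rare in the background,
# so it outscores the ubiquitous "the".
doc = ["the", "weather", "on", "mars", "is", "frigid", "mars", "weather"]
background = [{"the", "weather", "report"}, {"the", "stock", "market"}, {"the", "game"}]
scores = tfidf_scores(doc, background)
```

Terms with high tf.idf (document-specific, as noted above) can then mark the sentences containing them as thematically salient.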
Putting it Together: Linear Feature Combination
U is a text unit such as a sentence, Greek letters denote tuning parameters
Location: weight assigned to a text unit based on whether it occurs in initial, medial, or final position in a paragraph or the entire document, or whether it occurs in prominent sections such as the document’s intro or conclusion

FixedPhrase: weight assigned to a text unit in case fixed-phrase summary cues occur

ThematicTerm: weight assigned to a text unit due to the presence of thematic terms (e.g., tf.idf terms) in that unit

AddTerm: weight assigned to a text unit for terms in it that are also present in the title, headline, initial para, or the user’s profile or query

Weight(U) := α·Location(U) + β·FixedPhrase(U) + γ·ThematicTerm(U) + δ·AddTerm(U)
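This weighted combination can be sketched directly; in the sketch below the four feature functions are toy stand-ins (a position test, a cue-phrase test, and term counts), and the tuning-parameter values are arbitrary:

```python
import re

def tokens(text):
    return re.findall(r"[a-z]+", text.lower())

def location(index, n_units):
    # Reward units at the very start or end of the document.
    return 1.0 if index in (0, n_units - 1) else 0.0

def fixed_phrase(unit):
    cues = ("in summary", "the purpose of this article")
    return 1.0 if any(c in unit.lower() for c in cues) else 0.0

def thematic_term(unit, salient_terms):
    return float(sum(1 for w in tokens(unit) if w in salient_terms))

def add_term(unit, title_terms):
    return float(sum(1 for w in tokens(unit) if w in title_terms))

def weight(unit, index, n_units, salient_terms, title_terms,
           alpha=1.0, beta=2.0, gamma=0.5, delta=0.5):
    return (alpha * location(index, n_units)
            + beta * fixed_phrase(unit)
            + gamma * thematic_term(unit, salient_terms)
            + delta * add_term(unit, title_terms))

sents = ["Mars has frigid weather.",
         "Dust storms are common.",
         "In summary, Martian weather is driven by dust and carbon dioxide."]
scores = [weight(s, i, len(sents), {"weather", "dust"}, {"martian", "weather"})
          for i, s in enumerate(sents)]
```

The highest-weighted units are then extracted until the compression target is reached.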
Shallow Approaches
[Pipeline diagram: Source(s) → Analysis (feature extractors feeding a feature combiner, F1+F2+F3) → Transformation (Selection: sentence selector) → Synthesis (Smoothing: sentence revisor) → Summary]
Revision as Repair
structured environments (tables, etc.)
- recognize and exclude
- **recognize and summarize
anaphors
- exclude sentences (which begin) with anaphors
- include a window of previous sentences
- **reference adjustment
gaps
- include low-ranked sentences immediately between two selected sentences
- add first sentence of para if second or third selected
- **model rhetorical structure of source
A Simple Text Revision Algorithm
Construct initial “sentence-extraction” draft from source by picking highest weighted sentences in source until compression target is reached
Revise draft
- Use syntactic trees (using a statistical parser) augmented with coreference classes
Procedure Revise(draft, non-draft, rules, target-compression):
  for each rule in rules
    while ((compression(draft) - target-compression) < ε)
      while (<x, y> := next-candidates(draft, non-draft))  # e.g., binary rule
        result := apply-rule(rule, x, y)  # returns first result which succeeds
        draft := draft ∪ result
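The draft-construction step (pick the highest-weighted sentences until the compression target is reached) can be sketched as follows, with sentence weights assumed to be given:

```python
def build_draft(sentences, weights, target_compression):
    """Greedily pick highest-weighted sentences until the draft reaches
    the target fraction of the source length (measured in words)."""
    source_len = sum(len(s.split()) for s in sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    chosen = []
    draft_len = 0
    for i in ranked:
        if draft_len / source_len >= target_compression:
            break
        chosen.append(i)
        draft_len += len(sentences[i].split())
    return [sentences[i] for i in sorted(chosen)]  # restore source order

sents = ["A B C D.", "E F.", "G H I.", "J K L M N."]
w = [0.9, 0.1, 0.7, 0.3]
draft = build_draft(sents, w, target_compression=0.5)
```

The revision rules would then be applied to this initial draft.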
Example of Sentence Revision
[Figure: example of sentence revision, with deleted, salient, and aggregated material marked.]
Informativeness vs. Coherence in Sentence Revision
[Chart: informativeness and sentence-complexity scores for the initial draft (I) and revisions by Aggregation (A), Elimination (E), and both (A+E)]

Informativeness (higher is better): A > I, A+E > I (initial draft); A >* E, A+E >* E
Sentence Complexity (lower is better): A+E <* I; A >* I

Mani, Gates, and Bloedorn (ACL’99): 630 summaries from 7 systems (of 90 documents) were revised and evaluated using a vocabulary overlap measure against TIPSTER answer keys. A: Aggregation, E: Elimination
CORPUS-BASED SENTENCE EXTRACTION
The Need for Corpus-Based Sentence Extraction
Importance of particular features can vary with the genre of text
- e.g., location features: newspaper stories favor leading text; scientific text favors the conclusion; TV news favors previews

So there is a need for summarization techniques that are adaptive, that can be trained for different genres of text.
Learning Sentence Extraction Rules
Few corpora available; labeling can be non-trivial, requiring aligning each document unit (e.g., sentence) with abstract.
Learns to extract just individual sentences (though feature vectors can include contextual features).
Example 1: Kupiec et al. (1995)

Input
- Uses a corpus of 188 full-text/abstract pairs drawn from 21 different scientific collections
- Professionally written abstracts 3 sentences long on the average
- The algorithm takes each sentence and computes a probability that it should be included in a summary, based on how similar it is to the abstract
Uses Bayesian classifier
Result
- About 87% (498) of all abstract sentences (568) could be matched to sentences in the source (79% direct matches, 3% direct joins, 5% incomplete joins)
- Location was best feature at 163/498 = 33%
- Para+fixed-phrase+sentence length cutoff gave best sentence recall performance … 217/498=44%
- At compression rate = 25% (20 sentences), performance peaked at 84% sentence recall
Example 2: Mani & Bloedorn (1998)
cmp-lg corpus (xxx.lanl.gov/cmp-lg) of scientific texts, prepared in SGML form by Simone Teufel at U. Edinburgh
- 198 pairs of full-text sources and author-supplied abstracts
- Full-text sources vary in size from 4 to 10 pages, dating from 1994-6
- SGML tags include: paragraph, title, category, summary, headings and heading depth (figures, captions and tables have been removed)
- Abstract length averages about 5% of source length (avg. 4.7 sentences)

Processing
- Each sentence in full-text source converted to a feature vector
- 27,803 feature-vectors (reduces to 903 unique vectors)
- Generated generic and user-focused summaries
Comparison of Learning Algorithms
20% compression, 10-fold cross-validation

Generic:
Method                      Pred. Accuracy  Precision  Recall  F-score
Naïve Bayes (discretized)        .69           .70       .65     .67
C4.5 Rules (pruned)              .69           .62       .70     .66
AQ                               .56           .54       .76     .63
SCDF                             .64           .66       .58     .62
Instance-Based, k=3              .61           .59       .60     .59

User-focused:
Method                      Pred. Accuracy  Precision  Recall  F-score
Naïve Bayes (discretized)        .90           .90       .90     .90
C4.5 Rules (pruned)              .89           .88       .91     .89
SCDF                             .88           .88       .89     .88
Instance-Based, k=3              .82           .80       .85     .82
AQ                               .76           .70       .92     .80
Example Rules
Generic summary rule, generated by C4.5 Rules (20% compression)
If sentence is in the conclusion and it is a high tf.idf sentence
Then it is a summary sentence
User-focused rules, generated by AQ (20% compression)
If the sentence includes 15..20 keywords* present
Then it is a summary sentence (163 total, 130 unique)
If the sentence is in the middle third of the paragraph and the paragraph is in the first third of the section
Then it is a summary sentence (110 total, 27 unique)
*keywords - terms occurring in sentences ranked as highly-relevant to query (abstract)
Issues in Learning Sentence Extraction Rules
Choice of corpus
- size of corpus
- availability of abstracts/extracts/judgments
- quality of abstracts/extracts/judgments: compression, representativeness, coherence, language, etc.

Choice of labeler, to label a sentence as summary-worthy or not based on a comparison between the source document sentence and the document's summary:
- Label a source sentence (number) as summary-worthy if it is found in the extract
- Compare summary sentence content with source sentence content (labeling by content similarity - L/CS)
- Create an extract from an abstract (e.g., by alignment - L/A->E)
Feature Representation, Learning Algorithm, Scoring
L/CS in KPC
To determine if s ∈ E, they use a content-based match (since the summaries don’t always lift sentences from the full-text). They match each source sentence to each sentence in the abstract. Two varieties of matches:
- Direct sentence match: the summary sentence and source text sentence are identical or can be considered to have the same content (79% of matches)
- Direct join: two or more sentences from the source text (called joins) appear to have the same content as a single summary sentence (3% of matches)
L/CS in MB98: Generic Summaries
For each source text
- Represent abstract (list of sentences)
- Match source text sentences against abstract, giving a ranking for source sentences (i.e., abstract as “query”)
  - combined-match: compare source sentence against the entire abstract (similarity based on content-word overlap + weight)
  - individual-match: compare source sentence against each sentence of abstract (similarity based on longest string match to any abstract sentence)
- Label top C% of the matched source sentences’ vectors as positive; C (Compression) = 5, 10, 15, 20, 25
- e.g., C=10 => for a 100-sentence source text, 10 sentences will be labeled positive
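The combined-match labeling can be sketched as below; plain content-word overlap stands in for the overlap-plus-weight similarity, and the toy sentences are invented:

```python
def label_positive(source_sents, abstract_sents, compression_pct):
    """Rank source sentences by content-word overlap with the whole
    abstract, then label the top C% as positive training examples."""
    abstract_words = {w.lower() for s in abstract_sents for w in s.split()}
    def overlap(sent):
        return len({w.lower() for w in sent.split()} & abstract_words)
    ranked = sorted(range(len(source_sents)),
                    key=lambda i: overlap(source_sents[i]), reverse=True)
    k = max(1, round(len(source_sents) * compression_pct / 100))
    positives = set(ranked[:k])
    return [i in positives for i in range(len(source_sents))]

source = ["mars has frigid weather", "the dog barked",
          "dust storms sweep mars", "unrelated sentence here"]
abstract = ["martian weather is frigid on mars"]
labels = label_positive(source, abstract, compression_pct=50)
```

The resulting Boolean labels pair with each sentence's feature vector to form the training set.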
L/A->E in Jing et al. 98
[Figure: candidate alignment functions f1, f2 map the abstract’s words w1, w2, … to positions in the Source.]

Find the fr which maximizes P(fr(w1…wn)), i.e., using the Markov Assumption:

P(fr(w1…wn)) ≈ ∏i=1,n P(fr(wi) | fr(wi-1))
Sentence Extraction as Bayesian Classification
P(s∈E | F1,…, Fn) = ∏j=1,n P(Fj | s∈E) · P(s∈E) / ∏j=1,n P(Fj)

- P(s∈E): prior probability that a sentence is in the extract; in practice, the compression rate c
- P(s∈E | F1,…, Fn): probability that sentence s is included in extract E, given the sentence’s feature-value pairs
- P(Fj): probability of the feature-value pair occurring in a source sentence
- P(Fj | s∈E): probability of the feature-value pair occurring in a source sentence which is also in the extract

The features are discretized into Boolean features, to simplify matters.
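A small numeric sketch of this scoring; the feature probabilities are invented for illustration and would in practice be estimated from a labeled corpus:

```python
def extract_score(features, p_f_given_e, p_f, compression):
    """Score a sentence: compression-rate prior times the likelihood
    ratio P(Fj | s in E) / P(Fj) for each Boolean feature that fires."""
    score = compression
    for f in features:
        score *= p_f_given_e[f] / p_f[f]
    return score

# Invented estimates: how often each Boolean feature fires in extract
# sentences vs. in all source sentences.
p_f_given_e = {"in_intro": 0.6, "has_cue_phrase": 0.3, "high_tfidf": 0.7}
p_f = {"in_intro": 0.2, "has_cue_phrase": 0.05, "high_tfidf": 0.3}
c = 0.25  # compression rate, used as the prior P(s in E)

s1 = extract_score(["in_intro", "high_tfidf"], p_f_given_e, p_f, c)
s2 = extract_score(["has_cue_phrase"], p_f_given_e, p_f, c)
s3 = extract_score([], p_f_given_e, p_f, c)
```

Sentences are then ranked by this score and the top ones extracted.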
ADDING DISCOURSE-LEVEL FEATURES TO THE MIX
Cohesion
There are links in text, called ties, which express semantic relationships
Two classes of relationships:
- Grammatical cohesion: anaphora, ellipsis, conjunction
- Lexical cohesion: synonymy, hypernymy, repetition
Martian Weather with Grammatical and Lexical Cohesion Relations
With its distant orbit – 50 percent farther from the sun than Earth – and slim atmospheric blanket, Mars experiences frigid weather conditions. Surface temperatures typically average about -60 degrees Celsius (-76 degrees Fahrenheit) at the equator and […] can dip to -123 degrees C near the poles. Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion, but any liquid water formed in this way would evaporate almost instantly because of the low atmospheric pressure. Although the atmosphere holds a small amount of water, and water ice clouds sometimes develop, most Martian weather involves blowing dust or carbon dioxide. Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap. Yet even on the summer pole, where the sun remains in the sky all day long, temperatures never warm enough to melt frozen water.
Text Graphs based on Cohesion
Represent a text as a graph
- Nodes: words (or sentences)
- Links: cohesion links between nodes

Graph Connectivity Assumption:
- More highly connected nodes are likely to carry salient information.
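A minimal sketch of the connectivity assumption, using only repetition links between sentences (synonymy, coreference, and the other cohesion links are omitted, and the stopword list is a toy one):

```python
from itertools import combinations

STOPWORDS = {"the", "a", "of", "and", "is", "on", "in", "my"}

def rank_by_connectivity(sentences):
    """Build a graph with one node per sentence and a link between any
    two sentences sharing a non-stopword; rank nodes by degree."""
    words = [set(s.lower().split()) - STOPWORDS for s in sentences]
    degree = [0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        if words[i] & words[j]:  # repetition tie between sentences i and j
            degree[i] += 1
            degree[j] += 1
    return sorted(range(len(sentences)), key=lambda i: degree[i], reverse=True)

sents = ["mars has frigid weather",
         "weather on mars involves dust",
         "dust storms sweep the planet",
         "my dog likes biscuits"]
ranking = rank_by_connectivity(sents)
```

The second sentence, tied to both its neighbors, ranks first; the off-topic last sentence has no links and ranks last.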
Cohesion based Graphs
Skorochodhko 1972
- Node: sentence; Link: related sentences
- Method: node centrality and topology (text topologies: chain, ring, monolith, piecewise)

Salton et al. 1994
- Node: paragraph; Link: cosine similarity
- Method: local segmentation, then node centrality

Mani & Bloedorn 1997
- Node: words/phrases; Link: lexical/grammatical cohesion
- Method: node centrality discovered by spreading activation (see also clustering using lexical chains)

[Figure: example paragraph graph (links between nodes > 5 apart ignored; best 30 links at density 2.00, seg_csim 0.26), with clusters for facts about an issue vs. the legality of an issue]
Coherence

Coherence is the modeling of discourse relations using different sources of evidence, e.g.,
- Document format: layout in terms of sections, chapters, etc.; page layout
- Topic structure: TextTiling (Hearst)
- Rhetorical structure: RST (Mann & Thompson); Text Grammars (van Dijk, Longacre); genre-specific rhetorical structures (Methodology, Results, Evaluation, etc.) (Liddy, Swales, Teufel & Moens, Saggion & Lapalme, etc.)
- Narrative structure
Using a Coherence-based Discourse Model in Summarization
Choose a theory of discourse structure
Parse text into a labeled tree of discourse segments, whose leaves are sentences or clauses
- Leaves typically need not have associated semantics
Weight nodes in the tree, based on node promotion and clause prominence
Select leaves based on weight
Print out selected leaves for summary synthesis
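The node-weighting step can be sketched with one simple promotion scheme (a simplification, not Marcu's exact weighting): each clause is scored by the depth at which its promotion stops, with nuclei promoted to their parent's level. The toy tree and relation names are invented:

```python
def salience(tree, depth=0, scores=None):
    """Score each clause by the depth at which its promotion stops:
    a nucleus is promoted to its parent's level, a satellite drops one
    level. Lower scores mean more salient clauses."""
    if scores is None:
        scores = {}
    if isinstance(tree, int):                 # leaf = clause index
        scores[tree] = min(scores.get(tree, depth), depth)
        return scores
    _, nucleus, satellite = tree
    salience(nucleus, depth, scores)          # nucleus keeps this depth
    salience(satellite, depth + 1, scores)    # satellite is demoted
    return scores

# Toy discourse tree over clauses 1-4; tuples are (relation, nucleus, satellite).
tree = ("evidence", ("elaboration", 2, 3), ("background", 1, 4))
scores = salience(tree)
ranked = sorted(scores, key=scores.get)  # most salient clause first
```

Clause 2, the nucleus of nuclei, is promoted to the root and selected first; leaves are then picked in score order up to the target length.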
Martian Weather Summarized Using Marcu’s Algorithm (target length = 4 sentences)

[With its distant orbit {– 50 percent farther from the sun than Earth –} and
slim atmospheric blanket,1] [Mars experiences frigid weather conditions.2] [Surface temperatures typically average about –60 degrees Celsius (–76 degrees Fahrenheit) at the equator and can dip to –123 degrees C near the poles.3] [Only the midday sun at tropical latitudes is warm enough to thaw ice on occasion,4] [but any liquid water formed that way would evaporate almost instantly5] [because of the low atmospheric pressure.6] [Although the atmosphere holds a small amount of water, and water-ice clouds sometimes develop,7] [most Martian weather involves blowing dust or carbon dioxide.8] [Each winter, for example, a blizzard of frozen carbon dioxide rages over one pole, and a few meters of this dry-ice snow accumulate as previously frozen carbon dioxide evaporates from the opposite polar cap.9] [Yet even on the summer pole, {where the sun remains in the sky all day long,} temperatures never warm enough to melt frozen water.10]
2 > 8 > {3, 10} > {1, 4, 5, 7, 9}
Illustration of Node Promotion (Marcu)
Nodes: relations; Leaves: clauses
Nucleus: square boxes; Satellite: dotted boxes
Detailed Evaluation of Marcu’s Method

Component                                     Recall  Precision  Size of Expt.
Clause Segmentation                            81.3     90.3     3 texts, 3 judges
Discourse Marker ID                            80.8     89.5     3 texts, 3 judges
Salience Weighting (Machine-Generated Trees)   65.0     67.0     5 texts, 3 judges
Salience Weighting (Human-Generated Trees)     67.0     78.0     5 texts, 3 judges

Issues
- How well can humans construct trees?
  - Discourse segmentation: .77 Kappa (30 news texts, 3 coders)
  - Relations: .61 Kappa (ditto)
- How well can machines construct trees?
  - Machine trees show poor correlation with human trees, but shape and nucleus/satellite assignment are very similar
AGENDA
14:10 I. Fundamentals (Definitions, Human Abstracting, Abstract Architecture)
14:40 II. Extraction (Shallow Features, Revision, Corpus-Based Methods)
15:30 Break
16:00 III. Abstraction (Template and Concept-Based)
16:30 IV. Evaluation
17:00 V. Research Areas (Multi-document, Multimedia, Multilingual Summarization)
17:30 Conclusion
Abstracts Require Deep Methods
An abstract is a summary at least some of whose material is not present in the input.
Abstracts involve inferences made about the content of the text; they can reference background concepts, i.e., those not mentioned explicitly in the text.
Abstracts can result in summarization at a much higher degree of compression than extracts
Human abstractors make inferences in producing abstracts, but are instructed “not to invent anything”
So, “degree of abstraction” knob important. Could control extent of generalization, degree of lexical substitution, aggregation, etc.
Template Extraction
Wall Street Journal, 06/15/88
MAXICARE HEALTH PLANS INC and UNIVERSAL HEALTH SERVICES INC have dissolved a joint venture which provided health services.
Template:

<TEMPLATE-8806150049-1> :=
  DOC NR: 8806150049
  CONTENT: <TIE_UP_RELATIONSHIP-8806150049-1>
  DATE TEMPLATE COMPLETED: 311292
  EXTRACTION TIME: 0

[Diagram: Source → Analysis → Template → Transformation → Synthesis → summary]
Template Example (Paice and Jones 1983)

Concept: Definition
- SPECIES: the crop species concerned
- CULTIVAR: the varieties used
- HIGH-LEVEL PROPERTY: the property being investigated, e.g., yield, growth rate
- PEST: any pest which infests the crop
- AGENT: chemical or biological agent applied
- INFLUENCE: e.g., drought, cold, grazing, cultivation system
- LOCALITY: where the study was performed
- TIME: years when the study was conducted
- SOIL: description of soil
Canned Text Patterns
“This paper studies the effect the pest PEST has on the PROPERTY of SPECIES.”
“An experiment in TIME at LOCALITY was undertaken.”
Output: This paper studies the effect the pest G. pallida has on the yield of potato.
An experiment in 1985 and 1986 at York, Lincoln and Peterbourgh, England was undertaken.
Templates Can Get Complex! (MUC-5)

<TEMPLATE-8806150049-1> :=
  DOC NR: 8806150049
  CONTENT: <TIE_UP_RELATIONSHIP-8806150049-1>
  DATE TEMPLATE COMPLETED: 311292
  EXTRACTION TIME: 0
<TIE_UP_RELATIONSHIP-8806150049-1> :=
  TIE-UP STATUS: DISSOLVED
  ENTITY: <ENTITY-8806150049-1> <ENTITY-8806150049-2>
  JOINT VENTURE CO: <ENTITY-8806150049-3>
  OWNERSHIP: <OWNERSHIP-8806150049-1> <OWNERSHIP-8806150049-2>
  ACTIVITY: <ACTIVITY-8806150049-1>
<ENTITY-8806150049-1> :=
  NAME: Maxicare Health Plans INC
  ALIASES: "Maxicare"
  LOCATION: Los Angeles (CITY 4) California (PROVINCE 1) United States (COUNTRY)
  TYPE: COMPANY
  ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-8806150049-1>
<ENTITY-8806150049-2> :=
  NAME: Universal Health Services INC
  ALIASES: "Universal Health"
  LOCATION: King of Prussia (CITY) Pennsylvania (PROVINCE 1) United States (COUNTRY)
  TYPE: COMPANY
  ENTITY RELATIONSHIP: <ENTITY_RELATIONSHIP-8806150049-1>
<ACTIVITY-8806150049-1> :=
  INDUSTRY: <INDUSTRY-8806150049-1>
  ACTIVITY-SITE: (<FACILITY-8806150049-1> <ENTITY-8806150049-3>)
<INDUSTRY-8806150049-1> :=
  INDUSTRY-TYPE: SERVICE
  PRODUCT/SERVICE: (80 "a joint venture Nevada health maintenance [organization]")
Assessment of Template Method
Characteristics:
- Templates can be simple or complex, and there may be multiple templates (e.g., for a multi-incident document)
- Templates (and sets of them) benefit from aggregation and elimination operations to pinpoint key summary information
- Salience is pre-determined based on slots, or computed (e.g., from event frequencies)
Advantages:
- Provides a useful capability for abstracting semantic content
- Steady progress in information extraction, based on machine learning from large corpora
Limitations:
- Requires customization for specific types of input data
- Only summarizes that type of input data
Concept Abstraction Method
Captures the content of a document in terms of abstract categories
Abstract categories can be:
- sets of terms from the document
- topics from labeled collections or background knowledge (e.g., a thesaurus or knowledge base)
To leverage background knowledge:
- Obtain an appropriate concept hierarchy
- Mark concepts in the hierarchy with their frequency of reference in the text (requires word-sense disambiguation)
- Find the most specific generalizations of concepts referenced in the text
- Use the generalizations in an abstract
Concept Abstraction Example
Counting Concept and Instance Links (Hahn & Reimer '99):

    Salient(C)  iff  |ActInsts(C)| / |Insts(C)| ≥ θ

(a concept C is salient when a large enough fraction of its instances Insts(C) are activated in the text)

Counting Concept and Subclass Links (Lin & Hovy '99):

    W(C) = λ · freq(C) + (1 − λ) · Σ_{C′ ∈ children(C)} W(C′)
Most Specific Generalization: traverse downwards until you find a C whose children contribute equally to its weight.

Example text: "The department is buying a Sun Workstation, a HP 3690, and a Toshiba machine. The IBM ThinkPad will not be bought from next year onwards."
[Figure: concept hierarchy over the mentioned machines (Sun Workstation, IBM ThinkPad, ...).]
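The traversal can be sketched over a toy hierarchy (the hierarchy, frequencies, the 50% dominance threshold, and the simplified weight, a concept's own frequency plus its children's weights, are all illustrative assumptions):

```python
# Illustrative sketch of the "most specific generalization" traversal:
# each concept's weight is its own frequency plus its children's weights,
# and we descend while a single child dominates the parent's weight.
# The toy hierarchy and the 50% dominance threshold are assumptions.

hierarchy = {                       # parent -> children
    "Computer": ["Workstation", "Laptop"],
    "Workstation": ["Sun Workstation", "HP 3690"],
    "Laptop": ["Toshiba machine", "IBM ThinkPad"],
}
freq = {"Sun Workstation": 1, "HP 3690": 1, "Toshiba machine": 1, "IBM ThinkPad": 1}

def weight(c):
    return freq.get(c, 0) + sum(weight(ch) for ch in hierarchy.get(c, []))

def most_specific_generalization(root, threshold=0.5):
    node = root
    while True:
        children = hierarchy.get(node, [])
        if not children:
            return node
        best = max(children, key=weight)
        # stop when no single child dominates, i.e., children contribute
        # roughly equally to the parent's weight
        if weight(best) <= threshold * weight(node):
            return node
        node = best

print(most_specific_generalization("Computer"))  # → Computer
```

With all four machines mentioned once, the children contribute equally and the traversal stops at the top concept; skewing the frequencies pushes the generalization further down.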
Assessment of Concept Abstraction
Allows for generalization based on links (instance, subclass, part-of, etc.)
Some efforts at controlling the extent of generalization
Hierarchy needs to be available, and to contain the domain (senses of) words
- Generic hierarchies may contain other senses of a word
- Constructing a hierarchy by hand for each domain is prohibitively expensive
Result of generalization needs to be readable by a human (e.g., via generation or visualization)
- So, useful mainly in the transformation phase
Generation (Statistical) of Headlines
Shows how statistical methods can be used to generate abstracts (Banko et al. 2000)
    H* = argmax_H [ Σ_{i=1..n} log P(w_i ∈ H | w_i ∈ D)
                  + Σ_{i=2..n} log P(w_i | w_{i−1})
                  + log P(len(H) = n) ]

where H = headline, D = document:
- Select document words that occur frequently in example headlines: P(w_i ∈ H | w_i ∈ D)
- Order words based on pairwise co-occurrences: P(w_i | w_{i−1})
- Favor typical headline lengths: P(len(H) = n)
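The three terms can be sketched as a scorer over candidate headlines (the tiny probability tables are made-up stand-ins for corpus estimates, not Banko et al.'s trained model):

```python
import math

# Sketch of the scoring described above: a candidate headline H is scored by
# summing log P(w in H | w in D) (content selection), log P(w_i | w_{i-1})
# (bigram realization), and log P(len(H) = n) (length). The tiny probability
# tables below are illustrative stand-ins for corpus estimates.

p_content = {"president": 0.4, "visits": 0.3, "city": 0.2}     # P(w in H | w in D)
p_bigram = {("<s>", "president"): 0.5, ("president", "visits"): 0.6,
            ("visits", "city"): 0.4}
p_length = {2: 0.3, 3: 0.5, 4: 0.2}                            # P(len(H) = n)

def score(headline):
    s = math.log(p_length[len(headline)])
    prev = "<s>"
    for w in headline:
        s += math.log(p_content[w]) + math.log(p_bigram[(prev, w)])
        prev = w
    return s

candidates = [["president", "visits"], ["president", "visits", "city"]]
best = max(candidates, key=score)
print(best)
```

In a real system the argmax would search the space of word sequences (e.g., with Viterbi search) rather than rank a fixed candidate list.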
Summarization Evaluation: Intrinsic and Extrinsic Methods
Intrinsic methods test the system in itself
- Criteria: coherence, informativeness
- Methods: comparison against reference output; comparison against summary input
Extrinsic methods test the system in relation to some other task
- time to perform tasks, accuracy of tasks, ease of use
- expert assessment of usefulness in the task
Coherence: How does a summary read?
Humans can judge this by subjective grading (e.g., 1-3 scale) on specific criteria
- General readability criteria: spelling, grammar, clarity, impersonal style, conciseness, readability and understandability, acronym expansion, etc. (Saggion and Lapalme 2000)
- Criteria can also be specific to extracts (dangling anaphors, gaps, etc.) or abstracts (ill-formed sentences, inappropriate terms, etc.)
When subjects assess summaries for coherence, the scores can be compared against scores for reference summaries, scores for source docs, or against scores for other summarization systems
Automatic scoring has a limited role to play here
Informativeness: Is the content preserved?
Measure the extent to which summary preserves information from a source or a reference summary
Humans can judge this by subjective grading (e.g., 1-3 scale) on specific criteria
When subjects assess summaries for informativeness, the scores can be compared against scores for reference summaries, scores for source docs, or against scores for other summarization systems
[Figure: the machine summary is compared against the source document and/or a human (reference) summary; the comparison method can be manual or automatic.]
Human Agreement in Reference Extracts
Previous studies, most of which have focused on extracts, have shown evidence of low agreement among humans
Source               #docs  #subjects  % agreement  Cite
Scientific American   10     6          8%          Rath et al. 61
Funk and Wagnall's    50     2          46%         Mitra et al. 97
However, there is also evidence that judges may agree more on the most important sentences to include (Jing et al. 99), (Marcu 99)
When subjects disagree, system can be compared against majority opinion, most similar human summary (‘optimistic’) or least similar human summary (‘pessimistic’) (Mitra et al. 97)
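The optimistic/pessimistic/majority comparisons can be sketched as follows (a toy sketch: the sentence-ID sets and the Jaccard overlap are illustrative assumptions, not the measure used by Mitra et al.):

```python
from collections import Counter

# Sketch of scoring a system extract against several human extracts, as
# described above: 'optimistic' = most similar human extract, 'pessimistic' =
# least similar, plus a majority-opinion reference. Sentence IDs and the
# simple Jaccard overlap are illustrative assumptions.

def overlap(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0   # Jaccard similarity

def evaluate(system, human_extracts):
    scores = [overlap(system, h) for h in human_extracts]
    counts = Counter(s for h in human_extracts for s in h)
    majority = {s for s, c in counts.items() if c > len(human_extracts) / 2}
    return {"optimistic": max(scores),
            "pessimistic": min(scores),
            "majority": overlap(system, majority)}

humans = [{1, 2, 5}, {1, 3, 5}, {1, 2, 6}]
print(evaluate({1, 2, 5}, humans))
```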
Intrinsic Evaluation: SUMMAC Q&A Results

[Chart: Average Answer Recall (ARA) vs. compression (0.0-0.5) for CGI/CMU, Cornell/SabIR, GE, ISI, NMSU, Penn, SRA, TextWise, and Modsumm.]

- Highest recall was associated with the least reduction of the source
- Informativeness: ratio of accuracy to compression of about 1.5
- Content-based automatic scoring (vocabulary overlap) correlates very well with human scoring (passage/answer recall)
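The vocabulary-overlap scoring mentioned above can be sketched as a simple unigram-recall measure (an illustrative stand-in, not the SUMMAC implementation; the tokenizer is a bare whitespace split):

```python
# Minimal sketch of content-based (vocabulary overlap) scoring between a
# system summary and a reference: the fraction of reference vocabulary that
# the system summary covers. Tokenization and the recall-style measure are
# simplifying assumptions.

def vocab_overlap(system: str, reference: str) -> float:
    sys_words = set(system.lower().split())
    ref_words = set(reference.lower().split())
    if not ref_words:
        return 0.0
    return len(sys_words & ref_words) / len(ref_words)

ref = "a suspected bomb killed five people in the world trade center"
hyp = "five people were killed by a bomb in the world trade center"
print(round(vocab_overlap(hyp, ref), 2))  # → 0.91
```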
Intrinsic Evaluation: Japanese Text Summarization Challenge (2000)
At each compression, systems outperformed Lead and TF baselines in content overlap with human summaries
Subjective grading of coherence and informativeness showed that human abstracts > human extracts > systems and baselines
[Charts: content overlap scored against extracts and against abstracts; subjective grading.]
(Fukusima and Okumura 2001)
DUC’2001 Summarization Evaluation http://www-nlpir.nist.gov/projects/duc/
Intrinsic evaluation of single- and multiple-document English summaries by comparison against reference summaries

60 reference sets: 30 training, 30 test, each with an average of 10 documents
- a single 100-word summary for each document (sds)
- four multi-document summaries (400, 200, 100, and 50-word) for each set (mds)
www.isi.edu/~cyl/SEE
DUC’2001 Setup
Doc sets are based on:

- A single event with causes and consequences
- Multiple distinct events of a single type (e.g., solar eclipses)
- Subject (discuss a single subject)
- One of the above in the domain of natural disasters (e.g., Hurricane Andrew)
- Biographical (discuss a single person)
- Opinion (different opinions about the same subject, e.g., welfare reform)

400-word mds used to build 50-, 100-, and 200-word mds

Baselines:
- sds: first 100 words
- mds: first 50/100/200/400 words of the most recent document; or the 1st sentence of the 1st, 2nd, ..., nth doc, then the 2nd sentences, ... until 50/100/200/400 words
Eval Criteria
Informativeness (Completeness)
- Recall of reference summary units
Coherence (1-5 scales)
- Grammar: “Do the sentences, clauses, phrases, etc. follow the basic rules of English?
Don’t worry here about style or the ideas.
Concentrate on grammar.”
- Cohesion: “Do the sentences fit in as they should with the surrounding sentences?
Don’t worry about the overall structure of the ideas.
Concentrate on whether each sentence naturally follows the preceding one and leads into the next.”
- Organization: “Is the content expressed and arranged in an effective manner?
Concentrate here on the high-level arrangement of the ideas.”
Assessment

Phase 1: assessor judged system summary against her own reference summary
Phase 2: assessor judged system summary against 2 other assessors' reference summaries
System summaries divided into automatically determined sentences (called PUs)
Reference summaries divided into automatically determined EDUs (called MUs), which were then lightly edited by humans
Results: Coherence
Grammar
- Baseline < System < Humans (means 3.23, 3.53, 3.79)
- Most baselines contained a sentence fragment
Cohesion
- Baseline = system = humans = 3 (sds medians)
- Baseline = 2 = system < humans = 3 (mds medians)
Organization
- Baseline = system = 3 < humans = 4 (sds)
- Baseline = system = 2 < humans = 3 (mds)
Issues
- Grammar (esp. 'All') too sensitive to low-level formatting
- Cohesion and Organization didn't make sense for very short summaries
- Cohesion hard to distinguish from Organization
Overall, except for grammar, system summaries no better than baselines
Informativeness (Completeness) Measure
For each MU:
“The marked PUs, taken together, express [ All, Most, Some, Hardly any, or None ]of the meaning expressed by the MU”
Results: Informativeness
Average Coverage: average of the per-MU completeness judgments [0..4] for a peer summary
- Baselines = .5 <= systems = .6 < humans = 1.3 (overall medians)
- lots of outliers; relatively lower baseline and system performance on mds; small improvements in mds as size increases
Even for simple sentences/EDUs, determining shared meaning was very hard!
DUC'2003 (NIST slide)

- Task 1: very short (10-word) single-document summaries (TDT docs, 30 clusters)
- Task 2: short (100-word) multi-document summaries of TDT clusters, given the TDT topic (30 clusters)
- Task 3: short (100-word) multi-document summaries, given a viewpoint (TREC docs, 30 clusters)
- Task 4: short (100-word) multi-document summaries, given a TREC Novelty topic and its relevant/novel sentences (Novelty docs)
DUC’2003 Metrics & Results
Coherence: Quality (Tasks 2-4)
- Systems < Baseline <= Manual

Informativeness:
- Coverage (Tasks 1-4) = avg(per-MU completeness judgments for a peer summary) × target length / actual length
  Systems < Manual; most systems indistinguishable
- 'Usefulness' (Task 1): grade each summary according to how useful you think it would be in getting you to choose the document
  Manual summaries distinct from systems; tracks coverage closely
- 'Responsiveness' (Task 4): read the topic/question and all the summaries, consult the relevant sentences as needed, and grade each summary according to how responsive it is in form and content to the question
  Manual summaries distinct from systems/baselines; tracks coverage generally
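The length-adjusted coverage formula above can be sketched directly (the All/Most/Some/Hardly any/None judgments are mapped to 4..0 per the DUC scale; the example numbers are made up):

```python
# Sketch of the DUC'2003 length-adjusted coverage metric quoted above:
# average per-MU completeness judgment, scaled by target length over actual
# length (penalizing over-length summaries). Judgments map
# All/Most/Some/Hardly any/None to 4..0.

def adjusted_coverage(mu_judgments, target_len, actual_len):
    avg = sum(mu_judgments) / len(mu_judgments)
    return avg * target_len / actual_len

# e.g., three MUs judged Most (3), Some (2), None (0); a 100-word target
# but a 125-word summary:
print(adjusted_coverage([3, 2, 0], target_len=100, actual_len=125))  # ≈ 1.33
```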
Baseline summaries etc. (NIST slide)
NIST (Nega Alemayehu) created baseline summaries
- Baselines 2-5: automatic
- based roughly on algorithms suggested by Daniel Marcu
- no truncation of sentences, so some baseline summaries went over the limit (+ <=15 words) and some were shorter than required
Baseline 1 (task 1): original author's headline
- Use the document's own "headline" element
Baseline 2 (tasks 2, 3)
- Take the 1st 100 words in the most recent document.
Baseline 3 (tasks 2, 3)
- Take the 1st sentence in the 1st, 2nd, 3rd,… document in chronological sequence until you have 100 words.
Baseline 4 (task 4)
- Take the 1st 100 words from the 1st n relevant sentences in the 1st document in the set. ( Documents ordered by relevance ranking given with the topic.)
Baseline 5 (task 4)
- Take the 1st relevant sentence from the 1st, 2nd, 3rd,… document until you have 100 words. (Documents ordered by relevance ranking given with the topic.)
Extrinsic Methods: Usefulness of Summary in a Task

If the summary involves instructions of some kind, it is possible to measure the efficiency in executing the instructions.

Measure the summary's usefulness with respect to some information need or goal, such as:
- finding documents relevant to one's need from a large collection; routing documents
- extracting facts from sources
- producing an effective report or presentation using a summary
- etc.

Assess the impact of a summarizer on the system in which it is embedded (e.g., how much does summarization help the question answering system?)

Measure the amount of effort required to post-edit the summary output to bring it to some acceptable, task-dependent state

... (unlimited number of tasks to which summarization could be applied)
SUMMAC Time and Accuracy (adhoc task, 21 subjects)

- S2's (23% of source on avg.) roughly halved decision time relative to F (full text)
- All F-score and Recall differences are significant except between F & S2
- All time differences are significant except between B & S1
- Conclusion (adhoc): S2's save 50% of time without impairing accuracy!
Multi-Document Summarization
Extension of single-document summarization to collections of related documents
- but the naïve "concatenate each summary" extension faces repetition of information across documents
Requires fusion of information across documents
- Elimination, aggregation, and generalization operations carried out on the collection instead of individual documents
Collections can vary considerably in size
- different methods for different ranges (e.g., cluster first if > n documents)
Higher compression rates usually needed
- perhaps where abstraction is really critical
NL Generation and Visualization have an obvious role to play here
Example MDS Problems
Timothy James McVeigh, 27, was formally charged on Friday with the bombing of a federal building in Oklahoma City which killed at least 65 people, the Justice Department said.

The first suspect, Gulf War veteran Timothy McVeigh, 27, was charged with the bombing Friday after being arrested for a traffic violation shortly after Wednesday's blast.

Federal agents have arrested a suspect in the Oklahoma City bombing, Timothy James McVeigh, 27. McVeigh was formally charged on Friday with the bombing.

Timothy McVeigh, the man charged in the Oklahoma City bombing, had correspondence in his car vowing revenge for the 1993 federal raid on the Branch Davidian compound in Waco, Texas, the Dallas Morning News said Monday.

Eighteen decapitated bodies have been found in a mass grave in northern Algeria, press reports said Thursday.

Algerian newspapers have reported on Thursday that 18 decapitated bodies have been found by the authorities.
Multi-Document Summarization Methods
Shallow Approaches
- passage extraction and comparison: removes redundancy by vocabulary overlap comparisons
Deep Approaches
- template extraction and comparison: removes redundancy by aggregation and generalization operators
- syntactic and semantic passage comparison
Passage Extraction and Summarization
Maximal Marginal Relevance
- Example: 100 hits; the 1st 20 describe the same event, but hits 36, 41, and 68 are very different, although marginally less relevant
- As a post-retrieval filter on relevance-ranked hits, MMR offers a reranking parameter λ which lets you slide between relevance to the query and diversity from the hits you have seen so far:

    MMR(Q, R, S) = argmax_{Di ∈ R\S} [ λ · sim1(Di, Q) − (1 − λ) · max_{Dj ∈ S} sim2(Di, Dj) ]

  where Q is the query, R is the retrieved set, and S is the already-scanned subset of R
- Example: R = {D1, D2, D3}; S = {D1}; λ = 0
  Di = D2 => −(1 − λ) · sim2(D2, D1) = −.4
  Di = D3 => −(1 − λ) · sim2(D3, D1) = −.2, so pick D3
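The reranking can be sketched as follows (a minimal sketch: sim1 and sim2 are collapsed into one toy word-overlap similarity, and the documents are illustrative):

```python
# Sketch of Maximal Marginal Relevance reranking as defined above: lambda
# trades relevance to the query against novelty with respect to already
# selected items. The word-overlap similarity and the toy documents are
# illustrative assumptions.

def sim(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def mmr_select(query, docs, k, lam=0.5):
    selected = []
    remaining = list(docs)
    while remaining and len(selected) < k:
        def mmr_score(d):
            novelty_penalty = max((sim(d, s) for s in selected), default=0.0)
            return lam * sim(d, query) - (1 - lam) * novelty_penalty
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

docs = ["bomb kills five in city",
        "bomb kills five in city center",
        "suspect arrested after traffic stop"]
print(mmr_select("city bomb", docs, k=2))
```

Note how the second pick skips the near-duplicate of the first hit in favor of the dissimilar one, even though the duplicate is more relevant to the query.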
Cohesion-Based Approaches Across Documents
- Salton’s Text Maps
- User-Focused Passage Alignment
User-Focused Passage Alignment
Template Comparison Method (McKeown and Radev 1995)
Contradiction operator: applies to template pairs which have same incident location but which originate from different sources (provided at least one other slot differs in value)
- If value of number of victims is lowered across two reports from the same source, this suggests the old information is incorrect; if it goes up, the first report had incomplete information
The afternoon of Feb 26, 1993, Reuters reported that a suspected bomb killed at least five people in the World Trade Center. However, Associated Press announced that exactly five people were killed in the blast.
Refinement operator: applies to template pairs where the second’s slot value is a specialization of the first’s for a particular slot (e.g., terrorist group identified by country in first template, and by name in later template)
Other operators: perspective change, agreement, addition, superset, trend, etc.
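The contradiction operator can be sketched as a check over template pairs (a minimal sketch; the slot names `source`, `location`, and `victims` are illustrative assumptions, not the McKeown and Radev schema):

```python
# Illustrative sketch of the contradiction operator described above: it fires
# for two templates about the same incident location, from different sources,
# when at least one other slot differs in value. Slot names are assumptions.

def contradiction(t1, t2):
    """Return the set of differing slots, or None if the operator doesn't apply."""
    if t1["location"] != t2["location"] or t1["source"] == t2["source"]:
        return None
    diffs = {k for k in t1
             if k not in ("location", "source") and t1[k] != t2[k]}
    return diffs or None

reuters = {"source": "Reuters", "location": "World Trade Center",
           "victims": "at least five"}
ap = {"source": "AP", "location": "World Trade Center",
      "victims": "exactly five"}
print(contradiction(reuters, ap))  # → {'victims'}
```

The refinement operator would be analogous, firing when the second template's slot value specializes the first's.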
Syntactic Passage Comparison (MultiGen)
[Figure: example theme for syntactic comparison: the cluster of McVeigh sentences from the earlier MDS example.]

- Assumes very tight clustering of documents
- Similar to revision-based methods
Lexical Semantic Merging: BIOGEN
Vernon Jordan is a presidential friend and a Clinton adviser. He helped Ms. Lewinsky find a job. He testified that Ms. Monica Lewinsky said that she had conversations with the president, that she talked to the president.

Henry Hyde is a Republican chairman of the House Judiciary Committee and a prosecutor in the Senate impeachment trial. He will lead the Judiciary Committee's impeachment review. Hyde urged his colleagues to heed their consciences, "the voice that whispers in our ear, 'duty, duty, duty.'"

• Given 1,300 news docs
• 707,000 words in the collection
• 607 sentences which mention "Jordan" by name
• 78 appositive phrases which fall (using WordNet) into 2 semantic groups: "friend", "adviser"
• 65 sentences with "Jordan" as logical subject, filtered based on verbs which are strongly associated in a background corpus with "friend" or "adviser", e.g., "testify", "plead", "greet"
• 3-sentence summary
For details, see Mani et al. ACL’2001
Appositive Merging Examples
- "a lawyer for the defendant" + "an attorney for Paula Jones": merged via synonymy (lawyer =mf attorney, both < person)
- "Chairman of the Budget Committee" + "Budget Committee Chairman": merged since A = B
- "Wisconsin Democrat" (mf) + "senior Democrat": merged on the shared head
- "Senator" (mf) + "Democrat": merged via a common ancestor (A, B < X < person, through politician, leader)

mf: more frequent head/modifier for the name in the collection
Verb-subject associations for appositive head nouns
executive police politician
reprimand 16.36 shoot 17.37 clamor 16.94
conceal 17.46 raid 17.65 jockey 17.53
bank 18.27 arrest 17.96 wrangle 17.59
foresee 18.85 detain 18.04 woo 18.92
conspire 18.91 disperse 18.14 exploit 19.57
convene 19.69 interrogate 18.36 brand 19.65
plead 19.83 swoop 18.44 behave 19.72
sue 19.85 evict 18.46 dare 19.73
answer 20.02 bundle 18.50 sway 19.77
commit 20.04 manhandle 18.59 criticize 19.78
worry 20.04 search 18.60 flank 19.87
accompany 20.11 confiscate 18.63 proclaim 19.91
own 20.22 apprehend 18.71 annul 19.91
witness 20.28 round 18.78 favor 19.92
MULTIMEDIA SUMMARIZATION
Broadcast News Navigator Example
Internet: query terms constructed from NEs; hits are then summarized

Sentence extraction from the closed-caption (cc) text, plus a list of NEs
BNN Summary: Story Skim*
BNN Story Details*: text summary, topics, named entities
Identification: Precision vs. Time (with Recall Comparison)

[Chart: average precision (0.7-1.0) vs. average time in minutes (0-8) for presentation conditions 3 Named Entities, All Named Entities, Full Details, Key Frame, Skim Story Details, Summary, Text, Topic, and Video; region A also has high recall, region B lower recall, region C high precision, with an IDEAL corner marked.]

Results
- Less is better (in time and precision)
- Mixed media summaries better than single media

E.g., What stories are about Sonny Bono?
CMU Meeting Summarization (Zechner 2001)
S1: well um I think we should discuss this you know with her
S1: That’s true I suggest
S1: you talk to him
S1: yeah well now get this we might go to live in switzerland
S2: oh really
S1: yeah because they’ve made him a job offer there and at first thinking nah he wasn’t going to take it but now he’s like
S1: when are we meeting?
S2: you mean tomorrow?
S1: yes
S2: at 4 pm
Summarizes audio transcriptions from multi-party dialogs
Integrated with meeting browser
Detects disfluencies: filled pauses, repairs, restarts, false starts
Identifies sentence boundaries
Identifies question-answer pairs
Then does sentence ranking using MMR
When run on automatically transcribed audio, biases summary towards words the recognizer is confident of
Event Visualization and Summarization:Geospatial News on Demand Env. (GeoNODE)
Automated Cross Document, Multilingual Topic Cluster Detection and Tracking
Geospatial and Temporal Display of Events extracted from Corpus
Event Frequency by Source
VCR like controls supports exploration of corpus
Multilingual Summarization (ISI)
[Screenshots: Indonesian hits are machine-translated, then summarized.]
Conclusion
Automatic Summarization is alive and well!

As we interact with the massive information universes of today and tomorrow, summarization in some form is indispensable.

Areas for the future:
- multidocument summarization
- multimedia summarization
- summarization for hand-held displays
- temporal summarization
- etc.
Resources
Books
- Mani, I. and Maybury, M. (eds.) 1999. Advances in Automatic Text Summarization. MIT Press, Cambridge.
- Mani, I. 2001. Automated Text Summarization. John Benjamins, Amsterdam.
Journals
- Mani, I. And Hahn, U. Nov 2000. Summarization Tutorial. IEEE Computer.
Conferences/Workshops
- Dagstuhl Seminar, 1993 (Karen Spärck Jones, Brigitte Endres-Niggemeyer) www.ik.fh-hannover.de/ik/projekte/Dagstuhl/Abstract
- ACL/EACL Workshop on Intelligent Scalable Text Summarization, Madrid, 1997 (Inderjeet Mani, Mark Maybury) (www.cs.columbia.edu/~radev/ists97/program.html)
- AAAI Spring Symposium on Intelligent Text Summarization, Stanford, 1998 (Dragomir Radev, Eduard Hovy) (www.cs.columbia.edu/~radev/aaai-sss98-its)
- ANLP/NAACL Summarization Workshop, Seattle, 2000 (Udo Hahn, Chin-Yew Lin, Inderjeet Mani, Dragomir Radev) www.isi.edu/~cyl/was-anlp2000.html
- NAACL Summarization Workshop, Pittsburgh, 2001
- NAACL Summarization Workshop, Pittsburgh, 2001
Web References
On-line Summarization Tutorials
- www.si.umich.edu/~radev/summarization/radev-summtutorial00.ppt
- www.isi.edu/~marcu/coling-acl98-tutorial.html
Bibliographies
- www.si.umich.edu/~radev/summarization/
- www.cs.columbia.edu/~jing/summarization.html
- www.dcs.shef.ac.uk/~gael/alphalist.html
- www.csi.uottawa.ca/tanka/ts.html
Survey: “State of the Art in Human Language Technology” (cslu.cse.ogi.edu/HLTsurvey)
Government initiatives
- DUC Multi-document Summarization Evaluation (www-nlpir.nist.gov/projects/duc)
- DARPA’s Translingual Information Detection Extraction and Summarization (TIDES) Program
(tides.nist.gov, www.darpa.mil/ito/research/tides/projlist.html)
- European Intelligent Information Interfaces program (www.i3net.org)