11/4/1999 ACM Multimedia 99 1
Auto-Summarization of Audio-Video Presentations
Li-wei He, Elizabeth Sanocki
Anoop Gupta, Jonathan Grudin
Collaboration and Multimedia Group
Microsoft Research
Motivation
• On-demand multimedia is becoming pervasive
  – Corporate training and communication
    • At Microsoft, over 360 courses online in two years
  – Research seminars
    • Microsoft Research archives about 2 talks daily
Motivation (Cont.)
• Effective summarization and browsing techniques can help viewers utilize time better
  – Audio-video different from text
  – Many approaches possible
    • Time-compression, indexes, highlights, …
• This talk focuses on:
  – Informational presentations
  – Automatic summarization methods
What Is a Video Summary?
• Assembled from segments of the original
[Figure: selected segments of the Original timeline are assembled into the Summary]
The 4 C’s of a Good Summary
• Conciseness: as short as possible
• Coverage: covers key points
• Context: defines terms before using them
• Coherence: flows naturally and fluidly
Talk Outline
• Introduction
• Automatic summarization
  – Sources of information in A/V presentations
  – Three algorithms
• Evaluation
Sources of Information
• Audio and video
  – Pitch and pause information
• Speaker actions
  – Slide-transition points
• End-user actions
  – Video segments watched by earlier viewers
Auto-summarization Methods
                    Method 1 (S)    Method 2 (P)    Method 3 (SPU)
Slide transition         X                                 X
Pitch analysis                           X                 X
User access log                                            X
1. Slide-based Method (S)
• Rationale
  – Beginning of a slide marks a new topic
  – Time devoted to slide indicates its importance
• Algorithm
  – First N% of video for each slide
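The slide-based rule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `slide_based_summary`, the representation of slides as a list of transition times in seconds, and the `fraction` parameter (standing in for N%) are all hypothetical.

```python
def slide_based_summary(transitions, talk_end, fraction):
    """Return (start, end) segments covering the first `fraction` of each slide.

    transitions: sorted start time of each slide, in seconds (assumed input format).
    talk_end: end time of the talk, in seconds.
    fraction: share of each slide to keep, e.g. 0.25 for N = 25%.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    # Each slide runs from its own transition to the next one (or to the end of the talk).
    boundaries = list(transitions) + [talk_end]
    segments = []
    for start, end in zip(boundaries, boundaries[1:]):
        duration = end - start
        if duration > 0:
            segments.append((start, start + fraction * duration))
    return segments


# Toy talk: three slides starting at 0 s, 120 s, and 300 s; the talk ends at 600 s.
segments = slide_based_summary([0, 120, 300], talk_end=600, fraction=0.25)
print(segments)
```

Because every slide contributes the same fraction of its own duration, slides the speaker spent longer on automatically receive more summary time, which matches the rationale above.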
2. Pitch-based Method (P)
• Rationale
  – Pitch activity indicates the speaker’s emphasis
• Algorithm (based on Arons, ICSLP 94)
  – Compute pitch for every 1 ms frame
  – Count the number of frames above a threshold in 15-second windows
  – Select the windows with the highest counts
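The window-counting step can be illustrated as follows. This is a sketch under assumptions rather than the Arons algorithm itself: it takes per-frame pitch values as given (the pitch tracker is not shown), uses non-overlapping windows, and treats the threshold as a plain number supplied by the caller. The name `pitch_based_windows` and its parameters are hypothetical.

```python
def pitch_based_windows(pitch, threshold, frames_per_window, num_windows):
    """Pick the windows with the most frames whose pitch exceeds `threshold`.

    pitch: per-frame pitch estimates (assumed to come from some pitch tracker).
    frames_per_window: frames in one window (15 s of 1 ms frames would be 15000).
    Returns the indices of the chosen windows in chronological order.
    """
    counts = []
    for start in range(0, len(pitch), frames_per_window):
        window = pitch[start:start + frames_per_window]
        counts.append(sum(1 for value in window if value > threshold))
    # Rank windows by count, breaking ties in favor of the earlier window.
    ranked = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:num_windows])


# Toy data: four windows of three frames each.
pitch = [100, 210, 220,  90, 95, 100,  230, 240, 250,  205, 100, 100]
chosen = pitch_based_windows(pitch, threshold=200, frames_per_window=3, num_windows=2)
print(chosen)
```

In this toy input the per-window counts are 2, 0, 3, and 1, so the first and third windows are selected.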
3. Combined Method (SPU)
• The amount of time that previous viewers spent on a slide indicates importance
[Figure: user count (0 to 70) versus Nth minute into the talk (0 to 90), with two points marked A and B]
3. Combined Method (SPU)
• Algorithm
  – Compute importance measure for each slide
  – Allocate summary time for each slide according to the importance measure
  – Use pitch-based algorithm to pick the segments in each slide
Importance of Slide N = (Average Viewer Count of Slide N) / (Average Viewer Count of Slide N-1)
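The first two steps of the combined method can be sketched as below. The importance ratio follows the formula on the slide; everything else is an assumption made for illustration: the function names, giving the first slide an importance of 1.0 (it has no predecessor), and splitting the summary budget in direct proportion to importance. The slide only says time is allocated "according to" the measure.

```python
def slide_importance(avg_viewers):
    """Importance of slide N = average viewer count of slide N / that of slide N-1.

    avg_viewers: average viewer count per slide, taken from the access log
    (assumed input format). The first slide gets 1.0 by assumption.
    """
    importance = [1.0]
    for prev, cur in zip(avg_viewers, avg_viewers[1:]):
        importance.append(cur / prev if prev > 0 else 0.0)
    return importance


def allocate_summary_time(importance, summary_seconds):
    """Split the summary budget across slides in proportion to importance (assumption)."""
    total = sum(importance)
    if total == 0:
        return [0.0] * len(importance)
    return [summary_seconds * value / total for value in importance]


# Toy access log: viewers drop by half at slides 2 and 4 and hold steady at slide 3.
avg_viewers = [60, 30, 30, 15]
importance = slide_importance(avg_viewers)
budget = allocate_summary_time(importance, summary_seconds=600)
print(importance, budget)
```

A slide that retains its audience (ratio near 1) gets more summary time than one where viewers leave. The third step would then run a pitch-based selection inside each slide, limited to that slide's share of the budget.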
Talk Outline
• Introduction
• Automatic summarization
• Evaluation
  – Experimental design
  – Results
Experimental Design
• To compare summarization techniques
  – Original presenters (authors) created summaries (A) as gold standard
  – Authors wrote quiz questions that covered the content of summaries
  – Objective measure: quiz score improvement after watching a summary
  – Subjective measures: user survey
Experimental Design (Cont.)
• 4 summary types (S, P, SPU, A)
• 4 talks chosen from Microsoft training site
• 24 Microsoft employees were subjects
  – Summary types and talks are counter-balanced within each subject
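One way to picture a counter-balanced design of this size is a cyclic Latin square, sketched below. This is only an assumed reading of the slide: it supposes that every subject watches all four talks, each with a different summary type, and that pairings rotate across subjects. The talk names are placeholders, and the actual assignment used in the study is not given here.

```python
from collections import Counter

SUMMARY_TYPES = ["S", "P", "SPU", "A"]
TALKS = ["Talk 1", "Talk 2", "Talk 3", "Talk 4"]  # placeholder names


def latin_square_assignment(subject_index):
    """Pair each talk with a summary type by cyclic rotation (a simple Latin square)."""
    shift = subject_index % len(SUMMARY_TYPES)
    return {
        talk: SUMMARY_TYPES[(i + shift) % len(SUMMARY_TYPES)]
        for i, talk in enumerate(TALKS)
    }


# 24 subjects, as in the study; each rotation is reused by 6 subjects.
assignments = [latin_square_assignment(s) for s in range(24)]
pair_counts = Counter(
    (talk, kind) for assignment in assignments for talk, kind in assignment.items()
)
print(assignments[0])
print(pair_counts[("Talk 1", "A")])
```

Under this scheme each subject sees every summary type exactly once, and each talk and summary type pairing is seen by the same number of subjects, so differences between talks do not favor any one method.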
Demo Summary
Quiz Score Improvement
• As expected, author-created summaries did best
• No significant difference among the automatic methods
[Figure: bar chart of quiz score (0 to 8) for summary types A, SPU, P, and S, showing pre-study scores and post-summary scores]
Survey Rating Results
• A >> SPU > P = S
        Context (1-7)   Concise (1-7)   Coherent (1-7)   Coverage (%)
A       5.39            5.57            5.30             75
SPU     4.30            4.52            3.48             63
P       4.00            4.13            3.48             63
S       4.26            4.17            3.64             58
Percent of Value Derived
• From slide content: 46%
• From audio content: 36%
• From video content: 18%
Interesting Sequence Effect
Order Clear Choppy Overall
1 4.04 6.00 3.65
2 4.39 5.09 4.09
3 4.39 4.70 4.00
4 5.13 3.91 5.18
Conclusions
• Ability to skim/browse will be key to wide use
• Automated methods can add significant value
  – Adding domain knowledge is important
  – Increasing acceptance over time
• Evaluation is key but very difficult
Conclusions (Cont.)
• Getting the human into the loop
  – Speakers
  – End-users as a group
    • E.g. collaborative filtering
  – End-users as an individual
    • E.g. interactive browsing
• Visit us at: http://research.microsoft.com/coet
Interface of a Typical Talk
[Figure: screenshot of the talk interface, with a table of contents, video, slides, and VCR-like controls]
Summary Characteristics
• Talks were from MS internal training site
  – UI Design, Internet Explorer, Dynamic HTML, Microsoft Transaction Server
• Average length
  – 20% to 25% of the original
  – 10 to 14 minutes
• Overlap with author-created summaries was no better than chance
Survey on the Summary Just Watched
• Concise: It captured the essence of the talk without using too many sentences
• Coverage: My confidence that it covered the key points of the talk is …
• Context: It is clear and easy to understand
• Coherent: It provided reasonable context, transitions, and sentence flow
Survey Rating Results
• A >> SPU > P = S
[Figure: chart of survey ratings (1 to 7) for Context, Concise, Coherent, and Coverage, by summary type (A, SPU, P, S)]
Information Not Used
• Spoken text content
• Speaker gestures
Talk Outline
• Introduction
  – Motivation
  – Definition of a video summary
  – Attributes of a good summary
• Automatic summarization
• Evaluation
Viewers Over Time for One Talk
• Viewer number decreases overall and within each slide
[Figure: user count (0 to 70) versus Nth minute into the talk (0 to 90), with two points marked A and B]
Importance Measure
Importance of Slide N = (Average Viewer Count of Slide N) / (Average Viewer Count of Slide N-1)
Author-created Summary (A)
• Original presenters (authors) were asked to produce summaries of the talks
– Author marked the text transcript
– Video summaries were generated manually by aligning the video with the marked portions
Summary
• Automatic algorithms performed respectably
  – “That’s pretty cool for a computer. I thought someone had sat down and made them”
  – SPU was preferred over S and P
• Will viewers get used to auto summary?
Future Work
• Compare audio/video and text summaries
• Interactive and intelligent video browser
• Visit us at http://research.microsoft.com/coet