
EE 6850, F'02, Chang, Columbia U.

EE 6850 Lecture #5 (Oct. 2, 2002)

• Syntactic-level browsing and visualization
  • Keyframe selection and browsing
  • Visualization and skimming

• References
  • D. Zhong, H. Zhang and S.-F. Chang, "Clustering Methods for Video Browsing and Annotation," IS&T/SPIE Symposium on Storage and Retrieval for Image and Video Database, San Jose, February 1996.
  • H.J. Zhang, C.Y. Low, S.W. Smoliar and J.H. Wu, "Video Parsing, Retrieval and Browsing: An Integrated and Content-Based Solution," in Proc. of the ACM Multimedia Conference, pages 15-24, 1995.
  • M. Christel, A. Hauptmann, A. Warmack and S. Crosby, "Adjustable Filmstrips and Skims as Abstractions for a Digital Video Library," IEEE Advances in Digital Libraries Conference, Baltimore, MD, May 1999.
  • B.-L. Yeo and M.M. Yeung, "Retrieving and Visualizing Video," Communications of the ACM, 40(12), Dec. 1997, pp. 43-52.
  • H. Sundaram and S.-F. Chang, "Constrained Utility Maximization for Generating Visual Skims," IEEE Workshop on Content-based Access of Image and Video Libraries (CBAIVL 2001), Dec. 2001, Kauai, HI, USA.


Issues

• Shot summary: play each video clip?
• Keyframe selection and browsing
• Visualize a large collection of clips?

Examples:
• WebClip and WebSeek video visualization
• Different types of video: consumer, sports, news, etc.

Approaches:
• Keyframes, mosaicing
• Hierarchical clustering
• Spatial summary and visualization


Keyframe Selection

• Considerations
  • Flexibility (number and level)
  • Fidelity (content comprehension)

• Approaches
  • Fixed number, fixed spacing
  • First/last frame, clean frame, cluster centroids
  • Difference, motion
  • Clustering
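As a minimal sketch of two of the approaches listed above (fixed spacing and frame difference), the code below assumes each frame is described by a hypothetical feature vector such as a color histogram. The function names, the L1 distance, and the threshold are illustrative assumptions, not part of the lecture.

```python
from typing import List, Sequence

# Assumption: one feature vector per frame (e.g. a color histogram).
Frame = Sequence[float]


def fixed_spacing_keyframes(num_frames: int, num_keyframes: int) -> List[int]:
    """Pick evenly spaced frame indices: the middle frame of each equal interval."""
    if num_frames <= 0 or num_keyframes <= 0:
        return []
    num_keyframes = min(num_keyframes, num_frames)
    step = num_frames / num_keyframes
    return [int(step * (k + 0.5)) for k in range(num_keyframes)]


def l1_distance(a: Frame, b: Frame) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def difference_keyframes(features: Sequence[Frame], threshold: float) -> List[int]:
    """Keep the first frame, then add a keyframe whenever the current frame
    differs from the last selected keyframe by more than `threshold`."""
    if not features:
        return []
    selected = [0]
    for i in range(1, len(features)):
        if l1_distance(features[i], features[selected[-1]]) > threshold:
            selected.append(i)
    return selected
```

Fixed spacing gives a predictable number of keyframes regardless of content, while the difference-based variant adapts the number of keyframes to how much the shot changes.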


Structure Parsing (Zhong’96)

• Automatic Layout of Keyframes [Uchihashi & Foote ’99]

• Given 2D space constraints, a keyframe set, and their importance measures, what’s the best display layout?
  • Comic book concept

• Issues:
  • Time order vs. layout order
  • Preserve high-level structures
  • Importance measures


Packing Keyframes to 2D space [Uchihashi ’99]
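To make the layout question concrete, here is a simplified sketch that maps importance scores to frame sizes and fills rows greedily while keeping time order. It is not the packing algorithm of the cited work; the size quantization, the `row_width` parameter, and the function names are assumptions made for illustration.

```python
from typing import List, Tuple


def frame_size(importance: float, max_importance: float, levels: int = 3) -> int:
    """Map an importance score to an integer size in 1..levels (larger = more important)."""
    if max_importance <= 0:
        return 1
    ratio = max(0.0, min(1.0, importance / max_importance))
    return min(levels, 1 + int(ratio * levels))


def pack_rows(importances: List[float], row_width: int,
              levels: int = 3) -> List[List[Tuple[int, int]]]:
    """Greedy row packing that preserves time order.

    Returns rows of (frame_index, size); the sizes in each row sum to at
    most `row_width`.
    """
    if row_width < levels:
        raise ValueError("row_width must be at least the largest frame size")
    top = max(importances, default=0.0)
    rows: List[List[Tuple[int, int]]] = [[]]
    used = 0
    for idx, imp in enumerate(importances):
        size = frame_size(imp, top, levels)
        if used + size > row_width:
            # Current row is full: start a new one.
            rows.append([])
            used = 0
        rows[-1].append((idx, size))
        used += size
    return [row for row in rows if row]
```

Because frames are never reordered, this sketch always favors time order over compactness, which is one side of the "time order vs. layout order" trade-off.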


Information Visualization

• Shneiderman ’96: “Overview first, zoom and filter, then details-on-demand”

• Problem:
  • Result overload and confusing search interfaces
  • Browsing 100-1000 video segments
  • Search using multiple features

• Approaches:
  • Text items + scores
  • Thumbnail keyframes + scores
  • Map digest
  • VIBE concept map
  • Distance map
  • Feature-space browser

• CMU Informedia Project
  • > 1200 hrs news, 400 hrs documentary

• Keyword search thumbnails

• Communicate more information than just a “top 10 textual list”

• Color bar: word match score; text headline

• Issue:
  • Keyframe selection, number of keyframes
  • Context
  • Temporal information

Filmstrip

• Spatial static abstraction of multiple shots

• Issues:
  • Use film border to indicate association
  • Time relation between shot and whole segment

• Too many keyframes: use match shots only? → query-based filmstrip

Timeline Digest (e.g., impeachment)

• Explore temporal trend
• Can be combined with VIBE and Map Digest
• Issue: context, a/v content

Map Digest (e.g., Pope visit)

• Combine visuals and time clusters

Skim

• Temporal abstraction: motivate viewers

• Time compression: preserving essential data

• Segments with match words are combined

• Each segment is extended based on the “goodness scores” of the ending point, until the time budget is reached

• Issues:
  • Choppy presentation
  • Temporal syntax (e.g., dialog)
  • Early cutout of sentence, scene, audio

[Figure legend: included segment; word/phrase match]
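The combine-then-extend procedure described above can be sketched as follows. This is an illustration under stated assumptions: candidate ending points are given as hypothetical `(time, goodness)` pairs, a segment is always extended to the nearest later candidate, and the greedy choice of the highest goodness score is one possible reading of the slide, not the documented Informedia method.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]   # (start, end) in seconds
Endpoint = Tuple[float, float]   # (time, goodness score), hypothetical input


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or touching match segments."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> float:
    return sum(end - start for start, end in intervals)


def extend_to_budget(matches: List[Interval], endpoints: List[Endpoint],
                     budget: float) -> List[Interval]:
    """Greedily extend segments to later ending points, best score first,
    while the total skim length stays within `budget`."""
    segments = merge_intervals(matches)
    while True:
        used = total_length(segments)
        best: Optional[Tuple[float, int, float]] = None  # (score, index, new end)
        for i, (_, end) in enumerate(segments):
            later = [(t, g) for t, g in endpoints if t > end]
            if not later:
                continue
            t, g = min(later)  # nearest candidate ending point after this segment
            if used + (t - end) > budget:
                continue       # this extension would exceed the time budget
            if best is None or g > best[0]:
                best = (g, i, t)
        if best is None:
            return segments
        _, i, t = best
        segments[i] = (segments[i][0], t)
        segments = merge_intervals(segments)
```

Extending only to scored ending points (for example sentence or scene boundaries) is one way to reduce the "early cutout" issue listed above.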

VIBE

• Provide relevance to each concept
  • Number of related concepts
  • Relative weights

• Drag the anchors to see related data

• Manipulate the concept combinations, e.g., and, or, not

• Zoom in on specific areas

• Filter by time, location…

• Issues:
  • Location ambiguity
  • Context beyond word matching

(example: Clinton, Andrew, Johnson, Impeachment)


VIBE: Concept Map

• Active query elements are mapped in a 2D display.
• Each query element is visualized as a concept.
• The location of a return result is a function of the position of, and relative distance to, each concept.
• Allows users to explore concept relationships.
• Allows users to zoom in to particular return results.

[Figure: two concepts, Q0 and Q1, placed in the x-y display plane]

\[ p = \frac{d_0 p_0 + d_1 p_1}{d_0 + d_1} \]


Variations of Concept Map

[Figure: concept map with the labels "sunset", "low complexity", "skiing", "high activity", "camera"]

\[ P = \frac{d_0 p_0 + d_1 p_1 + d_2 p_2}{d_0 + d_1 + d_2} \]

• Points on the edges are covered by two concepts only
• Points at the center are equally distant from all concepts

• Issues: multiple concepts, different modalities
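The placement rule for a return result, a weighted average of the concept positions, can be sketched for any number of concepts. The function name and the tuple representation are assumptions; `weights` plays the role of the d values and `anchors` the role of the p values on the slides.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def concept_map_position(anchors: Sequence[Point], weights: Sequence[float]) -> Point:
    """Place a result at the weighted average of the concept anchor positions:
    P = sum(d_i * p_i) / sum(d_i)."""
    if len(anchors) != len(weights):
        raise ValueError("need one weight per concept anchor")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    x = sum(w * p[0] for w, p in zip(weights, anchors)) / total
    y = sum(w * p[1] for w, p in zip(weights, anchors)) / total
    return (x, y)
```

With three anchors and equal weights the result lands at the centroid, and with one weight set to zero it lands on the edge between the other two anchors, which matches the two observations about edge and center points.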

Video Skim Generation (Sundaram/Chang 01)

1. What is the appropriate problem formulation?
2. What are important types of skims?
3. Possible operations: shot selection and trimming.
4. What’s the data unit for transformation?
5. How is the “quality” affected? Aesthetic effects, information comprehension

[Figure label: dropped frames]

Utility framework to model the relation between operations and user comprehension → optimal skim generation

Skim: drastically condensed audio-video clips
• News story: 100 sec → 16 sec (shot removal)
• Action scene: 190 sec → 38 sec (proportional)

The entities preserved in skims

• Video shot (duration altered)
  • The fundamental video entity; we shall maximize the coherence of each retained video shot
• Segment beginnings (SBEG’s), significant phrase
  • This is an element of the speech discourse
• Synchronous multimedia segments
  • Ensures maximum skim coherence
• Elements of visual syntax
  • Dialogs, regular anchors, shot phrases
• Film rhythm
  • Preserves the “pace” of the film


Modeling Utility of Shots

• How much time is required for generic comprehension (who, what, where, when)?

• Is comprehension time related to the visual spatio-temporal complexity of the shot?

[Figure: panels (a) and (b)]

• Explore viewer perceptual model

Estimate Utility Function from Subjective Study

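[Figure: plot of required time (vertical axis) against complexity (horizontal axis, 0 to 1), showing an upper bound U_b and a lower bound L_b, annotated with "original shot" and "reduce to upper bound"]

\[ U_b(c) = 2.40\,c + 1.11 \]
\[ L_b(c) = 0.61\,c + 0.68 \]

• Plot of average required time vs. complexity shows two bounds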
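The two linear bounds can be evaluated directly. The sketch below assumes that times are in seconds and that complexity lies in [0, 1]; both are assumptions read off the plot, and the function names are illustrative.

```python
def upper_bound(c: float) -> float:
    """U_b(c) = 2.40 c + 1.11 (assumed to be in seconds)."""
    return 2.40 * c + 1.11


def lower_bound(c: float) -> float:
    """L_b(c) = 0.61 c + 0.68 (assumed to be in seconds)."""
    return 0.61 * c + 0.68


def reduce_to_upper_bound(duration: float, c: float) -> float:
    """Trim a shot to U_b(c) if it is longer; shorter shots are left unchanged."""
    return min(duration, upper_bound(c))


def is_below_lower_bound(duration: float, c: float) -> bool:
    """True if the shot is shorter than the lower bound for its complexity."""
    return duration < lower_bound(c)
```

For example, a 10-second shot with complexity 0.5 would be trimmed to about 2.31 seconds, while a 1.5-second shot of the same complexity is already under the upper bound and stays as it is.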

Shot utility function (t: duration, c: complexity)

Utility of shot sequence (defined over a selection indicator sequence)

Preserving syntax

• Minimum number of shots in a scene
• The particular ordering of the shots (cut)
• The specific duration of the shots, to direct viewer attention
• Changing the scale of the shots

"The specific arrangement of shots so as to bring out their mutual relationship." [Sharff 82]

Film makers think in terms of phrases of shots and not individual shots.


The progressive phrase

“Two well chosen shots will create expectations of the development of narrative; the third well-chosen shot will resolve those expectations.” [Sharff 82]

Hence, a phrase (a group of shots) must have at least three shots.

Maximal shot removal: eliminate all the dark shots.


Structure (dialog)

“Depicting a conversation between m people requires 3m shots.” [Sharff 82]

Hence, a dialog (m = 2) must have at least six shots.

Maximal adaptation: eliminate all the dark shots.
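The two minimum-shot rules bound how many shots can be removed from a phrase or a dialog. A minimal sketch, with hypothetical function names, assuming the rules are applied simply as lower limits on the shot count:

```python
def min_phrase_shots() -> int:
    """A progressive phrase needs at least three shots."""
    return 3


def min_dialog_shots(num_people: int) -> int:
    """A conversation between m people requires 3m shots."""
    if num_people < 2:
        raise ValueError("a dialog needs at least two people")
    return 3 * num_people


def removable_shots(num_shots: int, min_required: int) -> int:
    """Number of shots that can be dropped without violating the minimum."""
    return max(0, num_shots - min_required)
```

For a two-person dialog of ten shots, at most four shots can be dropped under this rule; which shots to drop is a separate decision that the sketch does not model.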

Mid-level audio content analysis

[Figure: audio analysis pipeline with the following elements]
• SVM classifier
• Time-dependent Viterbi decoder for temporal consistency
• Audio-scenes
• Silence removal
• Classes: silence, non-speech, clean speech, noisy speech
• Significant phrases, beginning of topical segments
• Understand types and importance of audio by prosody analysis (pitch, pause, energy) [details in ACM MM2002]
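To illustrate how a Viterbi decoder can enforce temporal consistency on frame-level class decisions, here is a plain Viterbi smoother with a constant switching penalty. It is a generic sketch, not the time-dependent decoder used in the cited work; the per-frame scores stand in for hypothetical classifier outputs and `switch_penalty` is an assumed parameter.

```python
from typing import List, Sequence


def viterbi_smooth(scores: Sequence[Sequence[float]], switch_penalty: float) -> List[int]:
    """Return the class sequence with the best total score.

    scores[t][k] is the score (higher is better) of class k at frame t.
    Staying in the same class is free; changing class costs `switch_penalty`.
    """
    if not scores:
        return []
    n_classes = len(scores[0])
    best = list(scores[0])          # best path score ending in each class
    back: List[List[int]] = []      # back-pointers, one list per frame after the first
    for t in range(1, len(scores)):
        new_best: List[float] = []
        pointers: List[int] = []
        for k in range(n_classes):
            candidates = [best[j] - (0.0 if j == k else switch_penalty)
                          for j in range(n_classes)]
            j_star = max(range(n_classes), key=lambda j: candidates[j])
            new_best.append(candidates[j_star] + scores[t][k])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)
    # Trace back from the best final class.
    state = max(range(n_classes), key=lambda k: best[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With a penalty of zero the result is the frame-by-frame argmax; a larger penalty suppresses isolated one-frame class changes.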

Synchronous Entities

• Synchronous segments:
  • Include all significant speech phrases and both opening and ending syntactic segments
  • Audio and video boundaries are fully synchronized
  • Not condensed or de-synchronized
  • Such tied segments allow viewers to “catch up” when viewing skims

• Untied segments:
  • Audio-video can be dropped, condensed, reduced
  • Audio-video segments do not have to synchronize

[Figure labels: opening, syntax, closing, significant phrase, dialog syntax; example line: "Do you make up these questions, Mr. Holden?"]

The Constrained Search Problem

[Equation: an arg min of an objective function O_f over variables including t_a, t_v and n_c; the remaining symbols are not recoverable from the scan]

subject to:

• duration constraints
• target time constraints
• multimedia tie constraints
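The interaction between the duration constraints and the target time constraint can be illustrated with a simple feasibility sketch for video shots only. It does not solve the constrained optimization above: it only finds one set of shot durations that respects per-shot bounds and sums to the target. The bounds reuse the linear functions from the utility study, and the interpolation rule, function names, and units (seconds) are assumptions.

```python
from typing import List, Optional, Tuple


def upper_bound(c: float) -> float:
    return 2.40 * c + 1.11


def lower_bound(c: float) -> float:
    return 0.61 * c + 0.68


def allocate_durations(shots: List[Tuple[float, float]],
                       target: float) -> Optional[List[float]]:
    """shots: (original duration, complexity) per shot.

    Returns one duration per shot, each within [low, high], summing to
    `target` when possible. Returns None if even the lower bounds exceed
    the target, in which case shots would have to be removed.
    """
    # A shot can never be longer than its original duration.
    highs = [min(d, upper_bound(c)) for d, c in shots]
    lows = [min(lower_bound(c), hi) for (_, c), hi in zip(shots, highs)]
    if sum(lows) > target:
        return None
    span = sum(highs) - sum(lows)
    # Same interpolation factor for every shot, capped at the upper bounds.
    alpha = 1.0 if span == 0 else min(1.0, (target - sum(lows)) / span)
    return [lo + alpha * (hi - lo) for lo, hi in zip(lows, highs)]
```

When the target exceeds the sum of the upper bounds, every shot is simply set to its upper limit and the skim comes out shorter than the target.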

Skim generation framework

[Figure: block diagram with the following elements]
• Entity analysis: shot detection / auditory analysis
• Video utility model and audio utility model
• Objective function
• Constraints: audio / video duration constraints, target skim time, tie constraints, visual syntax constraints
• Iterative maximization
• Skim generation
• Labels: proportional, optimal


Potential Projects

• New video shot visualization tools
• New skim generation technique
• Improved integrated a/v browsing systems
