2003.03.17 - SLIDE 1IS246 - SPRING 2003 Lecture 15: Automated Analysis: Video IS246 Multimedia...

37
2003.03.17 - SLIDE 1 IS246 - SPRING 2003 Lecture 15: Automated Analysis: Video IS246 Multimedia Information (FILM 240, Section 4) Prof. Marc Davis UC Berkeley SIMS Monday and Wednesday 2:00 pm – 3:30 pm Spring 2003 http://www.sims.berkeley.edu/academics/ courses/is246/s03/
  • date post

    21-Dec-2015
  • Category

    Documents

  • view

    213
  • download

    0

Transcript of 2003.03.17 - SLIDE 1IS246 - SPRING 2003 Lecture 15: Automated Analysis: Video IS246 Multimedia...

2003.03.17 - SLIDE 1IS246 - SPRING 2003

Lecture 15: Automated Analysis: Video

IS246Multimedia Information

(FILM 240, Section 4)

Prof. Marc DavisUC Berkeley SIMS

Monday and Wednesday 2:00 pm – 3:30 pmSpring 2003

http://www.sims.berkeley.edu/academics/courses/is246/s03/

2003.03.17 - SLIDE 2IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 3IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 4IS246 - SPRING 2003

Low-Level Audio Descriptors

• Silence, voice onset (Aarons) • Speech/music (Schierer)• Rough spectral features (musclefish)

– Loud/soft, bright/dark, low/high

• Musical pitch and timbre • Speech emphasis/prosody (Chen, Aarons)• Characteristic audio events

– Cheers, gunshots (Pfeiffer)

• Musical tempo (Cook & Tzanetakis)– Rhythmic strength?

• Fine spectral features– Spectrogram, Mel-frequency cepstral coefficients

2003.03.17 - SLIDE 5IS246 - SPRING 2003

Speech Recognition

• Ideally, bridges the “semantic gap”• One little problem:

– IT DOESN’T WORK!

• Spectrum of reliability– Known vs. unknown speaker (+ training)– Known or small vs. unknown vocabulary (+ training)– Read text vs. conversational speech– Standard English vs. dialects, accents, other

languages– Close microphone vs. distant– Clean, quiet vs. mixed or noisy conditions– Known acoustic channel vs. telephone

2003.03.17 - SLIDE 6IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 7IS246 - SPRING 2003

Signal-Based Parsing

• Practical problem– Parsing unstructured, unknown video is very,

very hard

• Theoretical problem– Mismatch between percepts and concepts

2003.03.17 - SLIDE 8IS246 - SPRING 2003

Perceptual/Conceptual Issue

Clown Nose Red Sun

Similar Percepts / Dissimilar Concepts

2003.03.17 - SLIDE 9IS246 - SPRING 2003

Perceptual/Conceptual Issue

Car Car

Dissimilar Percepts / Similar Concepts

John Dillinger’s Timothy McVeigh’s

2003.03.17 - SLIDE 10IS246 - SPRING 2003

Signal-Based Parsing

• Effective and useful automatic parsing

– Video• Scene break detection

• Camera motion analysis

• Low level visual similarity

• Feature tracking

– Audio• Pause detection

• Audio pattern matching

• Simple speech recognition

• Approaches to automated parsing

– At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system

– After capture, use “human-in-the-loop” algorithms to leverage human and machine intelligence

2003.03.17 - SLIDE 11IS246 - SPRING 2003

Image/Video Analysis Communities

• Research and development– Academia

• MIT, CMU, GeorgiaTech, UC Berkeley, Columbia University

– Government• NSA, CIA, FBI, etc.

– Industry• Media, retail, training, consumer electronics, surveillance,

industrial automation, libraries

• Research directions– Computer vision, robotics, knowledge representation,

digital libraries, media asset management, production automation

2003.03.17 - SLIDE 12IS246 - SPRING 2003

Major Areas

• Image analysis– Color similarity– Texture similarity– Shape similarity– Spatial similarity– Object presence analysis

• Video analysis– Temporal segmentation– Content and motion analysis

2003.03.17 - SLIDE 13IS246 - SPRING 2003

Semantic Gap

• Most of the disappointments with early retrieval systems come from the lack of recognizing the existence of the semantic gap and its consequences for system set-up

• The semantic gap is the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for a user in a given situation

2003.03.17 - SLIDE 14IS246 - SPRING 2003

Sensory Gap

• The sensory gap is the gap between the object in the world and the information in a (computational) description derived from a recording of that scene

• The sensory gap makes the description of objects an ill-posed problem: – It yields uncertainty in what is known about the state of the

object– The sensory gap is particularly poignant when a precise

knowledge of the recording conditions is missing– The 2D-records of different 3D-objects can be identical—without

further knowledge, one has to decide that they might represent the same object

– A 2D-recording of a 3D-scene contains information accidental for that scene and that sensing but one does not know what part of the information is scene-related

2003.03.17 - SLIDE 15IS246 - SPRING 2003

At the End of the Early Years

• “Computer vision researchers should identify features required for interactive image understanding, rather than their discipline’s current emphasis on automatic techniques (1992)”

2003.03.17 - SLIDE 16IS246 - SPRING 2003

Narrow vs. Broad Domains

2003.03.17 - SLIDE 17IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 18IS246 - SPRING 2003

Image Similarity

• Extraction of features or image signatures from the images– Efficient representation and storage strategy for this

precomputed data

• A set of similarity measures– Capture some perceptively meaningful definition of similarity– Efficiently computable when matching an example with the

whole database

• A user interface for– The choice of which definition(s) of similarity should be applied

for retrieval– The ordered and visually efficient presentation of retrieved

images– The supporting relevance feedback

2003.03.17 - SLIDE 19IS246 - SPRING 2003

Image Similarity

• Color similarity– Choice of color space matters

• Texture similarity– Specific textures can be parsed well

• Shape similarity– Scale, rotation, and deformation affect shape

• Spatial similarity– Relative vs. absolute positions and distances

• Object presence analysis– Face detection/recognition is active area

2003.03.17 - SLIDE 20IS246 - SPRING 2003

Object Segmentation

• “Object segmentation for broad domains of general images is not likely to succeed, with a possible exception for sophisticated techniques in very narrow domains.”

2003.03.17 - SLIDE 21IS246 - SPRING 2003

Feature Types

2003.03.17 - SLIDE 22IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 23IS246 - SPRING 2003

Video Analysis

• Temporal segmentation– Shot boundary detection

• Camera work and object motion analysis– Pan and zoom detection– Object tracking

• Framing and focus

• Video soundtrack analysis

• Video scene analysis– Video “paragraphing” in restricted domains

2003.03.17 - SLIDE 24IS246 - SPRING 2003

Temporal Segmentation

• Shot change properties– Motion

• Camera motion• Object motion

– Image• Luminosity• Color• Noise

– Type• Abrupt (cut)• Progressive (dissolve, fade, wipe)

• Shot change detection methods– Difference in statistical signatures of image change– Explicit modeling of motion

2003.03.17 - SLIDE 25IS246 - SPRING 2003

Video Analysis

• Video abstraction and representation– Video icon construction– Video key-frame extraction– Video skimming

• Shot similarity and content-based retrieval• Visual presentation and annotation for

video• Video indexing and visual cataloguing• Computer-assisted video annotation and

transcription

2003.03.17 - SLIDE 26IS246 - SPRING 2003

Tonomura’s VideoSpaceIcon

2003.03.17 - SLIDE 27IS246 - SPRING 2003

Video Similarity

• Query: – Retrieve a video segment of “a hammer hitting a nail

into a piece of wood”

• Sample results:– Video of a hammer hitting a nail into a piece of wood– Video of a hammer, a nail, and a piece of wood– Video of a nail hitting a hammer, and a piece of wood– Video of a sledgehammer hitting a spike into a

railroad tie– Video of a rock hitting a nail into a piece of wood– Video of a hammer swinging– Video of a nail in a piece of wood

2003.03.17 - SLIDE 28IS246 - SPRING 2003

Types of Video Similarity

• Semantic– Similarity of descriptors

• Relational– Similarity of relations among descriptors in

compound descriptors

• Temporal– Similarity of temporal relations among

descriptors and compound descriptors

2003.03.17 - SLIDE 29IS246 - SPRING 2003

Retrieval Examples to Think With

• “Video of a hammer, a nail, and a piece of wood”– Exact semantic and temporal similarity, but no relational

similarity

• “Video of a nail hitting a hammer, and a piece of wood” – Exact semantic and temporal similarity, but incorrect relational

similarity

• “Video of a sledgehammer hitting a spike into a railroad tie”– Approximate semantic similarity of the subject and objects of the

action and exact semantic similarity of the action; and exact temporal and relational similarity

• “Video of a hammer swinging” cut to “Video of a nail in a piece of wood”

2003.03.17 - SLIDE 30IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 31IS246 - SPRING 2003

Discussion Questions

• On “Content-Based Representation and Retrieval of Visual Media: A State-of-the-Art Review ” (Risto Sarvas)– Are these technologies implemented in current camcorders

and editing software? It seems that with these technologies (e.g., video parsing & video abstraction) a simple functionality like automatically cutting the video into shots wouldn't be too difficult.

– How much of these technologies are available for use? For example, are some of the image similarity algorithms available as open source software?

– The reading focused on analyzing media content after the recording. How much there is technology for letting the human analyze the content as it is produced? In other words, what is the state-of-the-art in standardizing content annotation?

2003.03.17 - SLIDE 32IS246 - SPRING 2003

Discussion Questions

• On “Content-Based Representation and Retrieval of Visual Media: A State-of-the-Art Review ”(Catherine Lai)– While much effort has been spent on discussing the

process or conditions of content-based image retrieval, what constitutes a valid evaluation method? How large should the image databases be and who decides what should be included in the image databases?

– How can user feedback be exploited to improve user interaction and techniques for image browsing?

2003.03.17 - SLIDE 33IS246 - SPRING 2003

Discussion Questions

• On “Content-Based Image Retrieval at the End of the Early Years” (Ana Ramirez)– To what degree do users have to be experts

in computer vision to interactively annotate media or query for media?

– How is the semantic gap for time-based media different than for static media?

2003.03.17 - SLIDE 34IS246 - SPRING 2003

Discussion Questions

• On “Content-Based Image Retrieval at the End of the Early Years” (Ping Yee)– Consider the various types of queries that are

typically possible in content-based image retrieval systems:

• Query by spatial predicate• Query by image predicate• Query by group predicate• Query by spatial example• Query by image example• Query by group example

– Which of these queries are likely to be applicable to retrieval for digital video? In what situations might such queries be made?

2003.03.17 - SLIDE 35IS246 - SPRING 2003

Today’s Agenda

• Review of Last Time

– Automated Analysis: Audio

• Automated Analysis: Video

– Signal-Based Parsing

– Image Analysis

– Video Analysis

– Discussion Questions

• Action Items for Next Time

2003.03.17 - SLIDE 36IS246 - SPRING 2003

Presentations on Wednesday

• Every group will have 15 minutes in class on Wednesday, March 19, 2003, to present their work on Assignment 2

• Your presentation must include– Short Plot Outline– Showing of your movie– Discussion of at least one practical and one

theoretical issue you dealt with– Lessons learned– Time for questions

2003.03.17 - SLIDE 37IS246 - SPRING 2003

Hand In

• Drop off the following files in your designated group drop-off box under M:/is246/assignment2/drop_off/groupX– Final edit

• Format: MPEG

– Presentation• Format: MS PowerPoint or html

– Paragraph Answering One or More of the Questions For Thought

• Format: txt, MS Word, or html