Digital Video Library
Experience in Large Scale Content Management
VIEW Technologies Symposium – CUHK – August 2002
Howard Wactlar
Carnegie Mellon University, USA
Video Life Cycle
[Diagram: three stages]
Acquisition: Surveillance, Radio, Broadcast TV, Training Film, Satellite
Analysis and Organization: Digital Compression, Segmentation, Speech Recognition, Image Analysis, Natural Language Interpretation, Database
Distribution: Cable, PDA, Cell Phone, Internet
REQUIREMENTS:
• Automated process for information extraction from video
• Full-content search and retrieval from all spoken language and visual documents
Establishment of large video libraries as a network searchable information resource
Mission: Enable Search and Discovery in the Video Medium
APPROACH:
• Integration of machine speech, image and natural language understanding for library creation and exploration
Informedia Overview
CNN News Broadcasts 1997-2002 (2050 hours)
• 68,000 segments/stories
• 1.7 million “shots”
China Historical and Cultural Documentaries (100 hours)
• English language
• Western perspective
Sample Corpora
Some Examples
Why is Multimedia Difficult?
Challenges of Data Extraction
Scene Text Detection
Recognizing Scene Text and Faces
Interpreting Images Containing Similar Content
Style Variations: careful, clear, articulated, formal, casual, spontaneous, normal, read, dictated, intimate
Voice Quality: breathy, creaky, whispery, tense, lax, modal
Context: sport, professional, interview, free conversation, man-machine dialogue
Speaking Rate: normal, slow, fast, very fast
Stress: in noise, with increased vocal effort (Lombard reflex), emotional factors (e.g. angry), under cognitive load
Understanding Speech in Natural Settings
Gathering Information with Faulty Technology
• Retrieval performance in the presence of inaccuracy and ambiguity in the underlying cognitive processing
• Approximate match in meaning and visualization
• Presentation and reuse of library content
• New data type with space and time dimensions
• Restricted-use intellectual property
• Interoperability in the absence of standards
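To make the idea of retrieval over inaccurate transcripts concrete, here is a minimal sketch of approximate matching against error-prone speech recognition output. It is not the Informedia method; it simply uses character-level similarity from Python's standard `difflib` as a stand-in. The function names, the sample transcripts, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_window_score(query, transcript):
    """Best similarity (0..1) between the query and any window of the
    transcript that has the same number of words as the query."""
    q_words = query.lower().split()
    t_words = transcript.lower().split()
    if not q_words or not t_words:
        return 0.0
    n = len(q_words)
    q = " ".join(q_words)
    best = 0.0
    # Slide a window of n words across the transcript.
    for i in range(max(1, len(t_words) - n + 1)):
        window = " ".join(t_words[i:i + n])
        best = max(best, SequenceMatcher(None, q, window).ratio())
    return best


def rank_segments(query, segments, threshold=0.6):
    """Return ids of segments whose transcript approximately contains the
    query, best match first. `threshold` is an illustrative cut-off."""
    scored = [(best_window_score(query, text), seg_id)
              for seg_id, text in segments.items()]
    hits = [(score, seg_id) for score, seg_id in scored if score >= threshold]
    return [seg_id for score, seg_id in sorted(hits, key=lambda p: (-p[0], p[1]))]


if __name__ == "__main__":
    # Hypothetical transcripts; "ninyo" imitates a recognition error.
    segments = {
        "a": "the el ninyo effects caused drought in australia",
        "b": "stock markets closed higher today",
    }
    print(rank_segments("el nino effects", segments))
```

An exact keyword search would miss segment "a" because of the misrecognized word; scoring by similarity lets the segment still surface, at the cost of possible false matches when the threshold is set too low.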
Challenge of Continuous Production
Commercial
• 4,500 motion pictures -> 9,000 hours/year (4.5 TB)
• 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)
• 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)
Personal
• Photographs: 80 billion images -> 410,000 TB/yr
• Home videos: 1.4 billion tapes -> 300,000 TB/yr
• X-rays: 2 billion -> 17,000 TB/yr
Surveillance
• Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
Annual Video and Audio Production
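The production estimates above can be recomputed with a short back-of-the-envelope script. This is only a sketch: the helper names are hypothetical, and it assumes a 365-day year and decimal units (1 TB = 1000 GB), neither of which the slide states. The products land close to, but not exactly on, the slide's rounded figures, and the storage rates are merely what the slide's totals imply.

```python
def annual_hours(sources, hours_per_day, days=365):
    """Hours produced per year by `sources` outlets (assumes 365 days)."""
    return sources * hours_per_day * days


def implied_gb_per_hour(terabytes, hours):
    """Storage rate implied by a total size and duration (1 TB = 1000 GB)."""
    return terabytes * 1000 / hours


if __name__ == "__main__":
    tv_hours = annual_hours(33_000, 4)          # 48,180,000 (slide: 48,000,000)
    radio_hours = annual_hours(44_000, 4)       # 64,240,000 (slide: 65,500,000)
    airport_hours_per_day = 14_000 * 140 * 24   # 47,040,000 (slide: 48 M)

    print(f"TV hours/year:      {tv_hours:,}")
    print(f"Radio hours/year:   {radio_hours:,}")
    print(f"Airport hours/day:  {airport_hours_per_day:,}")

    # Rates implied by the slide's own totals, not stated on the slide.
    print("Film  GB/hour:", implied_gb_per_hour(4.5, 9_000))
    print("TV    GB/hour:", implied_gb_per_hour(24_000, 48_000_000))
    print("Radio GB/hour:", implied_gb_per_hour(3_275, 65_500_000))
```

Under these assumptions the film and TV totals both work out to about 0.5 GB per hour and radio to about 0.05 GB per hour.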
Commercial
• 22,600 newspapers x 30 pgs/day -> 124 TB/year
• 80,000 periodicals x 5,000 pgs/yr -> 52 TB/yr
• 40,000 scholarly journals x 1,700 pgs/yr -> 9 TB/yr
Annual Print Production
Video Visualization
Summarizing and Visualizing the Result Set
Map collage summarizing “El Niño effects” showing distribution by nation with overlaid thumbnails
Summarizing Thousands of Videos
Example: Map Collage
[Map labels: North Pacific Ocean, South Pacific Ocean; event overlays: Drought, Fire, Floods]
The Need for Visualization Strategies
• As digital video assets grow, so do possible result sets
• We transmit with limited bandwidths to limited screen “real estate”
• As automated processing improves, more metadata enables more dimensions and interfaces into the video content
• Users want to apply multiple perspectives interchangeably
• Direct manipulation interfaces are required to place the user in control
Some Examples
Video Digests
Overview first, zoom and filter, then details-on-demand
• Concatenate scene elements into a single panoramic view
• Visualize word-based relationships
• Establish timelines showing trends against time
• Present maps (or diagrams) showing geographic (or spatial) correlations
• Combine digests into a single view, or animate them into a temporal presentation (the auto-documentary)
Content-based Metadata Extraction Enables Video Visualization and Summarization
[Diagram components: Metadata Extractor, Summarizer, User Perspective Templates, Personalized Presentation]
Metadata dimensions: People, Event, Affiliation, Location, Topics, Time
Information Goals
• Generate information perspectives on demand, e.g., by time, location, personalities, events
• Eliminate redundancy
• Link all the way back to source content to interactively and dynamically provide any level of detail and summarization
• Communicate results
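The information goals above can be illustrated with a small sketch that groups extracted metadata by a chosen facet, drops duplicate segments, and keeps a pointer back to the source for each entry. The record layout (`source`, `start`, and facet fields such as `location`) and the function name are assumptions for illustration, not the actual Informedia data model.

```python
from collections import defaultdict


def build_perspective(segments, facet):
    """Group segments by one metadata facet (e.g. "location", "people").

    Each group holds (source, start) pointers so a viewer can link back
    to the original content. Segments with the same source and start
    time are treated as redundant and counted once.
    """
    groups = defaultdict(list)
    seen = set()
    for seg in segments:
        pointer = (seg["source"], seg["start"])
        if pointer in seen:
            continue  # eliminate redundancy
        seen.add(pointer)
        values = seg.get(facet, [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            groups[value].append(pointer)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical metadata records for illustration only.
    segments = [
        {"source": "news_001", "start": 12.0, "location": ["Peru"]},
        {"source": "news_001", "start": 12.0, "location": ["Peru"]},
        {"source": "news_002", "start": 40.5,
         "location": ["Australia", "Indonesia"]},
    ]
    print(build_perspective(segments, "location"))
```

Switching the `facet` argument regroups the same records by another dimension, which is one simple way to let users apply multiple perspectives interchangeably.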
Knowledge Goals
• Detect trends
• Reveal relationships
• Infer causality
• Discover anomalies
• ….
Video Life Cycle
[Diagram: Acquisition, Analysis and Organization, Distribution]
$$$$ $$$$
Consumer and Business
• Evolving and archived news and information
• Education and training
• Sports and entertainment
• Interactive television
• Personal memory aids
Professional and Enterprise
• Conventions and tradeshows
• Meetings/corporate memory
Application Space
Digital Video Library
Thank you