Question Answering from Errorful Multimedia Streams AQUAINT PI Meeting – June 2002

Howard D. Wactlar, Carnegie Mellon University, USA


Page 1

Question Answering from Errorful Multimedia Streams

AQUAINT PI Meeting – June 2002

Howard D. Wactlar

Carnegie Mellon University, USA

Digital Video Library

Page 2

Outline

• Goals for QA from multimedia

• Background

- Informedia

- Information extraction

• Determining answer information

• Presenting the answer and follow-up

Page 3

Page 4

Why is Multimedia Important

• TV and radio broadcasts record human events across the globe

• Broadcast interviews, analysis and opinions created globally provide varied interpretive perspectives and context

• Images of people, events, maps and charts provide additional content not conveyed orally

- May be correlated with the spoken words

• Some pictures are worth a thousand words

Page 5

Annual Video and Audio Production

Commercial

• 4,500 motion pictures -> 9,000 hours/year (4.5 TB)

• 33,000 TV stations x 4 hrs/day -> 48,000,000 hrs/yr (24,000 TB)

• 44,000 radio stations x 4 hrs/day -> 65,500,000 hrs/yr (3,275 TB)

Personal

• Photographs: 80 billion images -> 410,000 TB/yr

• Home videos: 1.4 billion tapes -> 300,000 TB/yr

• X-rays: 2 billion -> 17,000 TB/yr

Surveillance

• Airports: 14,000 terminals x 140 cameras x 24 hrs/day -> 48 M hrs/day
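
As a rough check on the figures above, the hours and the implied storage rates can be recomputed from the slide's own numbers. This is only a sketch: the 365-day year and the 1 TB = 1,000 GB conversion are assumptions, and the function names are illustrative.

```python
DAYS_PER_YEAR = 365   # assumption: stations broadcast every day of the year
GB_PER_TB = 1_000     # assumption: decimal terabytes


def annual_hours(stations: int, hours_per_day: float) -> float:
    """Hours produced per year by a group of stations."""
    return stations * hours_per_day * DAYS_PER_YEAR


def gb_per_hour(terabytes: float, hours: float) -> float:
    """Storage rate implied by a (TB, hours) pair quoted on the slide."""
    return terabytes * GB_PER_TB / hours


tv_hours = annual_hours(33_000, 4)       # slide quotes 48,000,000 hrs/yr
radio_hours = annual_hours(44_000, 4)    # slide quotes 65,500,000 hrs/yr
airport_hours = 14_000 * 140 * 24        # slide quotes 48 M hrs/day

film_rate = gb_per_hour(4.5, 9_000)            # motion pictures
tv_rate = gb_per_hour(24_000, 48_000_000)      # TV
radio_rate = gb_per_hour(3_275, 65_500_000)    # radio

print(tv_hours, radio_hours, airport_hours)
print(film_rate, tv_rate, radio_rate)
```

The recomputed hours land close to, but not exactly on, the quoted totals, which suggests the slide's figures are rounded or come from slightly different inputs. The quoted storage works out to about 0.5 GB per hour for film and TV and about 0.05 GB per hour for radio.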

Page 6

Background

REQUIREMENTS:

- Automated process for information extraction from video

- Full-content search and retrieval from any spoken language and visual document

Establishment of large video libraries as a network searchable information resource

Mission: Enable Search and Discovery in the Video Medium

APPROACH: Integration of machine speech, image and natural language understanding for library creation and exploration

Exploit operational Informedia DVL infrastructure and technology

Page 7

Informedia System Architecture

[Figure: system architecture diagram. Offline panel, "Information Collection & Analysis": inputs (surveillance, broadcast TV, radio); digital encoding; processing (speech analysis, image analysis, entity extraction, face and OCR text recognition); indexing; indexed database (segmented transcript, compressed audio/video and images); distribution to users. Online panel, "Information Exploration & Discovery": analyst; multimodal queries; browsing and query refinement; relevant result set; requested segment or summarization.]

Page 8

Page 9

Related Language Processing Work

• MUC, DUC, TREC, especially the QA track

- Pronoun and anaphora resolution

- Part-of-speech tagging

- Fact extraction

- Summarization

- Question-answering

…Electronic text focus

Page 10

Why is Multimedia Hard

• It’s a fundamentally linear, temporal medium

• Speech, image and language understanding are all errorful, ambiguous and incomplete

• Information must be time-synchronized and correlated across modalities for both produced and natural video

• Verbal content lacks:

- sentence boundaries,

- punctuation,

- capitalization

…that enable a syntactic analysis

• Image recognition w/o known context is very limited

• Many errors from many sources!

Page 11

Why We Think the Problems are Tractable

• Lots of data enables LEARNING systems

• Have shown complete or perfect information is not necessary

• Utilize multiple sources of information jointly:

- text, image, audio, web text and databases

Page 12

Research Focus

• Determining the answer information

- Resolving co-references

- Discovering semantic relations

- Learning information flow

- Hardening uncertain information

• Organizing and presenting the answer result

- Text summaries

- Augmenting contextual material

- Maps, charts and images to allow follow-up questions

- Explicit representation of uncertainty

Page 13

Resolving Co-references

• When is the same person mentioned (or seen, or identified)

• Places referenced (in words, on signs, on maps)

• Organizations cited (verbally, on signage, in charts)

• Requires:

- Pronoun resolution

- Merge multiple spellings, abbreviations and contractions

- Merge across media (OCR, audio, text, faces)
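
Merging multiple spellings can be sketched with plain string similarity. This is a minimal illustration, not the Informedia method: the 0.85 threshold, the function names and the sample mentions are all assumptions, and `difflib`'s similarity ratio stands in for whatever matcher a real system would use.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def same_entity(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two mentions as one entity if their normalized forms are similar."""
    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold


def cluster_mentions(mentions, threshold: float = 0.85):
    """Greedy clustering: join the first cluster whose first member matches."""
    clusters = []
    for mention in mentions:
        for cluster in clusters:
            if same_entity(mention, cluster[0], threshold):
                cluster.append(mention)
                break
        else:
            clusters.append([mention])
    return clusters


# Hypothetical mentions, e.g. from a transcript, an OCR'd caption and a title card.
mentions = ["Deutsche Welle", "Duetsch Welle", "DEUTSCHE WELLE", "Radio Tehran"]
clusters = cluster_mentions(mentions)
print(clusters)
```

Abbreviations, pronouns and cross-media matches (faces, audio) need more than string similarity; the sketch covers only the spelling-variant case.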

Page 14

Mining Links and Learning Semantic Relations

• Visualize co-occurrence in documents, in location, in time

- Location can be variably sized regions

- Times can be arbitrary periods

• Finding semantic roles for related named entities

- Dr. X is CEO of company Y
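
Co-occurrence over arbitrary time periods can be sketched by bucketing segments into windows and counting entity pairs per window. The data layout (a day index plus a set of entity names), the function name and the sample entities are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(segments, window_days: int) -> Counter:
    """Count entity pairs that appear within the same time window.

    segments: iterable of (day_index, set_of_entity_names).
    window_days: window length; a pair is counted once per shared window.
    """
    buckets = {}
    for day, entities in segments:
        buckets.setdefault(day // window_days, set()).update(entities)

    counts = Counter()
    for entities in buckets.values():
        # Sort so each pair has one canonical (alphabetical) ordering.
        for pair in combinations(sorted(entities), 2):
            counts[pair] += 1
    return counts


# Hypothetical extracted entities, keyed by broadcast day.
segments = [
    (0, {"Dr. X", "Company Y"}),
    (2, {"Dr. X", "City Z"}),
    (9, {"Dr. X", "Company Y"}),
]
weekly = cooccurrence_counts(segments, window_days=7)
daily = cooccurrence_counts(segments, window_days=1)
print(weekly.most_common(1))
```

Widening the window links entities that never share a segment (here "City Z" and "Company Y" in the first week), which is the trade-off behind letting the period vary. Counts like these only suggest that a relation exists; assigning a role such as "CEO of" needs a separate step.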

Page 15

Page 16

Active Hardening of Evidence

• Extracted information is noisy

• Acquire new supporting or falsifying evidence from other sources (web)

- On-demand or

- Automatically when original evidence is weak

…Result is higher fidelity information
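
One way to picture "automatically when original evidence is weak" is a noisy-OR combination of independent supporting evidence, with extra sources fetched only below a threshold. This is an assumed model, not the method from the talk: the independence assumption, the 0.7 threshold and the `fetch_more` callback are hypothetical, and falsifying evidence is not modeled.

```python
from math import prod


def combined_confidence(confidences) -> float:
    """Noisy-OR: probability that at least one independent source is right."""
    return 1.0 - prod(1.0 - c for c in confidences)


def harden(initial: float, fetch_more, weak_threshold: float = 0.7) -> float:
    """Return hardened confidence, consulting other sources only if needed.

    initial: confidence of the originally extracted fact (0..1).
    fetch_more: hypothetical callback returning confidences of supporting
                evidence found elsewhere (e.g. on the web).
    """
    evidence = [initial]
    if combined_confidence(evidence) < weak_threshold:
        evidence.extend(fetch_more())
    return combined_confidence(evidence)


# A weak extraction (0.5) triggers a lookup that finds two more weak supports.
weak_case = harden(0.5, lambda: [0.5, 0.5])
# A strong extraction (0.9) is left alone; the callback is never invoked.
strong_case = harden(0.9, lambda: [0.5, 0.5])
print(weak_case, strong_case)
```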

Page 17

Learning Information Flow

[Figure: information flow graph among news sources. Nodes: CNN, ABC, Radio Deutsche Welle (Germany), Radio Tehran (Iran), Wiretap 1 (Saudi Arabia), Hidden Source 1, Hidden Source 2, Hidden Source 3, Hidden Source 4. Topic labels: lifestyle news; news on Middle East. Legend: tightly correlated; information flow; conditional information flow. Edge delays: 3-6 days, 3-6 days, 407 days.]

Page 18

Learning Information Flow

• Where did a fact originate?

• Multiple sources report facts over time, with small changes

- E.g. different newspapers get the same story from an AP or Reuters source. Story ‘looks’ different.

- Imagery frequently is reused as well

• Columbia’s Newsblaster exploits this idea for summarization of the core story sentences
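
"Where did a fact originate?" can be sketched as finding the earliest report that is textually close to a given story. This is an illustrative simplification: word-set Jaccard similarity, the 0.5 threshold, the source names and the sample texts are all assumptions, and it is not the Newsblaster or Informedia algorithm.

```python
def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 to 1.0."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def earliest_similar(reports, query: str, threshold: float = 0.5):
    """Return the earliest (source, day, text) report resembling the query."""
    matches = [r for r in reports if jaccard(r[2], query) >= threshold]
    return min(matches, key=lambda r: r[1]) if matches else None


# Hypothetical reports: (source, day published, text).
reports = [
    ("Paper A", 3, "minister resigns after budget vote on tuesday"),
    ("Wire service", 1, "minister resigns after budget vote"),
    ("Paper B", 2, "local team wins cup final"),
]
origin = earliest_similar(reports, "minister resigns following budget vote")
print(origin)
```

The earliest match is only a candidate origin: a hidden source that was never captured would not appear in `reports` at all.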

Page 19

Integrated Analysis Environment

• Summarize multimedia information visually and textually

• Allow explicit display of and control over acceptable level of uncertainty

• Show link structure of entities and relations

• Interactive visualization for drill-down and follow-up

Page 20

Strategic Advantages of Multimedia Analysis and Response

• Collect Large Amounts of Data

• Learning Approaches

• Leverage across media types

• Perfection is not necessary (80% solution may be ok)

• User in the loop filters remaining errors

• Effective interfaces and visualizations

Page 21

Digital Video Library