
1

Insights into Multilingual and Multimedia Question Answering

G. Ciany*, A. Kulman+, P. Schone+, C. Van Ess-Dykema+

* Dragon Development, +U.S. Dept. of Defense

AQUAINT 18 Month Workshop

June 9 - 12, 2003

San Diego, California

2

Who we are

Team Players
– Department of Defense: Building the core system and incorporating tools from other AQUAINT contractors
– University of Maryland at College Park (LAMP lab): Leveraging English tools; using parallel projection as a means of producing similar tools for foreign languages

Associates
– SPAWAR: Developing geographical name disambiguation algorithms for AQUAINT
– DARPA: Advancing the state of the art for multimedia technologies (EARS and TIDES programs)

3

Overview

• Q&A needs for Intelligence Community analysts• Description of QACTIS and its goals• Specific observations in data gathering• Insights into compilation and assembly of tools for cross-media, cross-language QA• Introductory experiments and results

4

Generalized Needs for Q&A in the IC

• Multi-agency data collections that are
  – Multilingual
  – Multi-genre
  – Multimedia

• Allowing questions whose answers can only be found by merging information across
  – Multiple documents
  – Multiple languages
  – Metadata

• Allowing questions that make reference to
  – Individual documents
  – Data collections at large

5

Current Systems Provide Only Partial Solutions

Much of the current Q&A research focuses on extracting answers that are
– to factoid questions
– found in single documents
– in English
– generated from authored texts (primarily newswire)

6

QACTIS: Our Prototype System

• A major goal of our system is to fill some Q&A research gaps (e.g., multimedia & multilingual QA)

• Our system attempts to answer questions where:
  – the questions are in English or Spanish
  – the data collections are also English or Spanish
  – the data is derived from multimedia sources such as
    • text corpora (e.g., TREC newswire)
    • speech corpora (e.g., reference and errorful transcripts from Spanish CallHome/CallFriend & English Switchboard telephone conversations)
    • other data sources (e.g., VACE collections)

7

System Architecture

[Architecture diagram: a multilingual question (English, Spanish, Arabic, etc.) is interpreted (with help from MT and WordNet); the answer is found by multi-tiered refinement and extraction over image, text, and speech inputs, which are converted by OCR and speech-to-text and tagged with metadata (ScID, LID, SID, GID), drawing on IR, summarization, semantic forests, geographical resources, and other knowledge bases; the answer is then represented.

STT = speech-to-text; LID = language ID; ScID = script ID; SID = speaker ID; GID = gender ID]
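
To make the flow concrete, here is a minimal sketch of how such a pipeline could be wired together; every class and function name below is a hypothetical illustration, not the actual QACTIS code.

    # Minimal sketch of the pipeline in the diagram above. All names are
    # hypothetical illustrations, not the real QACTIS implementation.
    from dataclasses import dataclass, field

    @dataclass
    class Document:
        raw: str                                   # image/audio/text payload
        medium: str                                # "image", "speech", or "text"
        text: str = ""                             # text after OCR or STT
        metadata: dict = field(default_factory=dict)

    # Stub converters and taggers; a real system would plug in OCR,
    # speech-to-text, language ID, speaker ID, etc. at these points.
    def run_ocr(raw):      return "(ocr output)"
    def run_stt(raw):      return "(stt output)"
    def language_id(text): return "es" if any(c in text for c in "ñáéíóú") else "en"

    def normalize(doc):
        """Convert any medium to text and attach metadata tags."""
        if doc.medium == "image":
            doc.text = run_ocr(doc.raw)
        elif doc.medium == "speech":
            doc.text = run_stt(doc.raw)
            doc.metadata["SID"] = "speaker-A"      # speaker ID stub
        else:
            doc.text = doc.raw
        doc.metadata["LID"] = language_id(doc.text)
        return doc

    def answer(question, collection):
        """Interpret the question, retrieve, refine, represent the answer."""
        docs = [normalize(d) for d in collection]
        key = question.rstrip("?").split()[-1].lower()   # toy interpretation
        hits = [d for d in docs if key in d.text.lower()]
        return hits[0].text if hits else "(no answer found)"

    print(answer("Who pays ten dollars?",
                 [Document(raw="they pay ten dollars", medium="text")]))

The point of the sketch is the plug-in shape of the design: each medium gets its own converter, and metadata tags ride along with the document into retrieval.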

8

Assembling Data for QACTIS

QACTIS is being designed to process multilingual and multimedia data. The table below shows the order in which we are adding this data:

Basic QA                         | Begin with English TREC
Multilingual aspects             | Add Spanish TREC
Multilingual multimedia          | Add English Switchboard & Spanish CallHome reference transcripts
Errorful multilingual multimedia | Replace reference transcripts with the output of speech-to-text

9

Assembling Tools for QACTIS

Since QACTIS will process multilingual and multimedia data, a number of tools are necessary, some of which would not be encountered in single-language QA systems for text.

NLP/Text Tools: POS taggers, parsers, stemmers, dictionaries, ontologies, entity taggers, gazetteers, IR capability
Media Processing: speech-to-text, phonetic recognition, OCR
Cross-lingual Tools: cross-lingual dictionaries, parallel corpora
Metadata Generation Capability: speaker ID, language ID, gender ID, genre ID, script ID, rich transcript tagging

10

Insights From Our Data Gathering

• TREC-style questions assume the user wants a fact from a large corpus. An analyst may want information about a given "document," and the answer may come from metadata (see the sketch after this list).
  – Ex: Who is speaking in this file?
• Newswire typically provides the context necessary for Q&A, but with conversations, a system may need outside context.
  – Spanish Speaker A may say "Well, you know we're all going to be together -- that's September 16th!"
  – Ex: understanding September 16 comes from knowing the speakers are from Mexico and that September 16 is Mexican Independence Day
• Answers may occur in data containing story threads
  – Need for multi-document reference resolution
• Analysts may need to ask follow-on questions.
  – See the next set of slides
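
As flagged in the first bullet, some questions never touch the document text at all. A toy illustration, with hypothetical field names, of answering directly from file metadata before falling back to corpus search:

    # Toy illustration of answering from file metadata; the field names
    # are hypothetical, but the file and speaker come from the example
    # discussed on the following slides.
    FILE_METADATA = {
        "sp_1555.txt": {"speaker": "Vanessa Goddard",
                        "language": "Spanish",
                        "genre": "telephone conversation"},
    }

    def answer_from_metadata(question, filename):
        meta = FILE_METADATA.get(filename, {})
        q = question.lower()
        if "who is speaking" in q:
            return meta.get("speaker")
        if "what language" in q:
            return meta.get("language")
        return None  # not a metadata question: fall back to corpus search

    print(answer_from_metadata("Who is speaking in this file?", "sp_1555.txt"))
    # -> Vanessa Goddard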

11

Example of Multilingual, Multimedia Follow-on Questions

The following questions were asked by a DoD analyst against the Spanish CallHome/CallFriend corpus:

[1] ¿Qué ha hecho Vanessa Goddard para ganar dinero? (What has Vanessa Goddard done to earn money?) [A: Vanessa has participated in several studies]

[2] What kinds of studies has Vanessa participated in? [A: telephone conversations, smelling things, computer usage]

The document with the answers came from sp_1555.txt (shown on the next two slides).

12

Example from CallFriend Spanish

Times | Spanish CallFriend | English Translation
0.35-5.48… | A: Bueno, ya está grabando, me llamo ^Vanessa ^Goddard, y doy permiso para que se grabe esta llamada. | A: OK, it's already recording; my name is Vanessa Goddard and I give permission to record this call.
19.00-20.99 | B: ¿Para qué dices que es este proyecto? | B: What do you say this project is for?
20.86-25.54… | A: Yo, ah, es que es, es un experimento…. | A: I, ah, it's, that, it's, it's an experiment….
33.06-39.11… | A: [breath] Y entonces te dan una llamada gratis, y después voy a pedir una para, para ^Mexico^…. | A: And then they give you a free call, and later I'm going to ask for one to, to Mexico….
37.92-39.5… | A: [breath] y te pagan diez dólares…. | A: And they pay ten dollars….

13

Example from CallFriend Spanish

Times | Spanish CallFriend | English Translation
47.16-52.79 | A: [breath] De hecho está re bien, oye, ya me estoy dando cuenta que para hacer experimento [laughs] te puedes ganar dinero. | A: It's really great, listen, I'm realizing that doing an experiment you can earn money.
52.66-53.28 | B: Sí. | B: Yes.
52.96-58.95… | A: Este, el, el, los, el, la semana pasada terminé uno, y me pagaron cuarenta dólares? | A: Uh, the, the, the, the, last week I did one, and they paid me forty dollars.
58.5-59.98 | B: ¿De g-, hacer qué? | B: For, doing what?
59.66-62.08 | A: Por probar, este, sustancias… | A: For tasting, uh, things.
62.4-64.19 | A: Mhm. Y nada más, este [breath] | A: Mhm. And nothing more.
64.51-70.31 | A: Eh, pues el grado de irritación con una computadora. %Ah, después otro también de olefacción. | A: Oh, then, the degree of irritation with a computer. Ah, after, another also on smell.
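
One way a QA system could make such time-aligned conversations searchable is to store each turn with its timestamps, speaker, and translation; the layout below is illustrative, not the actual QACTIS data structure.

    # Illustrative representation of time-aligned transcript turns so
    # that an answer can be traced back to its speaker and time span.
    from dataclasses import dataclass

    @dataclass
    class Turn:
        start: float
        end: float
        speaker: str
        spanish: str
        english: str

    turns = [
        Turn(52.96, 58.95, "A",
             "la semana pasada terminé uno, y me pagaron cuarenta dólares",
             "last week I did one, and they paid me forty dollars"),
        Turn(59.66, 62.08, "A",
             "Por probar, este, sustancias",
             "For tasting, uh, things"),
    ]

    def find_turns(keyword):
        """Return turns whose English translation mentions the keyword."""
        return [t for t in turns if keyword.lower() in t.english.lower()]

    for t in find_turns("dollars"):
        print(f"{t.start}-{t.end} {t.speaker}: {t.english}")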

14

Insights on Architecture Development: Challenges for Multilingual Multimedia

Need to avoid hard-coding: The system itself should be language-independent and should rely on plug-ins for importing language-specific features. If multiple languages are combined in the document collection, appropriate language ID tags need to be generated. (A plug-in sketch follows below.)

Absence of multilingual tools: Parsers, POS taggers, etc. are extensively available for English text. This is not usually the case for other languages or media. => We are using parallel projection as a means of generating new tools.

Retargeting of tools to multimedia: Newswire is typically grammatical, so NLP tools are generally reliable; it is unclear how well these tools can be ported to other media.
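
A minimal sketch of the plug-in idea mentioned above: the core engine stays language-independent and dispatches on a document's language ID tag, while language-specific tools are registered separately. All names here are hypothetical.

    # Sketch of language-specific plug-ins behind a language-independent
    # core; the registry and toy tools are hypothetical illustrations.
    TOOLS = {}  # language ID tag -> {"tokenize": fn, "stem": fn}

    def register(lang, **fns):
        TOOLS[lang] = fns

    # Toy plug-ins; a real system would register parsers, POS taggers,
    # stemmers, etc. (for Spanish, possibly built via parallel projection).
    register("en", tokenize=str.split, stem=lambda w: w.rstrip("s"))
    register("es", tokenize=str.split, stem=lambda w: w.rstrip("s"))

    def process(doc_text, lang_tag):
        """Core engine: the same code path for every registered language."""
        tools = TOOLS[lang_tag]          # fails loudly if no plug-in exists
        return [tools["stem"](tok) for tok in tools["tokenize"](doc_text)]

    print(process("they pay ten dollars", "en"))
    print(process("te pagan diez dólares", "es"))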

15

Insights on Architecture Development (continued)

Need to compensate for errors: Newswire is formally composed; conversations and other text genres may be more informal or freeform. This means there will be multiple sources of error that must be compensated for, namely:
– Parsing and other tool errors (before & after porting)
– Machine translation errors
– Data conversion errors (e.g., speech-to-text)

Potential use of metadata: Speaker, language, script, and other feature identifications can actually complement the Q&A process. For example, if you knew that Wolf Blitzer only covered White House news, you might be able to eliminate his commentaries when looking for sports (see the sketch below).
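
Following the Wolf Blitzer example, a sketch of metadata-based pruning; the speaker-to-beat table is invented for illustration.

    # Sketch of pruning candidate documents by speaker metadata; the
    # speaker-to-beat table below is a hypothetical illustration.
    SPEAKER_BEATS = {"Wolf Blitzer": {"white house", "politics"}}

    def keep_for_topic(doc_meta, topic):
        """Keep a document unless its speaker is known never to cover the topic."""
        beats = SPEAKER_BEATS.get(doc_meta.get("speaker", ""))
        return beats is None or topic in beats

    docs = [{"speaker": "Wolf Blitzer"}, {"speaker": "unknown"}]
    print([keep_for_topic(d, "sports") for d in docs])   # [False, True]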

16

Example 1: Retargeting Tools

• Experiment (UMd/LAMP Lab): What degradation occurs when training parsers or noun-phrase detectors on WSJ and testing on Switchboard reference transcripts?
• Result: Parsing Switchboard with WSJ-trained models yields close to double the number of errors (F-score computation sketched below):
  – Typical parsing accuracy of WSJ models applied to WSJ: 88% F-score
  – Same models applied to Switchboard: 79.5% F-score
  – Same models applied to Switchboard w/o disfluencies: 79.0% F-score
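
For reference, the F-scores above are the standard harmonic mean of bracketing precision and recall over constituent spans; a small sketch with toy spans:

    # Bracketing F-score: harmonic mean of precision and recall over
    # (label, start, end) constituent spans. The spans below are toy data.
    def f_score(gold, pred):
        correct = len(gold & pred)
        precision = correct / len(pred) if pred else 0.0
        recall = correct / len(gold) if gold else 0.0
        if precision + recall == 0:
            return 0.0
        return 2 * precision * recall / (precision + recall)

    gold = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5), ("PP", 3, 5)}
    pred = {("NP", 0, 2), ("VP", 2, 5), ("S", 0, 5), ("PP", 2, 5)}
    print(f_score(gold, pred))   # 0.75: one mis-bracketed PP costs both P and R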

17

Example 2: Compensating for Errors

Q&A System uses outputs of constituent tools but builds only probabilistic (vs. unweighted) connections between entities, relationships, and propositions

Use of lattices rather than one-best output, for parsing and speech and image data conversions.

Using the Web as a sources of information for ranking noun phrase translations (UMd/LAMP Lab)
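
A sketch of how such probabilistic connections might be scored: each answer candidate carries per-tool confidences, and alternatives from a lattice are pooled instead of trusting a single one-best path. The numbers and tool names are invented for illustration.

    # Illustrative evidence combination: multiply per-tool confidences
    # within a lattice path, then sum over paths that support the same
    # candidate answer. All numbers and names are invented.
    from collections import defaultdict
    from math import prod

    paths = [  # (candidate answer, {tool: confidence}) per lattice path
        ("forty dollars", {"stt": 0.80, "parse": 0.90, "entity": 0.95}),
        ("forty dollars", {"stt": 0.10, "parse": 0.70, "entity": 0.95}),
        ("four dollars",  {"stt": 0.15, "parse": 0.90, "entity": 0.95}),
    ]

    scores = defaultdict(float)
    for candidate, conf in paths:
        scores[candidate] += prod(conf.values())   # weighted, not unweighted

    print(max(scores, key=scores.get))   # -> forty dollars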

18

System Development to Date

• Tools & data gathered/integrated into system flow with GUI:
  – The system incorporates BBN's IdentiFinder, Lemur, the Charniak & Collins parsers, the TnT POS tagger, BBN Byblos speech-to-text, and WordNet
• Data analysis and question generation:
  – English text, CallFriend Spanish conversations, Spanish News Corpus
• IR experiments:
  – Comparisons between an internally-built, Okapi-based system & Lemur
  – Non-trivial porting of Lemur to Spanish
• QA experiments in two separate directions:
  – Depth first: exploring one question type in significant detail for accuracy (e.g., a study of all "How many…?" type questions)
  – Breadth first: building an object-based system designed to tackle all types of questions for coverage

19

System GUI

[Slides 19-22: screenshots of the system GUI]

23

June - December 2003

• Ongoing Q&A prototype development
• English and Spanish speech-to-text improvements
• Lemur tests on multilingual, reference, and errorful speech data
• English and Spanish Q&A prototype for text
• Trials of English and Spanish Q&A on conversational speech
• Document headline production for summaries (UMd)