November 15, 2003 CLIS Alumni Chapter

Talking to the Future: The MALACH Project

Douglas W. Oard, Joanne Archer, Ammie Feijoo, Xiaoli Huang

College of Information Studies

Telling Our Stories

Shoah Foundation’s Collection

• Enormous scale
  – 116,000 hours; 52,000 interviews; 180 TB
• Grand challenges
  – 32 languages, accents, elderly, emotional, …
• Accessible
  – $100 million collection and digitization investment
• Annotated
  – 10,000 hours (~200,000 segments) fully described
• Users
  – A department working full time on dissemination

Who Uses the Collection?

Discipline
• History
• Linguistics
• Journalism
• Material culture
• Education
• Psychology
• Political science
• Law enforcement

Products
• Book
• Documentary film
• Research paper
• CDROM
• Study guide
• Obituary
• Evidence
• Personal use

Based on analysis of 280 access requests

Question Types

• Content
  – Person, organization
  – Place, type of place (e.g., camp, ghetto)
  – Time, time period
  – Event, subject
• Mode of expression
  – Language
  – Displayed artifacts (photographs, objects, …)
  – Affective reaction (e.g., vivid, moving, …)
• Age appropriateness

Full-Description Cataloguing

Location-Time    Subject                            Person
Berlin-1939      Employment                         Josef Stein
Berlin-1939      Family life                        Gretchen Stein, Anna Stein
Dresden-1939     Schooling                          Gunter Wendt, Maria
Dresden-1939     Relocation, Transportation-rail

(Segments are listed in order of interview time.)
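A minimal sketch of how the fully described segments on this slide could be held as records and searched. The `Segment` class, the `find_segments` helper, and the grouping of people into segments are assumptions made for illustration; they are not the Shoah Foundation's cataloguing system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """One catalogued segment (hypothetical record layout)."""
    location_time: str
    subjects: List[str]
    people: List[str] = field(default_factory=list)


# Segments in interview-time order, as read from the slide.
SEGMENTS = [
    Segment("Berlin-1939", ["Employment"], ["Josef Stein"]),
    Segment("Berlin-1939", ["Family life"], ["Gretchen Stein", "Anna Stein"]),
    Segment("Dresden-1939", ["Schooling"], ["Gunter Wendt", "Maria"]),
    Segment("Dresden-1939", ["Relocation", "Transportation-rail"]),
]


def find_segments(segments: List[Segment],
                  location_time: Optional[str] = None,
                  subject: Optional[str] = None,
                  person: Optional[str] = None) -> List[int]:
    """Return positions of segments that match every criterion given (Boolean AND)."""
    hits = []
    for position, segment in enumerate(segments):
        if location_time is not None and segment.location_time != location_time:
            continue
        if subject is not None and subject not in segment.subjects:
            continue
        if person is not None and person not in segment.people:
            continue
        hits.append(position)
    return hits
```

For example, `find_segments(SEGMENTS, location_time="Dresden-1939")` returns the positions of the two Dresden segments, which a player could use to jump straight to that part of the interview.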

“Real-Time” Cataloguing

[Figure: Location-Time, Subject, and Person descriptors placed along the interview-time axis]

Location-Time: Berlin-1939, Dresden-1939
Subject: Employment, Relocation, Transportation-rail, Schooling, Family Life
Person: Josef Stein, Gretchen Stein, Anna Stein, Gunter Wendt, Maria

Thesaurus-Based Search

The Goal

Dramatically improve access to large multilingual spoken word collections …

… by capitalizing on the unique characteristics of the Survivors of the Shoah Visual History Foundation's collection of videotaped oral history interviews.

Joanne Archer

Observational Studies

Workshop 1 (June)
• Four searchers
  – History/Political Science
  – Holocaust studies
  – Holocaust studies
  – Documentary filmmaker
• Sequential observation
• Rich data collection
  – Intermediary interaction
  – Semi-structured interviews
  – Observational notes
  – Think-aloud
  – Screen capture

Workshop 2 (August)
• Four searchers
  – Ethnography
  – German Studies
  – Sociology
  – High school teacher
• Simultaneous observation
• Opportunistic data collection
  – Intermediary interaction
  – Semi-structured interviews
  – Observational notes
  – Focus group discussions

Observed Selection Criteria

• Topicality (57%)
  Judged based on: Person, place, …
• Accessibility (23%)
  Judged based on: Time to load video
• Comprehensibility (14%)
  Judged based on: Language, speaking style

Functionality

Needed Function

Boolean Search and Ranked Retrieval (13)

Testimony summary (12)

Pre-Interview Questionnaire search/viewer (9)

Rapid access (7)

Related/Alternative search terms (3)

Adding multiple search terms at once (2)

Keywords linked to segment number for easy access (1)

Multi-tasking (1)

Searching testimonies by places under ‘Experience Search’ (1)

Extensive editing within ‘My Project’ (1)

Desired Function

Temporary saving of selected testimonies (4)

Remote access (3)

Integrated user tools for note taking (3)

Map presentation (2)

Reference tool (1)

More repositories (1)

Introductory video of system tutorial (1)

Help (1)

Xiaoli Huang

Supporting Information Access

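[Diagram: information access process]

Stages: Source Selection, Query Formulation, Search, Selection, Examination, Delivery
Passed between stages: Query, Ranked List, Recording
Search System: covers the Search stage
Feedback loops: Query Reformulation and Relevance Feedback; Source Reselection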

[Diagram: system components and project areas]

Components: Speech Recognition, Boundary Detection, Content Tagging, Query Formulation, Automatic Search, Interactive Selection

• ASR
  – Spontaneous
  – Accented
  – Language switching
• NLP Components
  – Multi-scale segmentation
  – Multilingual classification
  – Entity normalization
• Prototype
  – Evidence integration
  – Multilingual search
  – Spatial/temporal
• User Needs
  – Observational studies
  – Formative evaluation
  – Summative evaluation

Description Strategies

• Transcription
  – Manual transcription (with optional post-editing)
• Annotation
  – Manually assign descriptors to points in a recording
  – Recommender systems (ratings, link analysis, …)
• Associated materials
  – Interviewer’s notes, speech scripts, producer’s logs
• Automatic
  – Create access points with automatic speech processing

English ASR Error Rate

[Chart: Word Error Rate, vertical axis from 0 to 100]

Training: 65 hours (acoustic model)/200 hours (language model)
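For readers unfamiliar with the measure on this slide, word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below uses that standard definition; the function name and the example sentences are illustrative assumptions, not the scoring tool used in the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / reference length."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference transcript must contain at least one word")

    # previous_row[j] = edits needed to turn the first j hypothesis words
    # into the reference words consumed so far (none yet).
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(min(
                previous_row[j] + 1,                      # deletion
                current_row[j - 1] + 1,                   # insertion
                previous_row[j - 1] + substitution_cost,  # match or substitution
            ))
        previous_row = current_row
    return 100.0 * previous_row[-1] / len(ref_words)
```

Because insertions count as errors, the rate can exceed 100 when the output is much longer than the reference.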

Effect of ASR Errors

[Diagram labels: true, system output, miss, false alarm]
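One way to read the labels on this slide: terms that were truly spoken but are absent from the system output are misses, and terms in the system output that were not spoken are false alarms. The sketch below treats both sides as sets of terms, which is a simplifying assumption (it ignores word order and repeated words); the function name and example are hypothetical.

```python
def misses_and_false_alarms(true_terms, system_terms):
    """Compare spoken terms with recognizer output, both treated as sets."""
    true_set = set(true_terms)
    system_set = set(system_terms)
    misses = true_set - system_set        # spoken, but not in the output
    false_alarms = system_set - true_set  # in the output, but never spoken
    return misses, false_alarms


# A search for "dresden" would fail on this output (a miss),
# while a search for "dressing" would wrongly succeed (a false alarm).
spoken = "family moved to dresden by rail".split()
recognized = "family moved to dressing by rail".split()
missed, spurious = misses_and_false_alarms(spoken, recognized)
```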

Building a Test Collection

• Overall relevance
  Assessment is informed by the assessments for the individual reasons for relevance (categories of relevance), but the relationship is not straightforward
• Provides direct evidence
• Provides indirect / circumstantial evidence
• Provides context (e.g., causes for the phenomenon of interest)
• Provides comparison (similarity or contrast, same phenomenon in different environment, similar phenomenon)
• Provides pointer to source of information

Ammie Feijoo

Some Statistics

• 2,000 U.S. radio stations Webcasting

• 250,000 hours of oral history in British Library

• 35,000,000 audio streams on the Web

Spoken Word Collections

• Broadcast programming
  – News, interview, talk radio, sports, entertainment
• Scripted stories
  – Books on tape, poetry reading, theater
• Spontaneous storytelling
  – Oral history, folklore
• Incidental recording
  – Speeches, oral arguments, meetings, phone calls

Building a Web of Spoken Words

• Affordable storage
  – For $1, you can store 1.5 million spoken words
• Adequate network capacity
  – Internet capacity: 30 million simultaneous programs
• Works with any modem
  – You can even read email while playing audio
• Replay capabilities
  – 38% of US users recently used streaming audio
• Effective search capabilities
  – Not quite yet …

Looking Forward: 2006

• Working systems in five languages
  – Real users searching real data
• Rich experience beyond broadcast news
  – Frameworks, components, systems
• Affordable application-tuned systems
  – Oral history, lectures, speeches, meetings, …

For More Information

• The MALACH project
  – http://www.clsp.jhu.edu/research/malach/
• NSF/EU Spoken Word Access Group
  – http://www.dcs.shef.ac.uk/spandh/projects/swag/
• Speech-based retrieval
  – http://www.glue.umd.edu/~dlrg/speech/