A Prototype Personal Dictation System Adam Janin janin@icsi.berkeley.edu.

Post on 15-Jan-2016



A Prototype Personal Dictation System

Adam Janin janin@icsi.berkeley.edu

Final Goal – A Portable Meeting Recorder

Record impromptu meetings in a natural environment.

Detect multiple speakers.

Allow correction and annotation.

Support indexing and searching.

Self-contained (using IRAM).

Intermediate Goal – A Personal Dictation System

Record a single user dictating text.

Allow correction and editing.

Hosted system:

  ASR runs on workstation.
  GUI runs on Pilot.
  Communicate via wired network.
  Close-talking mic.
  Limited domain (Broadcast News).

Asides...

Why not Wizard of Oz?

  Structure of correction mechanism is recognizer-specific.
  Develop infrastructure.
  Produce a working demo.

Informal user study, mostly with speech researchers.

Architecture

Palm Pilot

Correct transcripts

Edit transcripts

Create new text

Sun Workstation

Audio frontend

Speech recognizer

Correction server
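The slides do not describe how the Pilot GUI and the correction server talk over the wired network. As a rough sketch only, assuming a simple line-oriented text protocol, the messages could be modeled as below; the `Message` type, the message kinds `HYP` and `CORRECT`, and the tab-separated layout are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical message exchanged between the Pilot GUI and the correction
# server; the slides do not specify the real wire format.
@dataclass(frozen=True)
class Message:
    kind: str        # hypothetical kinds: "HYP" (server -> Pilot), "CORRECT" (Pilot -> server)
    word_index: int  # position of the word in the current transcript
    text: str        # hypothesis text, or the word the user chose

def encode(msg: Message) -> bytes:
    """Serialize a message as one tab-separated, newline-terminated line."""
    if "\t" in msg.text or "\n" in msg.text:
        raise ValueError("text must not contain tabs or newlines")
    return f"{msg.kind}\t{msg.word_index}\t{msg.text}\n".encode("utf-8")

def decode(line: bytes) -> Message:
    """Parse a line produced by encode()."""
    kind, index, text = line.decode("utf-8").rstrip("\n").split("\t", 2)
    return Message(kind, int(index), text)

# The Pilot tells the server that word 1 should read "record".
wire = encode(Message("CORRECT", 1, "record"))
print(wire)          # b'CORRECT\t1\trecord\n'
print(decode(wire))  # Message(kind='CORRECT', word_index=1, text='record')
```

One message per line keeps parsing trivial on a small handheld client; the actual system may well have used a different format.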

Correcting and Editing

Correcting – informing the recognizer that it has made an error.

  If recognizer has a good idea of alternatives, it may be faster to correct than to edit.
  Recognizer can adapt to user and vocabulary.

Editing – changing the output.

  “That’s not what I meant to say.”
  Text vs. speech input.

Correction Methods: Background

Lattice contains recognizer’s best guesses.

More compact than N-best lists.

Contains word order and timing.

1) the records …
2) a rack …
3) the wreck or …
4) a record …
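The slide shows the lattice only as a figure. As a minimal sketch, assuming a lattice stored as word edges between time-stamped nodes, the Python below encodes the four hypotheses listed above. The node times and integer costs (lower is better, standing in for a scaled negative log probability) are invented so that expanding the lattice reproduces the same four hypotheses in the same order; `Edge`, `all_paths`, and `n_best` are illustrative names, not the system's code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: int    # start node
    dst: int    # end node
    word: str
    cost: int   # hypothetical cost, lower is better

# Hypothetical node times in seconds (word timing lives on the nodes).
NODE_TIME = {0: 0.00, 1: 0.20, 2: 0.15, 3: 0.60, 4: 0.80}
FINAL = 4

# Toy lattice: shared prefixes ("the", "a") are stored once, not per hypothesis.
EDGES = [
    Edge(0, 1, "the", 1), Edge(0, 2, "a", 5),
    Edge(1, 4, "records", 2),
    Edge(1, 3, "wreck", 6), Edge(3, 4, "or", 3),
    Edge(2, 4, "rack", 4), Edge(2, 4, "record", 6),
]

def all_paths(edges, node=0):
    """Enumerate every (cost, words) path from `node` to the final node."""
    if node == FINAL:
        return [(0, [])]
    result = []
    for e in edges:
        if e.src == node:
            for cost, words in all_paths(edges, e.dst):
                result.append((e.cost + cost, [e.word] + words))
    return result

def n_best(edges):
    """Expand the lattice into an N-best list, best (lowest cost) first."""
    return [" ".join(words) for cost, words in sorted(all_paths(edges))]

nbest = n_best(EDGES)
print(nbest)  # ['the records', 'a rack', 'the wreck or', 'a record']
print(len(EDGES), "lattice edges vs", sum(len(h.split()) for h in nbest), "N-best words")
# 7 lattice edges vs 9 N-best words
```

Even in this tiny case the lattice stores fewer word tokens than the N-best list, and the gap grows quickly with utterance length because alternatives combine rather than being listed out.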

Correction Methods: Selecting Hypotheses

User corrects “records”.

1) the records …
2) a rack …
3) the wreck or …
4) a record …

System picks all words that overlap in time.

Presents in order from most likely to least.

Note: full overlap is probably not optimal.
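A minimal sketch of time-overlap selection follows, on a hypothetical toy lattice encoding the four hypotheses on the slide (times and costs invented). The slides do not say which likelihood the system ranks by; this sketch assumes each candidate is ranked by the cost of the best complete path through it, computed with forward and backward min-cost passes, and that any partial overlap counts. Names such as `alternatives` are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    word: str
    cost: int  # hypothetical cost, lower is better

TIMES = {0: 0.00, 1: 0.20, 2: 0.15, 3: 0.60, 4: 0.80}  # hypothetical node times (s)
START, FINAL = 0, 4
EDGES = [
    Edge(0, 1, "the", 1), Edge(0, 2, "a", 5),
    Edge(1, 4, "records", 2),
    Edge(1, 3, "wreck", 6), Edge(3, 4, "or", 3),
    Edge(2, 4, "rack", 4), Edge(2, 4, "record", 6),
]

def best_costs(edges):
    """Forward and backward best-path (min-cost) scores for every node."""
    order = sorted(TIMES, key=TIMES.get)  # edges go forward in time: topological order
    fwd = {n: math.inf for n in TIMES}
    bwd = {n: math.inf for n in TIMES}
    fwd[START], bwd[FINAL] = 0, 0
    for n in order:
        for e in edges:
            if e.src == n:
                fwd[e.dst] = min(fwd[e.dst], fwd[n] + e.cost)
    for n in reversed(order):
        for e in edges:
            if e.src == n:
                bwd[n] = min(bwd[n], e.cost + bwd[e.dst])
    return fwd, bwd

def alternatives(edges, selected):
    """Words overlapping `selected` in time, best first (excluding the word itself)."""
    fwd, bwd = best_costs(edges)
    t0, t1 = TIMES[selected.src], TIMES[selected.dst]
    ranked = {}
    for e in edges:
        start, end = TIMES[e.src], TIMES[e.dst]
        if e.word != selected.word and start < t1 and end > t0:  # any overlap counts
            key = (fwd[e.src] + e.cost + bwd[e.dst], start)  # best full path through e
            ranked[e.word] = min(key, ranked.get(e.word, (math.inf, start)))
    return sorted(ranked, key=ranked.get)

print(alternatives(EDGES, EDGES[2]))  # user taps "records"
# ['rack', 'wreck', 'or', 'record']
```

Because any overlap counts here, "wreck or" is offered as two separate candidates, one concrete way a simple overlap rule can be less than ideal.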

Correction Methods: Rescoring

User corrects “records” to “record”.

1) the records …
2) a rack …
3) the wreck or …
4) a record …

Unexpected changes!

Select only paths with “record”.

Rescore lattice.
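A minimal sketch of the rescoring step on a hypothetical toy lattice (times omitted, costs invented so the unconstrained best path is "the records"). It keeps only paths containing the corrected word and returns the cheapest survivor. For simplicity it does not restrict the word to the corrected time span, which a real system would need to do in longer utterances. Returning `None` when no path contains the word is an assumed fallback to plain substitution, in line with correction allowing words not in the lattice.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: int
    dst: int
    word: str
    cost: int  # hypothetical cost, lower is better

FINAL = 4
EDGES = [
    Edge(0, 1, "the", 1), Edge(0, 2, "a", 5),
    Edge(1, 4, "records", 2),
    Edge(1, 3, "wreck", 6), Edge(3, 4, "or", 3),
    Edge(2, 4, "rack", 4), Edge(2, 4, "record", 6),
]

def paths(edges, node=0):
    """Enumerate every (cost, words) path from `node` to the final node."""
    if node == FINAL:
        return [(0, [])]
    out = []
    for e in edges:
        if e.src == node:
            for cost, words in paths(edges, e.dst):
                out.append((e.cost + cost, [e.word] + words))
    return out

def rescore_with(edges, word):
    """Keep only paths containing `word`, then return the best remaining one."""
    candidates = [(cost, words) for cost, words in paths(edges) if word in words]
    if not candidates:
        return None  # word not in lattice: caller falls back to plain substitution
    cost, words = min(candidates)
    return " ".join(words)

print(rescore_with(EDGES, "records"))  # the records
print(rescore_with(EDGES, "record"))   # a record
print(rescore_with(EDGES, "wrecked"))  # None
```

In this toy lattice, correcting only "records" to "record" also turns "the" into "a", because "record" lies only on paths that start with "a". This is the kind of side effect a user may not expect.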

Editing

Allows user to add or edit text arbitrarily.

Must synchronize with correction server.

Edit vs. correct is currently implemented modally, with on-screen push buttons.

Gestural interface for correcting and editing would be preferable.
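Keeping the correction server synchronized with free-form edits is listed as not yet implemented. One possible approach, sketched here as an assumption and not as the system's method, is to align the recognizer's word sequence with the edited text using Python's standard `difflib.SequenceMatcher`. Words in `equal` spans could keep their lattice links (and stay correctable), while replaced or inserted words would become plain text. `sync_edit` is a hypothetical name.

```python
import difflib

def sync_edit(recognized, edited):
    """Align the recognizer's words with the user's edited text.

    Returns (op, old_words, new_words) tuples, where op is one of difflib's
    opcode tags: 'equal', 'replace', 'delete', or 'insert'.
    """
    old, new = recognized.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [(tag, old[i1:i2], new[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]

for op in sync_edit("the records show growth", "the record shows strong growth"):
    print(op)
# ('equal', ['the'], ['the'])
# ('replace', ['records', 'show'], ['record', 'shows', 'strong'])
# ('equal', ['growth'], ['growth'])
```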

Details...

Correction allows for words not in lattice.

Tap-to-correct worked better than press-and-hold.

System updates text when user pauses.

Doesn’t handle punctuation, paragraphs, etc.

Correction is fast, but dictation is slow.
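The slides say the system updates the text when the user pauses, but not how pauses are detected. A minimal sketch, assuming simple per-frame energy thresholding: a pause is a run of at least `min_frames` consecutive low-energy frames, and each pause triggers one update. The energies, threshold, and run length below are illustrative only; a usable pause would be far longer than three frames.

```python
def find_pauses(frame_energies, threshold, min_frames):
    """Return the frame index at which each sufficiently long pause is detected.

    A pause is a run of at least `min_frames` consecutive frames whose energy
    is below `threshold`; each pause triggers exactly once.
    """
    triggers = []
    run = 0  # length of the current low-energy run
    for i, energy in enumerate(frame_energies):
        if energy < threshold:
            run += 1
            if run == min_frames:
                triggers.append(i)  # hypothetical point to push updated text to the Pilot
        else:
            run = 0
    return triggers

energies = [5, 6, 7, 1, 1, 1, 1, 8, 9, 1, 1, 6]  # hypothetical frame energies
print(find_pauses(energies, threshold=2, min_frames=3))  # [5]
```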

Future Work

“Real” user studies.

Experiment more with correction mechanisms.

Implement editing synchronization.

Implement gestures.

Move to wireless network and mic.

Add punctuation, paragraphs, etc.