
Speech-to-Speech Translation: A New Direction for the Speech Industry

SpeechTEK West February 21-23, 2007

Mark Seligman, CEO

Converser for Healthcare is the world’s first commercially available speech-to-speech translation system for wide-ranging conversations. (Input via handwriting, touchscreen, and keyboard is also enabled.)

Converser for Healthcare is an affordable, reliable, portable translation system which can improve communication 24/7 between healthcare workers and patients with limited English proficiency.

Overview

• Automatic Spoken Language Translation (SLT)
– an age-old dream
• Practical SLT systems are now coming into use
– … but users must cooperate and compromise
• History: three classes of SLT systems
– categorized by degree of user cooperation and linguistic or topical coverage
• Demo
• Market
• Commercial and research activity

Star Trek? Not!

• The goal: speak as usual
– freely shift topics
– full range of vocabulary, idioms, structures
– spontaneous language: fragments, false starts, hesitations
– mumble
– converse in noisy environments
– ignore the translation program
• For now: some cooperation, compromise

The scientific problem: component integration

• Component technologies: speech recognition (SR), machine translation (MT), text-to-speech (TTS)
– imperfect, hard to integrate
• Each is usable alone, but the combination may fall below the usefulness threshold
– error rates combine, compound
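
The compounding effect can be illustrated with a small calculation. This is only a sketch: the per-stage accuracies below are hypothetical (the slides give no figures), and it assumes each stage fails independently, which real SR, MT, and TTS components may not.

```python
# Sketch of how per-stage accuracy compounds across an SR -> MT -> TTS chain.
# All numbers are hypothetical; independence of errors is an assumption.

def pipeline_accuracy(stage_accuracies):
    """Probability that every stage succeeds, assuming independent errors."""
    result = 1.0
    for accuracy in stage_accuracies:
        result *= accuracy
    return result

stages = {"SR": 0.90, "MT": 0.85, "TTS": 0.98}
combined = pipeline_accuracy(stages.values())

# The chain is less reliable than its weakest single stage.
print(f"weakest stage: {min(stages.values()):.2f}")
print(f"combined:      {combined:.2f}")
```

Under these assumed numbers, each stage looks usable on its own, yet roughly one utterance in four fails somewhere in the chain.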

Class One

• Class One: voice-driven phrase book
– linguistic coverage: narrow
– topical coverage: narrow
– cooperation required: low
• Fixed expressions or templates only
– “I’d like a bottle of [beer, wine, soda], please.”
– “I’d like a bottle of [BEVERAGE], please.”
• Advantages for user
– no need to carry a book: use a telephone
– selection of phrase by voice rather than finger
– translation output pronounced by a native speaker
• Technology
– Speech recognition: IVR
– MT: flat lookup, template or example-based
– Engineering exercise: low risk
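
A flat template lookup of the kind listed under Technology might look like the sketch below. It is not taken from any product named in this talk: the phrase table, the slot fillers, the French renderings, and the function name `translate` are all illustrative assumptions, and the sketch handles only one slot per template.

```python
# Sketch of Class One "MT": exact lookup of a fixed template with one slot.
# The phrase table and slot fillers are hypothetical illustration data.

PHRASES = [
    # (source template, target template)
    ("i'd like a bottle of {BEVERAGE}, please.",
     "Je voudrais une bouteille de {BEVERAGE}, s'il vous plaît."),
]

SLOT_FILLERS = {
    "BEVERAGE": {"beer": "bière", "wine": "vin", "soda": "soda"},
}

def translate(utterance):
    """Return the target phrase for a known template, or None if out of coverage."""
    # Normalize case and curly apostrophes so recognizer output matches the table.
    text = utterance.strip().lower().replace("\u2019", "'")
    for source, target in PHRASES:
        for slot, fillers in SLOT_FILLERS.items():
            placeholder = "{" + slot + "}"
            for word, translated in fillers.items():
                if text == source.replace(placeholder, word):
                    return target.replace(placeholder, translated)
    return None

print(translate("I'd like a bottle of wine, please."))
print(translate("Where is the station?"))  # outside the phrase book
```

Anything outside the table simply returns nothing, which is the narrow coverage that defines this class.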

Phraselator by VoxTec

Other Class One

• Sony TalkMan
• Pending entries
– Sharp
– NEC
• Future IVR systems?

Class Two

• Class Two: robust speech translation in narrow domains
– linguistic coverage: broad
– topical coverage: narrow
– cooperation required: medium
• Examples
– Uh, could I reserve a double room for next Tuesday, please?
– I need to, um, I need a double room please. That’s for next Tuesday.
– Hello, I’m calling about reserving a room. I’d be arriving next week on Tuesday.
• Advantages
– Lots of experience
– Can optimize SR, MT: special grammars (patterns)
– Interlingua possible for MT
• Challenges
– Robust parsing still imperfect, so MT input is dirty
– Some user frustration inevitable, but balanced by freedom
– Risk: medium
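
One way to picture domain-specific patterns feeding an interlingua is the sketch below, which maps the three example utterances onto a single language-neutral frame. It is a toy under stated assumptions: the regular expressions, the frame fields (`act`, `room_type`, `date`), and the function name `to_frame` are hypothetical, and none of the research systems named in this talk is claimed to work this way.

```python
import re

# Sketch: narrow-domain patterns that reduce varied, disfluent requests
# to one interlingua-style frame. Patterns and field names are hypothetical.

FILLERS = re.compile(r"\b(?:uh|um)\b,?\s*")
ROOM_REQUEST = re.compile(r"\b(?:reserv\w*|need)\b.*?\broom\b")
ROOM_TYPE = re.compile(r"\b(single|double)\s+room\b")
NEXT_DAY = re.compile(
    r"\bnext\s+(?:week\s+on\s+)?"
    r"(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"
)

def to_frame(utterance):
    """Return a frame for a room request, or None if the utterance is out of domain."""
    text = FILLERS.sub("", utterance.lower())  # drop hesitations first
    if not ROOM_REQUEST.search(text):
        return None
    room_type = ROOM_TYPE.search(text)
    day = NEXT_DAY.search(text)
    return {
        "act": "reserve_room",
        "room_type": room_type.group(1) if room_type else None,
        "date": f"next {day.group(1)}" if day else None,
    }

for utterance in [
    "Uh, could I reserve a double room for next Tuesday, please?",
    "I need to, um, I need a double room please. That's for next Tuesday.",
    "Hello, I'm calling about reserving a room. I'd be arriving next week on Tuesday.",
]:
    print(to_frame(utterance))
```

The first two utterances yield the same frame; the third leaves `room_type` empty, so a real system would have to ask or default. A target-language generator would then work from the frame rather than from the noisy word string.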

Class two: Worldwide Research

• CMU/Univ Karlsruhe (USA/Germany)
• ATR (Japan)
• IRST (Italy)
• ETRI (Korea)
• GETA-CLIPS (France)
• CAS-NLPR (China)
• IBM (USA)

Class two: Research

SYSTEM | DEVELOPER | TIME | DOMAINS | LANGUAGES | MT | VOCAB
Head Transducers | AT&T Labs (USA) | 1996 | Travel information accessing | English-Chinese / English-Spanish | Statistical | 1200/1300
JANUS-III | CMU (USA) | 1997- | Hotel reservation, flight/train ticket booking, etc. | English-German, Japanese, Spanish, etc. | Multi-engine | open
ATR-MATRIX | ATR-SLT (Japan) | 1998-2001 | Hotel reservation | Japanese-English, German, etc. | Pattern-based | 2000
Verbmobil | Univ. of Karlsruhe, DFKI etc. (Ger.) | 1993-2000 | Meeting appointment | German, English, Japanese | Multi-engine | 10000/2500
Lodestar | CAS-NLPR (China) | 1999 | Hotel reservation, travel information accessing | Chinese-Japanese, English | Multi-engine | 2000

Class Three

• Class Three: highly interactive speech translation with broad linguistic and topical coverage
– linguistic coverage: broad
– topical coverage: broad
– cooperation required: extensive
• User achieves broad coverage by supervising
• SR: need dictation for broad coverage
• MT: need broad coverage, good quality
– Must be modifiable to enable interactive correction

In the beginning …

French: Qu’est-ce que vous étudiez? (What do you study?)

English: Computer science. (L’informatique.)

French: Qu'est-ce que vous faites plus tard? (What are you doing later?)

English: I'm going skiing. (Je vais faire du ski.)

French: Vous n'avez pas besoin de travailler? (You don't need to work?)

English: I'll take my computer with me. (Je prendrai mon ordinateur avec moi.)

French: Où est-ce que vous mettrez l'ordinateur pendant que vous skiez? (Where will you put the computer while you ski?)

English: In my pocket. (Dans ma poche.)

Converser Features

Demo

Market: U.S. Healthcare

• 200,000 potential customers

• Healthcare venues

• 6,003 hospitals (2003 www.USNews.com)

• 836,156 physicians (2001 www.ama.com)

• 15-20 minutes/meeting

• $45-$150/hour for human interpreter
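
Taken together, the meeting length and hourly rate imply a rough cost range per interpreted meeting. The calculation below is a back-of-the-envelope sketch: it assumes the interpreter is billed pro rata by the minute with no minimum charge or travel time, which the slide does not state.

```python
# Sketch: cost of a human interpreter per meeting, from the slide's figures.
# Assumes pro-rata billing with no minimum charge (an assumption, not a quoted policy).

def interpreter_cost(minutes, hourly_rate):
    """Cost in dollars of `minutes` of interpreting at `hourly_rate` dollars/hour."""
    return hourly_rate * minutes / 60

low = interpreter_cost(15, 45)     # shortest meeting, cheapest rate
high = interpreter_cost(20, 150)   # longest meeting, highest rate
print(f"${low:.2f} to ${high:.2f} per meeting")
```

Under these assumptions a single meeting costs between about $11 and $50 in interpreter time.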

Value Proposition

Operational
– significant ROI
– 24/7 access to interpreting
– reduced patient waiting time
– more efficient use of employees (keep staff in their positions)
– patient SAFETY (real and perceived)
– reduced liability: bilingual transcripts of interaction with patients
– compliance

Communication benefits
– privacy
– more verifiability, consistency than with human interpreter
– informed consent

Worldwide Market

• IDC
– Cross-language software: $67 billion (2000) to $237 billion (2005)
– Worldwide e-business globalization support: > $540 billion
– Multilingual communications, collaboration tools: $5 billion (by 2008)
• Allied Business Intelligence, Inc.
– Worldwide human translation: $5.7 billion (in 2006)
• Global Reach
– 70%+ of the online population are not native English speakers

Markets

• Defense and Security
– services, intelligence, allies
– law enforcement
• Travel and Tourism
• Language Instruction/Education
• Government Service
– immigration
– welfare, food stamps, etc.
• Business
– B2C: customer service
– B2B: multinational firms, global partners/operations
• Consumer
– online affinity/personal portals (e.g. online dating)

Some Current Research/Commercial Activity

• Spoken Translation, Inc. (Converser)
• IBM (Mastor)
• Sehda (S-Minds)
• SpeechGear (Compadre Interpreter)
• VoxTec (Phraselator)
• Sony/Sharp/NEC (tourist)
• Ectaco (Dictionary +)
• MIT (flight domain)
• CMU (Arabic for military)
• BBN (Arabic for military)

Thank you!

To view demo visit:

www.ConverserforHealthcare.com