
Open Problems in Speech Recognition

Nelson Morgan, EECS and ICSI

ICSI and EECS

• International Computer Science Institute

• Nonprofit, closely affiliated with UCB-EECS:
  - faculty (e.g., Morgan, Feldman)
  - Board (Berlekamp, Karp, Malik)
  - students (PhD, MS)

• Focus areas in speech, language, theory, internet research; CITRIS involvement

A working speech recognizer (circa 1920)

A working speech recognizer (circa 2002)

Current Applications

• Toys

• Telephone queries (operator/touch-tone replacement)

• Voice dialing (for cell phones)

• Dictation (esp. for specific domains)

Major Reasons for Success

• Late-1960s statistical methodology (HMMs, developed for cryptography) applied to speech in the 1970s and 1980s

• Moore’s Law + engineering refinements to HMM training/recognition (1986-now)

• Normalization approaches (mean norms, RASTA filtering, vocal tract length approximation); a mean-normalization sketch follows
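
As a concrete illustration of the "mean norms" item above, here is a minimal Python sketch of cepstral mean/variance normalization; the array shapes, helper name, and toy data are assumptions for illustration, not anything taken from the talk.

```python
import numpy as np

def mean_variance_normalize(cepstra):
    """Cepstral mean/variance normalization (CMVN) - illustrative sketch.

    cepstra: (num_frames, num_coeffs) array of cepstral features for one
    utterance or speaker. Subtracting the per-utterance mean removes a
    stationary channel offset (e.g., a fixed telephone channel); dividing
    by the standard deviation is an optional further normalization.
    """
    mean = cepstra.mean(axis=0)
    std = cepstra.std(axis=0) + 1e-8   # avoid division by zero
    return (cepstra - mean) / std

# Toy usage: 300 frames of 13 MFCC-like coefficients with a fake channel offset.
features = np.random.randn(300, 13) + 5.0
normalized = mean_variance_normalize(features)
print(normalized.mean(axis=0).round(3))   # approximately zero after normalization
```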

Two examples of things that helped

• RASTA: 2% digit error rose to 60% on a different telephone channel; back down to 3% with RASTA (see the filter sketch below); now used for voice dialing in millions of cell phones

• Vocal tract length normalization: one parameter per speaker, significant effect on errors; now used in all large research systems
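
The sketch below applies a RASTA-style band-pass filter along time to each log-spectral channel, which is the mechanism behind the channel robustness described above. The exact pole value differs between published implementations (roughly 0.94-0.98), so treat the coefficients and the toy data as an illustrative choice rather than the definitive filter.

```python
import numpy as np
from scipy.signal import lfilter

def rasta_filter(log_spectrum):
    """Band-pass filter log-spectral trajectories in the RASTA style.

    log_spectrum: (num_frames, num_bands) log energies at a ~10 ms frame rate.
    The filter passes modulation frequencies of roughly 1-12 Hz, suppressing
    the very slow components introduced by a fixed channel (e.g., a different
    telephone handset) as well as very fast frame-to-frame fluctuations.
    """
    # Sloped FIR differentiator followed by a leaky integrator; the pole
    # value (0.94 here) varies between published implementations.
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    denom = np.array([1.0, -0.94])
    return lfilter(numer, denom, log_spectrum, axis=0)

# Toy usage: a slowly drifting channel offset is largely removed.
frames = np.log(np.random.rand(500, 20) + 1.0) + np.linspace(0, 3, 500)[:, None]
filtered = rasta_filter(frames)
```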

Major Technical Challenges

• Speaker variability for fluent/conversational speech (pronunciation, rate, overlaps)
  - 25-40% error on conversations

• Acoustic variability for general environments (noise, reverb, talker movement)
  - 3-10% error on read digits (vs. <1% in clean conditions)

Modern ASR Systems

• From 50,000 ft, all ASR systems look the same:
  - compute the local spectral envelope
  - determine likelihoods of speech sounds
  - search for the most likely HMM state sequence (sketched below)

• Spectral envelope distorted by many things
  - Alternatives are often bad fits to the statistical models
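
To make the last two steps concrete, here is a toy likelihood-plus-Viterbi-search sketch. The two Gaussian state models, the transition matrix, and the one-dimensional "features" are invented for illustration and stand in for a real acoustic front end and word-level search.

```python
import numpy as np

def viterbi(log_likes, log_trans, log_init):
    """Find the most likely HMM state sequence.

    log_likes: (T, S) per-frame log-likelihoods of each state
    log_trans: (S, S) log transition probabilities
    log_init:  (S,)   log initial-state probabilities
    """
    T, S = log_likes.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0] = log_init + log_likes[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # (S, S) candidate paths
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_likes[t]
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = back[t + 1, states[t + 1]]
    return states

# Toy usage: 1-D "spectral envelope" features, two states modeled as
# Gaussians (step 2: likelihoods), then Viterbi search (step 3).
feats = np.array([0.1, 0.2, 0.1, 2.1, 1.9, 2.2])
means, var = np.array([0.0, 2.0]), 0.25
log_likes = -0.5 * (feats[:, None] - means) ** 2 / var
log_trans = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))
log_init = np.log(np.array([0.5, 0.5]))
print(viterbi(log_likes, log_trans, log_init))   # -> [0 0 0 1 1 1]
```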

ASR in Brief

[Block diagram: Speech → Signal Processing → Phonetic Probability Estimator → Decoder (word search) → Words, with the Pronunciation Lexicon and the Grammar feeding the decoder]

ASR is half-deaf

• Phonetic classification very poor

• Success due to constraints (domain, speaker, noise-canceling mic, etc)

• These constraints can mask the underlying weakness of the technology

Rethinking Acoustic Processing for ASR

• Escape dependence on spectral envelope

• Use multiple front ends across time/freq

• Modify statistical models to accommodate new front ends

• Design optimal combination schemes for multiple models (one simple scheme is sketched below)
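
One combination rule in the spirit of these bullets is to weight each front end's phone posteriors by the inverse of their entropy, so confident (peaked) streams count more. Inverse-entropy weighting is only one of several rules explored in multi-stream work, and the stream shapes and toy numbers below are assumptions for illustration.

```python
import numpy as np

def combine_streams(posterior_streams, eps=1e-10):
    """Combine per-frame phone posteriors from several front ends.

    posterior_streams: list of (T, num_phones) arrays, one per front end
    (e.g., per feature type or time-frequency region). Each frame's streams
    are weighted by the inverse of their entropy before averaging.
    """
    streams = np.stack(posterior_streams)                  # (K, T, P)
    entropy = -(streams * np.log(streams + eps)).sum(-1)   # (K, T)
    weights = 1.0 / (entropy + eps)
    weights /= weights.sum(axis=0, keepdims=True)          # normalize over streams
    combined = (weights[..., None] * streams).sum(axis=0)
    return combined / combined.sum(axis=-1, keepdims=True)

# Toy usage: two front ends, 4 frames, 3 phone classes.
a = np.array([[0.8, 0.1, 0.1]] * 4)   # confident stream
b = np.array([[0.4, 0.3, 0.3]] * 4)   # less confident stream
print(combine_streams([a, b])[0].round(2))
```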

The DARPA (IAO) “EARS” Program

• New 5-year program to radically reduce errors in conversational speech-to-text

• Two components:
  - Rich Transcription (large reductions in error rate, improvements in readability, and portability to new languages)
  - Novel Approaches (radical changes)

EARS: Effective Affordable Reusable Speech-to-text

• Rich Transcription: 4 teams
  - SRI/ICSI/UW
  - BBN/U. Pitt/UW/LIMSI
  - Cambridge U.
  - IBM

• Novel Approaches: 2 teams
  - ICSI/SRI/UW/OGI/Columbia/IDIAP
  - Microsoft


Novel Approach 1: Pushing the Envelope (aside)

• Problem: Spectral envelope is a fragile information carrier

• Solution: Probabilities from multiple time-frequency patches

[Diagram: OLD - a single 10 ms frame produces one estimate of sound identity; PROPOSED - multiple time-frequency patches spanning up to 1 s each produce their own estimate (i-th, k-th, ..., n-th), and information fusion combines them into the final estimate of sound identity]
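
To make the "multiple time-frequency patches" idea concrete, the sketch below slices a log-spectrogram into patches of different temporal extents (up to roughly 1 s) and different frequency bands. The patch sizes, hop, and the notion that each patch feeds its own probability estimator are illustrative assumptions, not the actual EARS design.

```python
import numpy as np

def extract_patches(log_spec, frame_ms=10, widths_ms=(100, 500, 1000),
                    band_size=6, hop_frames=10):
    """Cut a log-spectrogram into time-frequency patches.

    log_spec: (num_frames, num_bands) log energies at a 10 ms frame rate.
    Each patch covers one of several temporal widths (up to ~1 s) and a
    slice of the frequency axis; in the proposed scheme each patch would
    feed its own probability estimator, whose outputs are later fused.
    """
    T, B = log_spec.shape
    patches = []
    for width_ms in widths_ms:
        width = width_ms // frame_ms
        for start in range(0, T - width + 1, hop_frames):
            for lo in range(0, B - band_size + 1, band_size):
                patches.append(log_spec[start:start + width, lo:lo + band_size])
    return patches

# Toy usage: 3 seconds of 10 ms frames over 18 bands.
spec = np.random.randn(300, 18)
patches = extract_patches(spec)
print(len(patches), patches[0].shape)
```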

Novel Approach 2: Beyond Frames…

• Problem: Features & models interact; new features may require different models

• Solution: Advanced features require advanced models, not limited by the fixed-frame-rate paradigm (a rough multi-rate feature sketch follows)

[Diagram: OLD - short-term features feed a conventional HMM; PROPOSED - advanced features feed a multi-rate / dynamic-scale classifier]
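
As a rough illustration of a "multi-rate" representation, the sketch below derives a slow stream by averaging short-term features over a longer window at a lower frame rate. The window lengths, the averaging, and the idea of keeping both rates side by side are assumptions for illustration, not the actual models proposed in EARS.

```python
import numpy as np

def multirate_features(short_term, slow_window=25, slow_hop=10):
    """Build a two-rate feature representation.

    short_term: (T, D) conventional 10 ms frame features.
    Returns the original fast stream plus a slow stream computed by
    averaging over `slow_window` frames every `slow_hop` frames, i.e.,
    a coarser time scale with roughly syllable-length context. A
    multi-rate or dynamic-scale classifier could model each stream at
    its own rate instead of forcing a single fixed frame rate.
    """
    T, D = short_term.shape
    slow = np.array([short_term[t:t + slow_window].mean(axis=0)
                     for t in range(0, T - slow_window + 1, slow_hop)])
    return short_term, slow

# Toy usage: 2 s of 10 ms features (random placeholder data).
fast = np.random.randn(200, 13)
fast_stream, slow_stream = multirate_features(fast)
print(fast_stream.shape, slow_stream.shape)   # (200, 13) and (18, 13)
```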

Other speech-to-text projects

• Dialog systems: DARPA Communicator/Symphony, German SmartKom

• Noise/reverberation for cell phone, military environments: DARPA SPINE program, various European projects (EU, ETSI)

• Recognition/retrieval/summarization for multiparty meetings: Swiss IM2, EU m4, ICSI/UW/SRI/Columbia NSF-ITR

Resource generation from Berkeley researchers

• gmtk - a new graphical model toolkit specialized for speech (extension of 2 PhD theses: Bilmes [UW] and Zweig [IBM])

• Publicly available speech/neural network software (RASTA, speech neural network training system)

• Soon: a “meeting data” corpus

Campus interaction

• Within EECS (CIS):
  - Feldman (also ICSI), NLU
  - Jordan and Russell, machine learning

• Linguists:
  - Ohala, phonology
  - Fillmore (ICSI), semantic lexicography

Natural Speech + Language Projects at ICSI/EECS

• Berkeley Restaurant Project (BeRP) - online stochastic context-free grammar probabilities with natural mixed initiative

• SmartKom - tourist information query system with American pronunciations of German place names

Summary

• Progress in speech recognition research led to working systems in particular domains

• Performance still severely limited for conversational speech, noisy/reverberant conditions

• We and others are working to transcend these limitations with novel approaches