
Language Modeling for Automatic Speech Recognition Meets the Web: Google Search by Voice

Ciprian Chelba, Johan Schalkwyk, Boulos Harb, Carolina Parada, Cyril Allauzen, Leif Johnson, Michael Riley, Peng Xu, Preethi Jyothi, Thorsten Brants, Vida Ha, Will Neveitt

02/03/2012


Statistical Modeling in Automatic Speech Recognition

[Diagram: the source-channel view of ASR. The speaker's mind chooses a word string W; the speech producer and acoustic channel (the speaker's speech, then the acoustic processor) deliver acoustic data A; the linguistic decoder in the speech recognizer outputs the recognized word string Ŵ.]

$\hat{W} = \arg\max_W P(W|A) = \arg\max_W P(A|W) \cdot P(W)$

$P(A|W)$: acoustic model (Hidden Markov Model)

$P(W)$: language model (Markov chain)

search for the most likely word string $\hat{W}$

due to the large vocabulary size (1M words), an exhaustive search is intractable
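In practice the argmax is approximated by restricting the search; a toy sketch of the scoring rule over an explicit candidate list, with made-up log-scores standing in for the real acoustic and language models:

```python
# Hypothetical log-scores standing in for the real models:
# acoustic_logprob ~ log P(A|W), lm_logprob ~ log P(W).
acoustic_logprob = {"call mom": -12.1, "call mall": -11.8, "cold mom": -14.0}
lm_logprob = {"call mom": -4.2, "call mall": -9.7, "cold mom": -8.1}

def decode(candidates):
    """argmax_W P(A|W) * P(W), computed in log space."""
    return max(candidates, key=lambda w: acoustic_logprob[w] + lm_logprob[w])

print(decode(list(lm_logprob)))  # -> 'call mom'
```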


Language Model Evaluation (1)

Word Error Rate (WER):
TRN: UP UPSTATE NEW YORK SOMEWHERE UH OVER
HYP: UPSTATE NEW YORK SOMEWHERE UH ALL ALL
(UP deleted, OVER -> ALL substituted, ALL inserted)
3 errors / 7 words in transcript; WER = 43%

Perplexity (PPL):

$PPL(M) = \exp\left(-\frac{1}{N} \sum_{i=1}^{N} \ln P_M(w_i|w_1 \ldots w_{i-1})\right)$

good models are smooth: $P_M(w_i|w_1 \ldots w_{i-1}) > \epsilon$

other metrics: out-of-vocabulary rate/n-gram hit ratios
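Both metrics are straightforward to compute; a minimal sketch, with WER from a word-level Levenshtein alignment and PPL from per-word natural-log probabilities:

```python
import math

def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = [[i + j if i * j == 0 else 0 for j in range(len(h) + 1)]
         for i in range(len(r) + 1)]
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # sub
                          d[i - 1][j] + 1,                           # del
                          d[i][j - 1] + 1)                           # ins
    return d[-1][-1] / len(r)

def perplexity(logprobs):
    """PPL(M) = exp(-1/N * sum_i ln P_M(w_i | history))."""
    return math.exp(-sum(logprobs) / len(logprobs))

print(wer("UP UPSTATE NEW YORK SOMEWHERE UH OVER",
          "UPSTATE NEW YORK SOMEWHERE UH ALL ALL"))  # 3/7 ~ 0.43
print(perplexity([math.log(0.1)] * 5))               # 10.0
```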


Language Model Evaluation (2)

Web Score (WebScore):
TRN: TAI PAN RESTAURANT PALO ALTO
HYP: TAIPAN RESTAURANTS PALO ALTO

produce the same search results

do not count as error if top search result is identical with that for the manually transcribed query
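A minimal sketch of the metric, assuming a hypothetical top_search_result(query) function that returns the first result for a query:

```python
def web_score(pairs, top_search_result):
    """pairs: (transcript, hypothesis) tuples. A hypothesis is not an error
    if its top search result matches the transcript's; returns % matching."""
    hits = sum(top_search_result(trn) == top_search_result(hyp)
               for trn, hyp in pairs)
    return 100.0 * hits / len(pairs)
```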


Language Model Smoothing

Markov assumption:

$P_\theta(w_i|w_1 \ldots w_{i-1}),\ \theta \in \Theta,\ w_i \in \mathcal{V}$

Smoothing using Deleted Interpolation:

$P_n(w|h) = \lambda(h) \cdot P_{n-1}(w|h') + (1 - \lambda(h)) \cdot f_n(w|h)$

$P_{-1}(w) = \text{uniform}(\mathcal{V})$

Parameters (smoothing weights $\lambda(h)$ must be estimated on cross-validation data):

$\theta = \{\lambda(h);\ \mathrm{count}(w|h),\ \forall (w|h) \in \mathcal{T}\}$
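A minimal recursive sketch of this estimate; the toy counts and weights below are illustrative assumptions, and the real $\lambda(h)$ would be estimated on held-out data:

```python
def interpolated_prob(w, h, counts, lam, V):
    """P_n(w|h) = lam(h) * P_{n-1}(w|h') + (1 - lam(h)) * f_n(w|h),
    bottoming out at P_{-1}(w) = 1/V (uniform over the vocabulary).
    counts[h]: word counts observed after context tuple h;
    lam[h]: smoothing weight for h; V: vocabulary size."""
    if h is None:                        # below the unigram: uniform(V)
        return 1.0 / V
    shorter = h[1:] if h else None       # h' drops the oldest context word
    c = counts.get(h, {})
    total = sum(c.values())
    f = c.get(w, 0) / total if total else 0.0
    l = lam.get(h, 1.0)                  # unseen context: back off fully
    return l * interpolated_prob(w, shorter, counts, lam, V) + (1 - l) * f

# Bigram example over a 3-word vocabulary:
counts = {("the",): {"cat": 2, "dog": 1}, (): {"the": 3, "cat": 2, "dog": 1}}
lam = {("the",): 0.3, (): 0.2}
print(interpolated_prob("cat", ("the",), counts, lam, V=3))
```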


Voice Search LM Training Setup

correct google.com queries [a], normalized for ASR, e.g. 5th -> fifth

vocabulary size: 1M words, OoV rate 0.57% (!), excellent n-gram hit ratios

training data: 230B words

Order  no. n-grams  pruning    PPL  n-gram hit-ratios
3      15M          entropy    190  47/93/100
3      7.7B         none       132  97/99/100
5      12.7B        1-1-2-2-2  108  77/88/97/99/100

[a] Thanks Mark Paskin


Distributed LM Training

Input: key=ID, value=sentence/doc

Intermediate: key=word, value=1

Output: key=word, value=count

Map chooses reduce shard based on hash value (red or blue) [a]

[a] T. Brants et al., Large Language Models in Machine Translation
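A toy, in-process sketch of this word-count pipeline (the real jobs run mappers and reducers on separate machines; the shuffle here is simulated with a dict):

```python
from collections import defaultdict

def mapper(key, sentence, num_shards):
    """Input key=ID, value=sentence: emit (word, 1), routed to a reduce
    shard by the hash of the word."""
    for word in sentence.split():
        yield hash(word) % num_shards, (word, 1)

def reducer(pairs):
    """Sum the 1s per word: output key=word, value=count."""
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

shards = defaultdict(list)  # simulated shuffle: shard id -> (word, 1) list
for shard, kv in mapper(0, "to be or not to be", num_shards=2):
    shards[shard].append(kv)
print({s: reducer(kvs) for s, kvs in shards.items()})
```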


Using Distributed LMs

load each shard into the memory of one machine

Bottleneck: in-memory/network access at X hundred nanoseconds / Y milliseconds (factor 10,000)

Example: translation of one sentence

approx. 100k n-grams; 100k * 7 ms = 700 seconds per sentence

Solution: batched processing

25 batches, 4k n-grams each: less than 1 second [a]

[a] T. Brants et al., Large Language Models in Machine Translation
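A sketch of the batching idea; lookup_batch is a hypothetical RPC stub that scores a list of n-grams in one round trip, so network latency is paid per batch rather than per n-gram:

```python
def batched_lm_score(ngrams, lookup_batch, batch_size=4000):
    """Total log-probability of `ngrams` scored against a remote sharded LM,
    issuing one batched RPC per `batch_size` n-grams."""
    total = 0.0
    for i in range(0, len(ngrams), batch_size):
        total += sum(lookup_batch(ngrams[i:i + batch_size]))
    return total
```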


ASR Decoding Interface

First pass LM: finite state machine (FSM) API

states: n-gram contexts

arcs: for each state/context, list each n-gram in the LM + back-off transition

trouble: need all n-grams in RAM (tens of billions)

Second pass LM: lattice rescoring

states: n-gram contexts, after expansion to rescoring LM order

arcs: {new states} × {no. arcs in original lattice}

good: distributed LM and large batch RPC
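To make the first-pass picture concrete, a toy sketch of how a back-off n-gram LM unrolls into FSM states and arcs; this illustrates the structure only, not the actual FSM API:

```python
def lm_arcs(order, ngram_logprobs, backoffs):
    """States are n-gram contexts (tuples of up to order-1 words).
    Each n-gram (h, w) yields an arc h --w/logP--> next context;
    each context h also gets a back-off arc to its suffix h[1:]."""
    arcs = []
    for ngram, logp in ngram_logprobs.items():
        h, w = ngram[:-1], ngram[-1]
        dest = (h + (w,))[-(order - 1):] if order > 1 else ()
        arcs.append((h, w, logp, dest))
    for h, bow in backoffs.items():
        arcs.append((h, "<backoff>", bow, h[1:]))  # epsilon-like transition
    return arcs

# Bigram example: contexts are single words (or empty after back-off).
print(lm_arcs(2, {("the", "cat"): -1.2, ("cat", "sat"): -0.7},
              {("the",): -0.5, ("cat",): -0.4}))
```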


Language Model Pruning

Entropy pruning is required for use in 1st pass:

should one remove n-gram (h,w)?

$D[q(h)\,p(\cdot|h)\ \|\ q(h)\,p'(\cdot|h)] = q(h) \sum_w p(w|h) \log \frac{p(w|h)}{p'(w|h)}$

prune if $\left|D[q(h)\,p(\cdot|h)\ \|\ q(h)\,p'(\cdot|h)]\right| <$ pruning threshold

lower order estimates: $q(h) = p(h_1) \ldots p(h_n|h_1 \ldots h_{n-1})$, or relative frequency: $q(h) = f(h)$

very effective in reducing LM size at min cost in PPL
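A small sketch of the criterion; p and p_pruned are the conditional word distributions for context h before and after removing the n-gram, and q_h is the context weight:

```python
import math

def pruning_cost(q_h, p, p_pruned):
    """D[q(h) p(.|h) || q(h) p'(.|h)] = q(h) * sum_w p(w|h) log(p(w|h)/p'(w|h)).
    Remove the n-gram if |cost| is below the pruning threshold."""
    return q_h * sum(pw * math.log(pw / p_pruned[w]) for w, pw in p.items())

# Toy check: an unchanged distribution costs nothing to "prune".
p = {"cat": 0.6, "dog": 0.4}
print(pruning_cost(0.01, p, dict(p)))  # 0.0
```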


On Smoothing and Pruning (1)

4-gram model trained on 100M words, 100k vocabulary, pruned to 1% of raw size using SRILM

tested on 690k words

                               4-gram Perplexity
LM smoothing                   raw     pruned
Ney                            120.5   197.3
Ney, Interpolated              119.8   198.1
Witten-Bell                    118.8   196.3
Witten-Bell, Interpolated      121.6   202.3
Ristad                         126.4   203.6
Katz (Good-Turing)             119.8   198.1
Kneser-Ney                     114.5   285.1
Kneser-Ney, Interpolated       115.8   274.3
Kneser-Ney (CG)                116.3   280.6
Kneser-Ney (CG, Interpolated)  115.8   274.3


On Smoothing and Pruning (2)

[Figure: Perplexity Increase with Pruned LM Size; PPL (log2) vs. model size in number of n-grams (log2); curves: Katz (Good-Turing), Kneser-Ney, Interpolated Kneser-Ney.]

baseline LM is pruned to 0.1% of raw size!

switch from KN to Katz smoothing: 10% WER gain


Billion n-gram 1st Pass LM (1)

LM representation rate:

Compression Technique  Block Length  Rel. Time  Rep. Rate (B/n-gram)
None                   —             1.0        13.2
Quantized              —             1.0        8.1
CMU 24b, Quantized     —             1.0        5.8
GroupVar               8             1.4        6.3
                       64            1.9        4.8
                       256           3.4        4.6
RandomAccess           8             1.5        6.2
                       64            1.8        4.6
                       256           3.0        4.6
CompressedArray        8             2.3        5.0
                       64            5.6        3.2
                       256           16.4       3.1


Billion n-gram 1st Pass LM (2)

[Figure: Google Search by Voice LM; representation rate (B/n-gram) vs. lookup time relative to uncompressed, for GroupVar, RandomAccess, and CompressedArray.]

1B 3-grams: 5 GB of RAM at acceptable lookup speed [a]

[a] B. Harb, C. Chelba, J. Dean and S. Ghemawat, Back-Off Language Model Compression, Interspeech 2009.


Is Bigger Better? YES!

[Figure: Word Error Rate (left) and WebScore Error Rate (100% − WebScore, right) as a function of LM size (# n-grams, B, log scale).]

8%/10% relative gain in WER/WebScore [a]

[a] With Cyril Allauzen, Johan Schalkwyk, Mike Riley. May reachable composition CLoG be with you!


Is Bigger Better? YES!

[Figure: Perplexity (left) and Word Error Rate (right) as a function of LM size (# n-grams, B, log scale).]

PPL is really well correlated with WER!


Is Even Bigger Better? YES!

[Figure: WER (left) and WebError (100 − WebScore, right) as a function of 5-gram LM size (# 5-grams, B).]

5-gram: 11% relative gain in WER/WebScore


Is Even Bigger Better? YES!

[Figure: Perplexity (left) and WER (right) as a function of 5-gram LM size (# 5-grams, B).]

Again, PPL is really well correlated with WER!


Detour: Search vs. Modeling error

$\hat{W} = \arg\max_W P(A, W|\theta)$

If the correct $W^* \neq \hat{W}$ we have an error:

$P(A, W^*|\theta) > P(A, \hat{W}|\theta)$: search error

$P(A, W^*|\theta) < P(A, \hat{W}|\theta)$: modeling error

wisdom has it that in ASR: search error < modeling error

Corollary: improvements come primarily from using better models; integration in decoder/search is second order!


Lattice LM Rescoring

Pass  Language Model  PPL  WER   WebScore
1st   15M 3g          191  18.7  72.2
1st   1.6B 5g         112  16.9  75.2
2nd   15M 3g          191  18.8  72.6
2nd   1.6B 3g         112  16.9  75.3
2nd   12B 5g          108  16.8  75.4

10% relative reduction in remaining WER, WebScore error

1st pass gains matched in ProdLm lattice rescoring [a], at negligible impact in real-time factor

[a] Older front end, 0.2% WER diff.


Lattice Depth Effect on LM Rescoring

[Figure: Perplexity (left) and WER (right) as a function of lattice depth (lattice density, # links per transcribed word, log scale).]

LM becomes ineffective after a certain lattice depth


N-best Rescoring

N-best rescoring experimental setup

minimal coding effort for testing LMs: all you need to do is assign a score to a sentence

Experiment          LM       WER   WebScore
SpokenLM baseline   13M 3g   17.5  73.3
lattice rescoring   12B 5g   16.1  76.3
10-best rescoring   1.6B 5g  16.4  75.2

a good LM will immediately show its potential, even on as little as 10-best alternates rescoring!
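The whole experiment fits in a few lines; a minimal sketch of the reranking loop, where lm_logprob is any function that maps a sentence to a score (the interpolation weight is a toy assumption):

```python
def rescore_nbest(nbest, lm_logprob, lm_weight=0.5):
    """nbest: list of (sentence, first_pass_score) pairs. Returns the
    sentence maximizing first-pass score + weighted new-LM score."""
    return max(nbest,
               key=lambda sh: sh[1] + lm_weight * lm_logprob(sh[0]))[0]
```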


Query Stream Non-stationarity (1)

USA training data [a]:
XX months
X months

test data: 10k, Sept-Dec 2008 [b]

very little impact on OoV rate for a 1M-word vocabulary: 0.77% (X months vocabulary) vs. 0.73% (XX months vocabulary)

[a] Thanks Mark Paskin
[b] Thanks Zhongli Ding for query selection.


Query Stream Non-stationarity (2)

3-gram LM       Training Set  Test Set PPL
unpruned        X months      121
unpruned        XX months     132
entropy pruned  X months      205
entropy pruned  XX months     209

bigger is not always better [a]

10% rel reduction in PPL when using the most recent X months instead of XX months

no significant difference after pruning, in either PPL or WER

[a] The vocabularies are mismatched, so the PPL comparison is a bit troublesome. The difference would be higher if we used a fixed vocabulary.


More Locales

training data across 3 locales [a]: USA, GBR, AUS, spanning the same amount of time ending in Aug 2008

test data: 10k/locale, Sept-Dec 2008

Out of Vocabulary Rate:

Training  Test Locale
Locale    USA   GBR   AUS
USA       0.7   1.3   1.6
GBR       1.3   0.7   1.3
AUS       1.3   1.1   0.7

locale specific vocabulary halves the OoV rate

[a] Thanks Mark Paskin


Locale Matters (2)

Perplexity of unpruned LM:

Training  Test Locale
Locale    USA   GBR   AUS
USA       132   234   251
GBR       260   110   224
AUS       276   210   124

locale specific LM halves the PPL of the unpruned LM


Locale Matters (3)

Perplexity of pruned LM:

Training  Test Locale
Locale    USA   GBR   AUS
USA       210   369   412
GBR       442   150   342
AUS       422   293   171

locale specific LM halves the PPL of the pruned LM as well


Discriminative Language Modeling

ML estimate from correct text is of limited use in decoding: back-off n-gram assigns −log P(“a navigate to”) = 0.266

need parallel data $(A, W^*)$

a significant amount can be mined from voice search logs using confidence filtering

first-pass scores discriminate perfectly, nothing to learn? [a]

[a] Work with Preethi Jyothi, Leif Johnson, Brian Strope; ICASSP '12, to be published.


Experimental Setup

confidence filtering on baseline AM/LM to give reference transcriptions (≈ manually transcribed data)

weaker AM (ML-trained, single-mixture Gaussians) to generate N-best and ensure sufficient errors to train the DLMs

largest models are trained on ∼80,000 hours of speech (re-decoding is expensive!), ∼350 million words

different from previous work [Roark et al., ACL '04], which cross-validates the baseline LM training to generalize better to unseen data


N-best Reranking Oracle Error Rates on weakAM-dev/T9b

[Figure 1: Oracle error rates up to N=200; SER and WER on weakAM-dev and T9b as a function of N.]


DLM at Scale: Distributed Perceptron

Features: 1st-pass lattice costs and n-gram word features [Roark et al., ACL '04].

Rerankers: parameter weights at iteration $t+1$, $w_{t+1}$, for reranker models trained on $N$ utterances; $\Delta_c$ is the summed update from map shard $c$ of $C$:

Perceptron: $w_{t+1} = w_t + \sum_c \Delta_c$

DistributedPerceptron: $w_{t+1} = w_t + \frac{1}{C} \sum_{c=1}^{C} \Delta_c$ [McDonald et al., ACL '10]

AveragedPerceptron: $w^{av}_{t+1} = \frac{t}{t+1} w^{av}_t + \frac{1}{t+1} w_t + \frac{1}{N(t+1)} \sum_{c=1}^{C} S\Delta_c$ [Collins, EMNLP '02]
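A sketch of one DistributedPerceptron epoch under the update rule above; train_shard is a hypothetical per-shard trainer that returns the summed perceptron updates Δ_c as a feature-to-value dict:

```python
def distributed_perceptron_epoch(w, shards, train_shard, C):
    """w_{t+1} = w_t + (1/C) * sum_c Delta_c: each shard computes its
    updates against the same starting weights (the map step), and the
    averaged deltas are folded into the new weights (the reduce step)."""
    new_w = dict(w)
    for shard in shards:                 # in production: parallel mappers
        for feat, val in train_shard(w, shard).items():
            new_w[feat] = new_w.get(feat, 0.0) + val / C
    return new_w
```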


MapReduce Implementation

[Diagram: MapReduce implementation. Rerank-Mappers read SSTable Utterances and the epoch-t SSTable Feature-Weights served by an SSTableService, with a cache per Map chunk; Identity-Mappers pass the previous weights through; Reducers write the epoch-t+1 SSTable Feature-Weights.]


WERs on weakAM-dev

Model      WER (%)
Baseline   32.5
DLM-1gram  29.5
DLM-2gram  28.3
DLM-3gram  27.8
ML-3gram   29.8

Our best DLM gives ∼4.7% absolute (∼15% relative) improvement over the 1-best baseline WER.

Our best DLM gives ∼2% absolute (∼6% relative) improvement over an ML ngram LM also trained on data T.


Results on T9b

Data set     Baseline  Reranking, ML LM  Reranking, DLM
weakAM-test  39.1      36.7              34.2
T9b          14.9      14.6              14.3 [a]

5% relative gains in WER

Note: improvements are cut in half when comparing our models trained on data T with a reranker using an ngram LM trained on T.

[a] Statistically significant at p < 0.05.


Open Problems in Language Modeling for ASR and Beyond

LM adaptation: bigger is not always better. Making use of related, yet not fully matched data, e.g.:

Web text should help the query LM?

related locales (GBR, AUS) should help USA?

discriminative LM: ML estimate from correct text is of limited use in decoding, where the LM is presented with atypical n-grams

can we sample from correct text instead of parallel data $(A, W^*)$?

LM smoothing, estimation: neural network LMs are staging a comeback.


ASR Success Story: Google Search by Voice

What contributed to success:

excellent language model built from query stream

user expectations clearly set by the existing text app

clean speech:

users are motivated to articulate clearly

app phones (Android, iPhone) do high quality speech capture

speech transferred error-free to the ASR server over IP

Challenges:

Measuring progress: manual transcription is at about the same word error rate as the system (15%)


ASR Core Technology

Current state:

automatic speech recognition is incredibly complex

problem is fundamentally unsolved

data availability and computing have changed significantly: 2-3 orders of magnitude more of each

Challenges and Directions:

re-visit (simplify!) modeling choices made on corpora of modest size

multi-linguality built-in from start

better feature extraction, acoustic modeling