A New Approach to Utterance Verification Based on Neighborhood Information in Model Space

Authors: Hui Jiang, Chin-Hui Lee

Reporter: 陳燦輝

2

Reference

[1] H. Jiang and C.-H. Lee, “A new approach to utterance verification based on neighborhood information in model space,” IEEE Trans. Speech Audio Processing, vol. 11, no. 5, pp. 425–434, 2003.

[2] H. Jiang, K. Hirose, and Q. Huo, “Robust speech recognition based on Bayesian prediction approach,” IEEE Trans. Speech Audio Processing, vol. 7, pp. 426–440, July 1999.

[3] N. Merhav and C.-H. Lee, “A minimax classification approach with application to robust speech recognition,” IEEE Trans. Speech Audio Processing, vol. 1, pp. 90–100, 1993.

3

Outline

Introduction

UV based on neighborhood information

Bayes factors: a Bayesian tool for verification problems

Experiments

Summary and Conclusions

4

Introduction

The major difficulty with likelihood ratio test (LRT)-based utterance verification is how to model the alternative hypothesis.

It is very important to know the properties of competing source distributions.

In this paper, we investigate a novel idea: performing utterance verification based on neighborhood information in model space.

5

UV based on neighborhood information

Nested neighborhoods in model space:

First of all, let us look at the model space of HMMs. Suppose we have $N$ different HMMs in the recognizer, denoted as $\{\lambda_i \mid 1 \le i \le N\}$. Each $\lambda_i$ can be viewed as a point in the model space.

Intuitively, for every given model $\lambda_i$, we are able to enumerate a set of nested neighborhoods which surround the underlying model $\lambda_i$.

6

UV based on neighborhood information (cont)

Nested neighborhoods in model space (cont) :

Fig. 1. Illustration of the structure of nested neighborhoods in HMM model space.

7



For a given model $\lambda_i$, we can define a set of nested neighborhoods $\Omega_0(\lambda_i), \Omega_1(\lambda_i), \Omega_2(\lambda_i), \ldots$ with increasing neighborhood sizes as follows:

1) Zero neighborhood $\Omega_0(\lambda_i)$: consists of the center $\lambda_i$ only.

2) Tight neighborhood $\Omega_1(\lambda_i)$: a very small neighborhood which tightly surrounds the model $\lambda_i$. This kind of neighborhood serves as a robust representation of $\lambda_i$. The optimal model for a given utterance could shift slightly from the original position of the estimated model $\lambda_i$, but it is generally considered that the optimal model still resides somewhere within $\Omega_1(\lambda_i)$.

8



3) Medium neighborhood $\Omega_2(\lambda_i)$: has a medium size and is significantly larger than $\Omega_1(\lambda_i)$. Thus, $\Omega_2(\lambda_i)$ possibly includes all of $\lambda_i$'s potential competing models.

4) Large neighborhood $\Omega_3(\lambda_i)$: is even larger in size and should cover all related speech models in model space. The larger neighborhoods of different models should overlap with each other. On the other hand, a different model $\lambda_i$ should have its own $\Omega_0(\lambda_i)$, $\Omega_1(\lambda_i)$, and $\Omega_2(\lambda_i)$.

5) Infinity neighborhood $\Omega_4(\lambda_i)$: has an infinite size and actually covers the entire space. Therefore, $\Omega_4(\lambda_i)$ should include all models in model space. In concept, these models are far away from the original model $\lambda_i$.

9


For a given speech segment $X$, assume that an ASR system recognizes it as word $W$, which is represented by an HMM model $\lambda_W$.

Traditionally, we formulate UV as a statistical hypothesis testing problem:

$H_0$: $X$ is truly from model $\lambda_W$
$H_1$: $X$ is NOT from model $\lambda_W$

Here, we translate the above hypothesis test into the following one:

$H_0'$: The true model of $X$ lies in the tight neighborhood of $\lambda_W$, i.e. $\Omega_1(\lambda_W)$
$H_1'$: The true model of $X$ lies in the region $\Omega_2(\lambda_W) - \Omega_1(\lambda_W)$


i) Tight neighborhood $\Omega_1(\lambda_W)$: serves as a robust representation of the original model $\lambda_W$.

ii) Medium neighborhood $\Omega_2(\lambda_W)$: includes all potential competing models of $\lambda_W$.

$\Omega_2 - \Omega_1$ denotes the holed region inside the medium neighborhood $\Omega_2$ but excluding the tight neighborhood $\Omega_1$, as shown in Fig. 2.

Fig. 2. Illustration of hypothesis testing in the scenario of detecting speech recognition errors based on the neighborhood information.

11

Bayes factors

The Bayesian approach to hypothesis testing involves the calculation and evaluation of the so-called Bayes factor.

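Given the observation $X$ along with two hypotheses $H_0$ and $H_1$, the Bayes factor is computed as

$$\mathrm{BF} = \frac{p(X \mid H_0)}{p(X \mid H_1)} = \frac{\int f(X \mid \lambda_0, H_0)\, p(\lambda_0 \mid H_0)\, d\lambda_0}{\int f(X \mid \lambda_1, H_1)\, p(\lambda_1 \mid H_1)\, d\lambda_1} \qquad (1)$$

where, for $k = 0, 1$, $\lambda_k$ is the model parameter under $H_k$, $p(\lambda_k \mid H_k)$ is its prior density, and $f(X \mid \lambda_k, H_k)$ is the likelihood function of $\lambda_k$ under $H_k$.

If $\mathrm{BF} > \tau$, where $\tau$ is a pre-set critical threshold, then we accept $H_0$; otherwise we reject it.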
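The decision rule in (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes each integral has already been approximated by a weighted sum over a finite grid of candidate models, and the names `log_marginal` and `accept_h0` are hypothetical. Working in the log domain avoids underflow for long utterances.

```python
import math

def log_marginal(log_likelihoods, log_priors):
    """Log of sum_k prior_k * likelihood_k over a finite grid of models.

    Assumption: the integral in (1) is replaced by a discrete sum.
    """
    terms = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    peak = max(terms)
    # log-sum-exp: subtract the largest term before exponentiating
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def accept_h0(log_marginal_h0, log_marginal_h1, log_threshold=0.0):
    """Accept H0 when the log Bayes factor exceeds the log threshold."""
    return (log_marginal_h0 - log_marginal_h1) > log_threshold
```

For example, two equally likely candidate models with likelihoods 0.2 and 0.4 give a marginal of 0.3, and `accept_h0(math.log(0.3), math.log(0.1))` accepts $H_0$ at a threshold of 1.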

12

Bayes factors (cont)

In order to use Bayes factors to solve the hypothesis testing problem, i.e. $H_0'$ vs. $H_1'$, two important issues must be addressed:

How to properly choose the prior distribution $p(\cdot)$ of the HMM model parameter for each hypothesis.

How to quantitatively define the neighborhoods $\Omega_1$ and $\Omega_2$.

13


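CASE I: $(C, \rho)$ neighborhood and constrained uniform priors

Assume each HMM is an $N$-state CDHMM, $\lambda = \{\pi, A, \theta_i \mid i = 1, 2, \ldots, N\}$, where $\theta_i$ denotes the parameters of the $i$-th state in the HMM. Each state consists of several Gaussian mixtures, i.e. $\theta_i = \{\omega_{ik}, m_{ik}, r_{ik}\}$, where $k$ indicates the mixture number in the state.

We define the neighborhood form for both $\Omega_1$ and $\Omega_2$ as

$$\Omega(\lambda^*) = \{\lambda \mid \pi = \pi^*,\ A = A^*,\ \omega_{ik} = \omega_{ik}^*,\ r_{ik} = r_{ik}^*,\ |m_{ikd} - m_{ikd}^*| \le C\rho^{\,d-1},\ 1 \le i \le N,\ 1 \le k \le K,\ 1 \le d \le D\} \qquad (2)$$

where $\lambda^* = \{\pi^*, A^*, \omega_{ik}^*, m_{ik}^*, r_{ik}^*\}$ denotes the original model parameter, which is the central point of the neighborhood, and $C$ ($C > 0$) and $\rho$ ($0 \le \rho \le 1$) are used to control the size of the neighborhood.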
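A membership test for the neighborhood in (2) can be sketched as follows. The sketch assumes the per-dimension bound on the mean is $C\rho^{d-1}$ (an assumption about a formula that is damaged in this transcript) and that all other parameters are already equal to the central model's; `half_width` and `in_neighborhood` are hypothetical names.

```python
def half_width(c, rho, d):
    """Allowed deviation of the mean in dimension d (1-based).

    Assumption: the bound has the form C * rho**(d - 1).
    """
    return c * rho ** (d - 1)

def in_neighborhood(means, center_means, c, rho):
    """True if every mean component lies within its allowed interval."""
    return all(
        abs(m - m_star) <= half_width(c, rho, d)
        for d, (m, m_star) in enumerate(zip(means, center_means), start=1)
    )
```

With `c=1.0` and `rho=0.5`, the allowed deviations for three dimensions are 1, 0.5 and 0.25, so the neighborhood shrinks along higher dimensions; larger `c` or `rho` give a larger neighborhood.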

14


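CASE I: $(C, \rho)$ neighborhood and constrained uniform priors (cont)

For the medium neighborhood $\Omega_2$, we choose larger values for $C$ and $\rho$, and for the tight neighborhood $\Omega_1$, we choose smaller values for $C$ and $\rho$.

Secondly, given the neighborhood, we assume that the prior distribution of the HMM parameter is a uniform p.d.f. constrained in the neighborhood. Based on these assumptions, the calculation of Bayes factors can be simplified as

$$\mathrm{BF} = \frac{p(X \mid H_0')}{p(X \mid H_1')} = \frac{\int_{\Omega_1} f(X \mid \lambda)\, p(\lambda)\, d\lambda}{\int_{\Omega_2 - \Omega_1} f(X \mid \lambda)\, p(\lambda)\, d\lambda} = \frac{\int_{\Omega_1} f(X \mid \lambda)\, d\lambda \,\big/ \int_{\Omega_1} d\lambda}{\int_{\Omega_2 - \Omega_1} f(X \mid \lambda)\, d\lambda \,\big/ \int_{\Omega_2 - \Omega_1} d\lambda} \qquad (3)$$

We calculate the numerator and denominator separately.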

15

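Bayes factors (cont)

CASE I: $(C, \rho)$ neighborhood and constrained uniform priors (cont)

The VBPC (Viterbi Bayesian predictive classification) algorithm is used to compute each Bayesian predictive density $\tilde{p}(X)$:

$$\tilde{p}(X) = \int f(X \mid \lambda)\, p(\lambda)\, d\lambda \approx \max_{s,\,l} \int f(X, s, l \mid \lambda)\, p(\lambda)\, d\lambda \qquad (4)$$

Given a test utterance $X = (x_1, x_2, \ldots, x_T)$ and a CDHMM parameter vector $\lambda$ along with its prior pdf $p(\lambda)$, the recursive search for approximately accomplishing the above VBPC is:

1) Initialization: $t = 1$

$$\delta_1(i) = \bar{\pi}_i\, \tilde{b}_i(x_1), \quad 1 \le i \le N \qquad (5)$$

$$\psi_1(i) = 0, \quad 1 \le i \le N \qquad (6)$$

where $\bar{\pi}_i$ denotes the mean of the prior pdf of the HMM parameter $\pi_i$, i.e. $\bar{\pi}_i = \int \pi_i\, p(\pi_i)\, d\pi_i = \pi_i^*$ (by the assumption of the 13th and 14th slides and (2)), and

$$\tilde{b}_i(x_1) = \int p(x_1 \mid \theta_i)\, p(\theta_i)\, d\theta_i$$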

16



2) Recursion: for $2 \le t \le T$, $1 \le j \le N$, do

(2.1)

$$\delta_t(j) = \max_{1 \le i \le N} \left[\delta_{t-1}(i)\, \tilde{a}_{ij}\right] \qquad (7)$$

$$\psi_t(j) = \arg\max_{1 \le i \le N} \left[\delta_{t-1}(i)\, \tilde{a}_{ij}\right] \qquad (8)$$

Here $\tilde{a}_{ij}$ is the mean of the posterior pdf of $a_{ij}$ based on the optimal partial path up to the time instant $t-1$. It is given by a two-case expression (9), where $n_{ij}$ is the accumulated number of transitions from state $i$ to state $j$ based on the optimal partial path up to the time instant $t-1$. Equations (10) and (11) restate the two terms of (9) in their original formulation.

[Equations (9)–(11) are not recoverable from the extracted slide text.]

17


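(2.2) Update the partial predictive value with respect to the state parameter $\theta_j$:

If [it is the first time to involve state $j$ in the computation of $\delta_t(j)$] then

$$\delta_t(j) = \delta_t(j)\, \tilde{b}_j(x_t) \qquad (12)$$

Else

$$\delta_t(j) = \delta_t(j)\, \frac{\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_{N_j}}, x_t)}{\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_{N_j}})} \qquad (13)$$

where $N_j$ is the accumulated number of feature vectors belonging to state $j$ based on the optimal partial path up to the time instant $t$, and $\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_n})$ denotes the contribution of the data $\{x_{j_1}, x_{j_2}, \ldots, x_{j_n}\}$ residing at state $j$:

$$\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_n}) = \int p(x_{j_1} \mid \theta_j)\, p(x_{j_2} \mid \theta_j) \cdots p(x_{j_n} \mid \theta_j)\, p(\theta_j)\, d\theta_j \qquad (14)$$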


18


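3) Termination:

$$\tilde{p}(X \mid W) \approx \max_{1 \le i \le N} \delta_T(i) \qquad (15)$$

Moreover, we have

$$\tilde{b}_j(x_t) = \sum_{l=1}^{K} \omega_{jl}\, \tilde{f}_{jl}(x_t) = \sum_{l=1}^{K} \omega_{jl} \prod_{d=1}^{D} \tilde{f}_{jld}(x_{td}) \qquad (16)$$

where

$$\tilde{f}_{ikd}(x) = \frac{1}{2C\rho^{\,d-1}} \int_{m_{ikd}^* - C\rho^{\,d-1}}^{m_{ikd}^* + C\rho^{\,d-1}} \sqrt{\frac{r_{ikd}^*}{2\pi}}\; e^{-\frac{r_{ikd}^*}{2}(x - m)^2}\, dm = \frac{\Phi\!\left(\sqrt{r_{ikd}^*}\,(m_{ikd}^* + C\rho^{\,d-1} - x)\right) - \Phi\!\left(\sqrt{r_{ikd}^*}\,(m_{ikd}^* - C\rho^{\,d-1} - x)\right)}{2C\rho^{\,d-1}} \qquad (17)$$

where

$$\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-x^2/2}\, dx \qquad (18)$$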
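The closed form in (17) can be checked numerically. The sketch below assumes a single scalar dimension with a Gaussian of mean `mean` and precision `precision`, and a uniform prior on the mean over `[mean - half_width, mean + half_width]`; `half_width` stands in for the neighborhood bound, and all function names are hypothetical. The standard normal CDF of (18) is computed with `math.erf`.

```python
import math

def std_normal_cdf(y):
    """Phi(y) as in (18), via the error function."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def predictive_density(x, mean, precision, half_width):
    """Closed form: Gaussian averaged over a uniform prior on its mean."""
    root_r = math.sqrt(precision)
    upper = std_normal_cdf(root_r * (mean + half_width - x))
    lower = std_normal_cdf(root_r * (mean - half_width - x))
    return (upper - lower) / (2.0 * half_width)

def predictive_density_numeric(x, mean, precision, half_width, steps=20000):
    """Midpoint-rule reference for the same integral."""
    width = 2.0 * half_width / steps
    norm = math.sqrt(precision / (2.0 * math.pi))
    total = 0.0
    for n in range(steps):
        m = mean - half_width + (n + 0.5) * width
        total += norm * math.exp(-0.5 * precision * (x - m) ** 2)
    return total * width / (2.0 * half_width)
```

As `half_width` shrinks towards zero the predictive density approaches the ordinary Gaussian density at `x`, which is a useful sanity check on the formula.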


19


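Derivation of (17):

$$\tilde{f}_{ikd}(x) = \frac{1}{2C\rho^{\,d-1}} \int_{m_{ikd}^* - C\rho^{\,d-1}}^{m_{ikd}^* + C\rho^{\,d-1}} \sqrt{\frac{r_{ikd}^*}{2\pi}}\; e^{-\frac{r_{ikd}^*}{2}(x - m)^2}\, dm$$

Let $z = \sqrt{r_{ikd}^*}\,(m - x)$, so that $dz = \sqrt{r_{ikd}^*}\, dm$. Then

$$\tilde{f}_{ikd}(x) = \frac{1}{2C\rho^{\,d-1}} \cdot \frac{1}{\sqrt{2\pi}} \int_{\sqrt{r_{ikd}^*}\,(m_{ikd}^* - C\rho^{\,d-1} - x)}^{\sqrt{r_{ikd}^*}\,(m_{ikd}^* + C\rho^{\,d-1} - x)} e^{-z^2/2}\, dz$$

$$= \frac{1}{2C\rho^{\,d-1}} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sqrt{r_{ikd}^*}\,(m_{ikd}^* + C\rho^{\,d-1} - x)} e^{-z^2/2}\, dz - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\sqrt{r_{ikd}^*}\,(m_{ikd}^* - C\rho^{\,d-1} - x)} e^{-z^2/2}\, dz \right]$$

$$= \frac{\Phi\!\left(\sqrt{r_{ikd}^*}\,(m_{ikd}^* + C\rho^{\,d-1} - x)\right) - \Phi\!\left(\sqrt{r_{ikd}^*}\,(m_{ikd}^* - C\rho^{\,d-1} - x)\right)}{2C\rho^{\,d-1}}$$

where

$$\Phi(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-x^2/2}\, dx$$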

20


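$l_t^*$ denotes the mixture component label to which $x_t$ is "closest", i.e.

$$l_t^* = \arg\max_{l} \tilde{f}_{jl}(x_t) = \arg\max_{l} \prod_{d=1}^{D} \tilde{f}_{jld}(x_{td}) \qquad (19)$$

Similarly, $\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_n})$ is calculated based on the "closest" mixture component label sequence corresponding to the data $\{x_{j_1}, x_{j_2}, \ldots, x_{j_n}\}$:

$$\tilde{b}_j(x_{j_1}, x_{j_2}, \ldots, x_{j_n}) = \prod_{k=1}^{K} \omega_{jk}^{N_k}\, \tilde{f}_{jk}(x_{k_1}, x_{k_2}, \ldots, x_{k_{N_k}}) = \prod_{k=1}^{K} \omega_{jk}^{N_k} \prod_{d=1}^{D} \tilde{f}_{jkd}(x_{k_1 d}, x_{k_2 d}, \ldots, x_{k_{N_k} d}) \qquad (20)$$

where $\{x_{k_1}, x_{k_2}, \ldots, x_{k_{N_k}}\}$ denote the feature vectors belonging to state $j$ in $X$, among $\{x_{j_1}, \ldots, x_{j_n}\}$, whose labels $l^*$ indicate that they are "closest" to the mixture component $k$ of state $j$.

Note that, in general, $\tilde{f}_{ik}(x_{k_1})\, \tilde{f}_{ik}(x_{k_2}) \ne \tilde{f}_{ik}(x_{k_1}, x_{k_2})$.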

21


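Then, with $m_{jkd}^*$ and $r_{jkd}^*$ being the pretrained mean and precision parameters, respectively, we have

$$\tilde{f}_{jkd}(x_{k_1 d}, \ldots, x_{k_{N_k} d}) = \left(\frac{r_{jkd}^*}{2\pi}\right)^{N_k/2} \frac{1}{2C\rho^{\,d-1}} \int_{m_{jkd}^* - C\rho^{\,d-1}}^{m_{jkd}^* + C\rho^{\,d-1}} e^{-\frac{r_{jkd}^*}{2} \sum_{t=1}^{N_k} (x_{k_t d} - m)^2}\, dm$$

$$= \left(\frac{r_{jkd}^*}{2\pi}\right)^{\frac{N_k - 1}{2}} \frac{\xi_{jkd}}{2C\rho^{\,d-1}\sqrt{N_k}} \left[\Phi\!\left(\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* + C\rho^{\,d-1} - \bar{x}_{N_k d})\right) - \Phi\!\left(\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* - C\rho^{\,d-1} - \bar{x}_{N_k d})\right)\right] \qquad (21)$$

where $\Phi(\cdot)$ is defined in (18), and

$$\xi_{jkd} = \exp\left\{-\frac{1}{2}\, r_{jkd}^*\, N_k \left(\overline{x^2}_{N_k d} - \bar{x}_{N_k d}^{\,2}\right)\right\} \qquad (22)$$

with

$$\bar{x}_{N_k d} = \frac{1}{N_k} \sum_{t=1}^{N_k} x_{k_t d} \quad \text{and} \quad \overline{x^2}_{N_k d} = \frac{1}{N_k} \sum_{t=1}^{N_k} x_{k_t d}^2$$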

22


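Derivation of (22), completing the square in $m$:

$$\exp\left(-\frac{1}{2}\, r_{jkd}^* \sum_{t=1}^{N_k} (x_{k_t d} - m)^2\right)$$

$$= \exp\left(-\frac{1}{2}\, r_{jkd}^* N_k \left[\frac{1}{N_k}\sum_{t=1}^{N_k} x_{k_t d}^2 - 2m\, \frac{1}{N_k}\sum_{t=1}^{N_k} x_{k_t d} + m^2\right]\right)$$

$$= \exp\left(-\frac{1}{2}\, r_{jkd}^* N_k \left[(m - \bar{x}_{N_k d})^2 + \overline{x^2}_{N_k d} - \bar{x}_{N_k d}^{\,2}\right]\right)$$

$$= \exp\left(-\frac{1}{2}\, r_{jkd}^* N_k \left(\overline{x^2}_{N_k d} - \bar{x}_{N_k d}^{\,2}\right)\right) \exp\left(-\frac{1}{2}\, r_{jkd}^* N_k (m - \bar{x}_{N_k d})^2\right)$$

$$= \xi_{jkd}\, \exp\left(-\frac{1}{2}\, r_{jkd}^* N_k (m - \bar{x}_{N_k d})^2\right)$$

with

$$\xi_{jkd} = \exp\left\{-\frac{1}{2}\, r_{jkd}^*\, N_k \left(\overline{x^2}_{N_k d} - \bar{x}_{N_k d}^{\,2}\right)\right\} \qquad (22)$$

$$\bar{x}_{N_k d} = \frac{1}{N_k} \sum_{t=1}^{N_k} x_{k_t d} \quad \text{and} \quad \overline{x^2}_{N_k d} = \frac{1}{N_k} \sum_{t=1}^{N_k} x_{k_t d}^2$$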

23


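Derivation of (21):

$$\tilde{f}_{jkd}(x_{k_1 d}, \ldots, x_{k_{N_k} d}) = \left(\frac{r_{jkd}^*}{2\pi}\right)^{N_k/2} \frac{1}{2C\rho^{\,d-1}} \int_{m_{jkd}^* - C\rho^{\,d-1}}^{m_{jkd}^* + C\rho^{\,d-1}} e^{-\frac{r_{jkd}^*}{2} \sum_{t=1}^{N_k} (x_{k_t d} - m)^2}\, dm$$

$$= \left(\frac{r_{jkd}^*}{2\pi}\right)^{N_k/2} \frac{\xi_{jkd}}{2C\rho^{\,d-1}} \int_{m_{jkd}^* - C\rho^{\,d-1}}^{m_{jkd}^* + C\rho^{\,d-1}} \exp\left(-\frac{1}{2}\, N_k r_{jkd}^* (m - \bar{x}_{N_k d})^2\right) dm$$

Let $z = \sqrt{N_k r_{jkd}^*}\,(m - \bar{x}_{N_k d})$, so that $dz = \sqrt{N_k r_{jkd}^*}\, dm$. Then

$$= \left(\frac{r_{jkd}^*}{2\pi}\right)^{N_k/2} \frac{\xi_{jkd}}{2C\rho^{\,d-1}\sqrt{N_k r_{jkd}^*}} \int_{\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* - C\rho^{\,d-1} - \bar{x}_{N_k d})}^{\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* + C\rho^{\,d-1} - \bar{x}_{N_k d})} \exp\left(-\frac{1}{2} z^2\right) dz$$

$$= \left(\frac{r_{jkd}^*}{2\pi}\right)^{\frac{N_k - 1}{2}} \frac{\xi_{jkd}}{2C\rho^{\,d-1}\sqrt{N_k}} \left[\Phi\!\left(\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* + C\rho^{\,d-1} - \bar{x}_{N_k d})\right) - \Phi\!\left(\sqrt{N_k r_{jkd}^*}\,(m_{jkd}^* - C\rho^{\,d-1} - \bar{x}_{N_k d})\right)\right]$$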
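The joint predictive density in (21) and (22) can be sketched and checked against direct numerical integration. As before this is a one-dimensional illustration under stated assumptions: `half_width` stands in for the neighborhood bound on the mean, the prior on the mean is uniform over that interval, and the function names are hypothetical.

```python
import math

def std_normal_cdf(y):
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))

def joint_predictive_density(samples, mean, precision, half_width):
    """Closed form for N samples sharing one Gaussian with uncertain mean."""
    n = len(samples)
    x_bar = sum(samples) / n                  # sample mean
    x2_bar = sum(x * x for x in samples) / n  # mean of squares
    xi = math.exp(-0.5 * precision * n * (x2_bar - x_bar ** 2))
    scale = (precision / (2.0 * math.pi)) ** ((n - 1) / 2.0) / math.sqrt(n)
    root = math.sqrt(n * precision)
    mass = (std_normal_cdf(root * (mean + half_width - x_bar))
            - std_normal_cdf(root * (mean - half_width - x_bar)))
    return scale * xi * mass / (2.0 * half_width)

def joint_predictive_numeric(samples, mean, precision, half_width, steps=20000):
    """Midpoint-rule reference: integrate the product of Gaussians over the mean."""
    width = 2.0 * half_width / steps
    norm = math.sqrt(precision / (2.0 * math.pi)) ** len(samples)
    total = 0.0
    for step in range(steps):
        m = mean - half_width + (step + 0.5) * width
        sq = sum((x - m) ** 2 for x in samples)
        total += norm * math.exp(-0.5 * precision * sq)
    return total * width / (2.0 * half_width)
```

Only the count, the sample mean and the mean of squares are needed, which is why the recursion can update these statistics incrementally instead of storing every frame.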

24


In this paper, in order to balance the contributions from different models in the neighborhood, we introduce an exponential scale factor into the integral calculation.

The exponential scale factor is important for equalizing the contributions from different models in the neighborhood during the computation of the Bayes factor.

If we choose a scale factor greater than 1, the models with large likelihood values are emphasized. On the other hand, if the scale factor is smaller than 1, more weight is put on the models with smaller likelihood values.
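The effect of such a scale factor can be illustrated with a toy example. The sketch assumes the factor acts as an exponent on each model's likelihood before the contributions are combined; how exactly it enters the integral is not recoverable from this transcript, and `scaled_weights` is a hypothetical helper.

```python
def scaled_weights(likelihoods, scale):
    """Relative contribution of each model after raising likelihoods to `scale`.

    Assumption: the scale factor is applied as an exponent on the likelihood.
    """
    powered = [value ** scale for value in likelihoods]
    total = sum(powered)
    return [p / total for p in powered]
```

For likelihoods `[0.8, 0.2]`, a scale of 2 raises the first model's share above 0.8, while a scale of 0.5 lowers it towards an even split.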


25


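Therefore, given the optimal path $\{s, l\}$, the approximate Bayesian predictive density is as follows:

$$\tilde{p}(X) \approx \int f(X, s, l \mid \lambda)\, p(\lambda)\, d\lambda$$

which evaluates to a product over all states $i = 1, \ldots, N$, mixture components $k = 1, \ldots, K$ and dimensions $d = 1, \ldots, D$. Each factor has the form of (21), computed from the statistics $n_{ik}$, $\bar{x}_{ikd}$ and $\overline{x^2}_{ikd}$, the pretrained parameters $m_{ikd}^*$ and $r_{ikd}^*$, the bound $C\rho^{\,d-1}$, and the exponential scale factor.

[The closed-form expression on this slide is not recoverable from the extracted text.]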


26


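where

$$n_{ik} = \sum_{t=1}^{T} \delta(s_t - i)\, \delta(l_t - k) \qquad (23)$$

$$\bar{x}_{ikd} = \frac{1}{n_{ik}} \sum_{t=1}^{T} x_{td}\, \delta(s_t - i)\, \delta(l_t - k) \qquad (24)$$

$$\overline{x^2}_{ikd} = \frac{1}{n_{ik}} \sum_{t=1}^{T} x_{td}^2\, \delta(s_t - i)\, \delta(l_t - k) \qquad (25)$$

In the above, $\delta(\cdot)$ denotes the Kronecker delta indicator with

$$\delta(a - b) = \begin{cases} 1, & a = b \\ 0, & a \ne b \end{cases}$$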
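The statistics in (23) to (25) are simple per-(state, mixture) accumulations over the aligned frames. A minimal sketch, assuming the alignment is given as parallel lists of state and mixture labels (the function name and data layout are hypothetical):

```python
from collections import defaultdict

def alignment_statistics(frames, states, labels):
    """Return {(state, label): (count, mean per dim, mean of squares per dim)}."""
    counts = defaultdict(int)
    sums = {}
    square_sums = {}
    for frame, state, label in zip(frames, states, labels):
        key = (state, label)
        counts[key] += 1
        if key not in sums:
            sums[key] = [0.0] * len(frame)
            square_sums[key] = [0.0] * len(frame)
        for d, value in enumerate(frame):
            sums[key][d] += value
            square_sums[key][d] += value * value
    return {
        key: (n,
              [total / n for total in sums[key]],
              [total / n for total in square_sums[key]])
        for key, n in counts.items()
    }
```

Pairs of state and mixture that never occur on the path are simply absent from the result, which corresponds to a zero count in (23).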

27


How to specify the parameters $(C, \rho)$?

Global setting: We manually select $(C_1, \rho_1)$ for the tight neighborhood and $(C_2, \rho_2)$ for the medium neighborhood, where $C_1 < C_2$ and $\rho_1 < \rho_2$. In this case we use the same tight (or medium) neighborhood for all different states and Gaussian mixtures in all HMMs.

State-dependent setting: Given an HMM state $i$ with parameter $\theta_i$, we assume the neighborhood for this state is defined as in (2). We calculate the maximum deviation from the central point within the neighborhood in terms of Euclidean distance, $D_i(C, \rho)$, as in (26): a maximum over the mixture components $k = 1, \ldots, K$ of a distance accumulated over the dimensions $d = 1, \ldots, D$.

[The closed form of (26) is not recoverable from the extracted slide text.]

Thus, we first manually select $\rho_1$ and $\rho_2$ for the tight and medium neighborhoods, then we define the maximally allowed deviation distances for the tight and medium neighborhoods, i.e. $D_1$ and $D_2$ (usually $D_1 < D_2$). Finally, we can calculate $C_1$ and $C_2$ for different HMM states from $D_1$ and $D_2$.

28


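CASE II: delta priors

For the tight and medium neighborhoods, we construct a prior distribution as a mixture of delta functions:

$$p(\lambda) = \frac{1}{N_t} \sum_{\lambda_i \in \Omega_1} \delta(\lambda - \lambda_i) \qquad (27)$$

where $N_t$ denotes the total number of models inside $\Omega_1$. Similarly, we can build a prior distribution for the region $\Omega_2 - \Omega_1$:

$$p(\lambda) = \frac{1}{N_m} \sum_{\lambda_i \in \Omega_2 - \Omega_1} \delta(\lambda - \lambda_i) \qquad (28)$$

where $N_m$ denotes the total number of models in the region $\Omega_2 - \Omega_1$.

Based on these two priors, the Bayes factor to verify hypotheses $H_0'$ and $H_1'$ can be simplified as

$$\mathrm{BF} = \frac{\dfrac{1}{N_t} \sum_{\lambda_i \in \Omega_1} f(X \mid \lambda_i)}{\dfrac{1}{N_m} \sum_{\lambda_i \in \Omega_2 - \Omega_1} f(X \mid \lambda_i)} \qquad (29)$$

At present, in many large-scale ASR systems, we usually use state-tied CDHMMs. Therefore, instead of building delta priors for each HMM model, we can also set up delta priors at the level of HMM states.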
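With delta priors, the Bayes factor in (29) is a ratio of two average likelihoods. A minimal log-domain sketch, assuming the log-likelihood of the utterance under each neighboring model is already available (function names are hypothetical):

```python
import math

def log_mean_exp(log_values):
    """log of the arithmetic mean of exp(log_values), computed stably."""
    peak = max(log_values)
    total = sum(math.exp(v - peak) for v in log_values)
    return peak + math.log(total / len(log_values))

def log_bayes_factor_delta(tight_logliks, medium_logliks):
    """Log Bayes factor for delta priors.

    tight_logliks: log f(X | model) for the models in the tight neighborhood.
    medium_logliks: the same for models in the medium-minus-tight region.
    """
    return log_mean_exp(tight_logliks) - log_mean_exp(medium_logliks)
```

The result is compared with the log of the decision threshold; averaging in the log domain keeps the computation stable when frame-level likelihoods are very small.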

29

Experiments

We evaluate the proposed methods on the Bell Labs Communicator system.

In our recognition system, we used a 38-dimension feature vector, consisting of 12 Mel LPCCEP, 12 delta CEP, 12 delta-delta CEP, and delta and delta-delta log-energy.

The acoustic models are state-tied, tri-phone CDHMM models, which consist of roughly 4K distinct HMM states with an average of 13.2 Gaussian mixtures per state.

30

Experiments (cont)

A class-based tri-gram LM including 2600 words is used.

The ASR system achieves 15.8% WER on our independent evaluation set, which includes 1395 utterances in total.

Based on the word and phoneme segmentations generated by the recognizer, we calculate a confidence score for every recognized word.

31

Experiments (cont)

Baseline system: likelihood ratio test.

New approach with settings in Case I

We choose the $(C, \rho)$ neighborhood and constrained uniform prior distribution. Since we use static, delta and delta-delta features, we slightly modify the neighborhood definition in (2) as

$$\Omega(\lambda^*) = \{\lambda \mid \pi = \pi^*,\ A = A^*,\ \omega_{ik} = \omega_{ik}^*,\ r_{ik} = r_{ik}^*,\ |m_{ikd} - m_{ikd}^*| \le C\rho^{\,d-1},\ |m_{ik(d + D/3)} - m_{ik(d + D/3)}^*| \le C\rho^{\,d-1},\ |m_{ik(d + 2D/3)} - m_{ik(d + 2D/3)}^*| \le C\rho^{\,d-1},\ 1 \le i \le N,\ 1 \le k \le K,\ 1 \le d \le D/3\} \qquad (30)$$

32

Experiments (cont)

New approach with settings in Case I (cont)

For the state-dependent setting, we first set $\rho_1$ to a small value and $\rho_2$ to a large value. According to (26), we have manually checked the ranges $D_1 \in [0, 10.0]$ and $D_2 \in [100.0, 250.0]$.

New approach with settings in Case II

We choose the delta priors in (27) and (28) at the level of HMM states. At first, for each distinct state, we calculate its distance from all other states. The distance between two HMM states $i$ and $j$ is computed as the minimum Euclidean distance between every possible pair of Gaussian components from these states:

$$\text{Euclidean distance} = \sqrt{\sum_{d=1}^{D} (m_{ikd} - m_{jk'd})^2}$$
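The state distance described here can be sketched directly. The example assumes each state is represented only by the list of mean vectors of its Gaussian components; `state_distance` is a hypothetical name, and `math.dist` requires Python 3.8 or later.

```python
import math

def state_distance(means_a, means_b):
    """Minimum Euclidean distance over all pairs of component means."""
    return min(
        math.dist(mean_a, mean_b)
        for mean_a in means_a
        for mean_b in means_b
    )
```

The cost grows with the product of the two mixture sizes, so for a few thousand states with around a dozen components each, the pairwise table is typically computed once and cached.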

33

Experiments (cont)

New approach with settings in Case II (cont)

For each state, we sort all other states according to their distances from the underlying state.

In the first case, denoted as Case II-A, for each underlying HMM state, we choose the neighborhood sizes to include exactly $N_t$ other states in $\Omega_1$ and $N_m$ in $\Omega_2 - \Omega_1$.

In the second case, denoted as Case II-B, from the top 1500 sorted states, we choose the neighborhood sizes so that $\Omega_1$ includes all other states with distance less than $D_t$, and $\Omega_2 - \Omega_1$ includes the ones with distance between $D_t$ and $D_m$.
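Both selection rules can be sketched on top of a precomputed distance table. The sketch assumes `distances` maps every other state to its distance from the underlying state; the function names, the handling of ties and the half-open distance interval are illustrative choices, not taken from the paper.

```python
def neighbors_by_count(distances, n_tight, n_medium):
    """Case II-A style: the n_tight nearest states, then the next n_medium."""
    ranked = sorted(distances, key=distances.get)
    return ranked[:n_tight], ranked[n_tight:n_tight + n_medium]

def neighbors_by_distance(distances, d_tight, d_medium, top=1500):
    """Case II-B style: split the `top` nearest states by distance thresholds."""
    ranked = sorted(distances, key=distances.get)[:top]
    tight = [s for s in ranked if distances[s] < d_tight]
    medium = [s for s in ranked if d_tight <= distances[s] < d_medium]
    return tight, medium
```

The count-based rule gives every state neighborhoods of identical size, whereas the distance-based rule lets the sizes vary with how crowded the model space is around each state.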

34

Experiments (cont)

TABLE I. VERIFICATION PERFORMANCE COMPARISON (EQUAL ERROR RATE IN %) OF BASELINE UV METHOD (LRT + ANTI-MODELS) WITH THE PROPOSED NEW APPROACH IN SEVERAL DIFFERENT SETTINGS. IN EACH CASE, THE BEST PERFORMANCE OF THE NEW APPROACH AND ITS CORRESPONDING PARAMETER SETTING ARE GIVEN. HERE WE ALWAYS FIX = 1.2
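The equal error rate used as the metric in Table I is the operating point where the false rejection and false acceptance rates coincide. A minimal sketch of one common way to estimate it from confidence scores, assuming higher scores mean "more likely correct" and sweeping the threshold over the observed scores (the function name and this estimation procedure are illustrative, not taken from the paper):

```python
def equal_error_rate(correct_scores, error_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    correct_scores: confidence scores of correctly recognized words.
    error_scores: confidence scores of misrecognized words.
    A word is accepted when its score is >= the threshold.
    """
    best_gap, best_rate = None, None
    for threshold in sorted(set(correct_scores) | set(error_scores)):
        false_reject = sum(s < threshold for s in correct_scores) / len(correct_scores)
        false_accept = sum(s >= threshold for s in error_scores) / len(error_scores)
        gap = abs(false_reject - false_accept)
        if best_gap is None or gap < best_gap:
            best_gap, best_rate = gap, (false_reject + false_accept) / 2.0
    return best_rate
```

With finitely many scores the two error rates rarely match exactly, so the sketch reports their average at the threshold where they are closest.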

35

Experiments (cont)

Fig. 3. Comparison of ROC curves for different methods when verifying mis-recognized words against correctly recognized words in ASR outputs.

36

Summary and Conclusions

The basic idea is to assume that all competing models of a given model sit inside one neighborhood of the underlying model.

More research work is still needed to search for a better neighborhood definition in the high-dimensional HMM model space.

Another possible direction for future work: instead of Bayes factors, other methods such as generalized likelihood ratio testing (GLRT) can also be used to implement the neighborhood-based UV.
