1
Incorporating In-domain Confidence and Discourse Coherence Measures
in Utterance Verification
(Detection of speech recognition errors using in-domain confidence and discourse coherence)
Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University
2
Introduction
• Current ASR technologies are not robust against:
  – Acoustic mismatch: noise, channel, speaker variance
  – Linguistic mismatch: disfluencies, OOV, OOD
• Assess confidence of the recognition hypothesis and detect recognition errors
  → Effective user feedback
• Select a recovery strategy based on the type of error and the specific application
3
Previous Works on Confidence Measures
• Feature-based
  – [Kemp] word duration, AM/LM back-off
• Explicit model-based
  – [Rahim] likelihood-ratio test against a cohort model
• Posterior probability
  – [Komatani, Soong, Wessel] estimate the posterior probability given all competing hypotheses in a word graph
→ Approaches limited to "low-level" information available during ASR decoding
4
Proposed Approach
• Exploit knowledge sources outside the ASR framework for estimating recognition confidence, e.g. knowledge about the application domain and discourse flow
→ Incorporate CMs based on "high-level" knowledge sources
• In-domain confidence
  – degree of match between utterance and application domain
• Discourse coherence
  – consistency between consecutive utterances in a dialogue
5
CM_in-domain(X_i): in-domain confidence
CM_discourse(X_i | X_i-1): discourse coherence
CM(X_i): joint confidence score, combining the above with the generalized posterior probability CM_gpp(X_i)
[Diagram: Utterance Verification Framework — the current utterance X_i and the previous utterance X_i-1 each pass through the ASR front-end, topic classification, and in-domain verification (out-of-domain detection), yielding CM_in-domain(X_i) and CM_in-domain(X_i-1); the distance dist(X_i, X_i-1) gives CM_discourse(X_i | X_i-1), which is combined with CM_gpp(X_i) into the joint score CM(X_i).]
Utterance Verification Framework
6
In-domain Confidence
• Measure of topic consistency with the application domain
  – Previously applied in out-of-domain utterance detection
Examples of errors detected via in-domain confidence:
Mismatch of domain
  REF: How can I print this WORD file double-sided
  ASR: How can I open this word on the pool-side
  → hypothesis not consistent by topic → in-domain confidence low
Erroneous recognition hypothesis
  REF: I want to go to Kyoto, can I go by bus
  ASR: I want to go to Kyoto, can I take a bath
  → hypothesis not consistent by topic → in-domain confidence low
(REF: correct transcription; ASR: speech recognition hypothesis)
7
In-domain Confidence (pipeline)
Input utterance X_i (recognition hypothesis)
→ Transformation to vector space (feature vector)
→ Classification of multiple topics: SVMs (1…m), giving topic confidence scores (C(t_1|X_i), …, C(t_m|X_i))
→ In-domain verification: V_in-domain(X_i)
→ CM_in-domain(X_i): in-domain confidence
8
In-domain Confidence (worked example)
Input utterance X_i (recognition hypothesis), e.g. "could I have a non-smoking seat"
→ Transformation to vector space:
  features (a, an, …, room, …, seat, …, I+have, …) → vector (1, 0, …, 0, …, 1, …, 1, …)
→ Classification of multiple topics, SVMs (1…m):
  topic confidence scores, e.g. accom. 0.05, airplane 0.36, airport 0.94, …
→ In-domain verification V_in-domain(X_i)
→ CM_in-domain(X_i) = 90%
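The transformation to vector space above can be sketched as follows. This is a minimal illustration, not the original system's implementation: the toy vocabulary `VOCAB`, the function name `to_feature_vector`, and the way word-pair features (e.g. "I+have") are formed are all assumptions.

```python
# Sketch of the slide's vector-space transformation (illustrative only):
# a binary bag-of-words vector over a fixed vocabulary that also
# includes adjacent word-pair features such as "i+have".
from typing import List

VOCAB = ["a", "an", "room", "seat", "i+have"]  # toy vocabulary (assumed)

def to_feature_vector(words: List[str]) -> List[int]:
    """Binary feature vector over VOCAB for one recognition hypothesis."""
    tokens = set(words)
    # add adjacent word-pair features, mirroring entries like "I+have"
    tokens |= {f"{w1}+{w2}" for w1, w2 in zip(words, words[1:])}
    return [1 if v in tokens else 0 for v in VOCAB]

vec = to_feature_vector("could i have a non-smoking seat".split())
```

In the actual system this vector would be fed to the m topic-classification SVMs to obtain the topic confidence scores C(t_1|X_i), …, C(t_m|X_i).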
9
In-domain Verification Model
• Linear discriminant verification model applied:
  CM_in-domain(X_i) = sigmoid( V_in-domain(X_i) )
  V_in-domain(X_i) = Σ_{j=1..m} λ_j · C(t_j | X_i)
• λ_1, …, λ_m trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04]
C(t_j|X_i): topic classification confidence score of topic t_j for input utterance X_i
λ_j: discriminant weight for topic t_j
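The in-domain verification model — a sigmoid over a weighted sum of topic confidence scores — can be sketched directly. The function name and example weights are illustrative; the real weights λ_j are trained with deleted interpolation of topics and GPD as the slide states.

```python
import math

def in_domain_confidence(topic_scores, weights):
    """CM_in-domain(X) = sigmoid( sum_j lambda_j * C(t_j | X) ).

    topic_scores: C(t_j|X) for j = 1..m
    weights:      trained discriminant weights lambda_j (assumed given here)
    """
    v = sum(w * c for w, c in zip(weights, topic_scores))  # V_in-domain(X)
    return 1.0 / (1.0 + math.exp(-v))                       # sigmoid
```

An utterance whose topic scores align with heavily weighted in-domain topics yields a confidence near 1; an off-topic utterance yields a value near the low end of the sigmoid.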
10
Discourse Coherence
• Topic consistency with the preceding utterance
Examples of errors detected via discourse coherence:
Erroneous recognition hypothesis
  Speaker A: previous utterance [X_i-1]
    REF: What type of shirt are you looking for?
    ASR: What type of shirt are you looking for?
  Speaker B: current utterance [X_i]
    REF: I'm looking for a white T-shirt.
    ASR: I'm looking for a white teacher.
  → topic not consistent across utterances → discourse coherence low
(REF: correct transcription; ASR: speech recognition hypothesis)
11
Discourse Coherence
• Euclidean distance between the current (X_i) and previous (X_i-1) utterances in topic confidence space:
  CM_discourse(X_i | X_i-1) = sigmoid( −dist_Euclidean(X_i, X_i-1) )
  dist_Euclidean(X_i, X_i-1) = √( Σ_{j=1..m} ( C(t_j|X_i) − C(t_j|X_i-1) )² )
• CM_discourse is large when X_i and X_i-1 are related, and low when they differ
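The discourse coherence measure can be sketched as below: the Euclidean distance between the two utterances' topic-confidence vectors, passed through a sigmoid with negated argument so that identical vectors score highest. Function names are illustrative.

```python
import math

def discourse_coherence(scores_cur, scores_prev):
    """CM_discourse(X_i | X_i-1) = sigmoid(-dist_Euclidean(X_i, X_i-1)).

    scores_cur / scores_prev: topic confidence vectors
    (C(t_1|X), ..., C(t_m|X)) for the current and previous utterances.
    """
    dist = math.sqrt(sum((a - b) ** 2
                         for a, b in zip(scores_cur, scores_prev)))
    # sigmoid(-dist): maximal (0.5) when the vectors coincide,
    # decreasing toward 0 as the topics diverge
    return 1.0 / (1.0 + math.exp(dist))
```

Note that with this form the score peaks at 0.5 for identical topic vectors; in the system the sigmoid transform is trained on the development set, so its scale and offset would be fit rather than fixed.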
12
Joint Confidence Score
Generalized Posterior Probability• Confusability of recognition hypothesis
against competing hypotheses [Lo & Soong]
• At utterance level:
l
jjgpp xGWPPXCM
1
)(
GWPP(xj): generalized word posterior probability of xj
xj: j-th word in recognition hypothesis of X
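Assuming the per-word GWPP scores are already computed from the word graph, the utterance-level combination can be sketched as an average. The averaging form is an assumption here (the garbled original shows only a sum over the l words); the function name is illustrative.

```python
def utterance_gpp(word_gpps):
    """Utterance-level CM_gpp from per-word generalized word posterior
    probabilities GWPP(x_j), j = 1..l.

    Assumption: the word-level scores are combined by arithmetic mean;
    the paper's exact combination may differ.
    """
    if not word_gpps:
        raise ValueError("empty hypothesis")
    return sum(word_gpps) / len(word_gpps)
```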
13
Joint Confidence Score
• Joint confidence score:
  CM(X_i) = λ_gpp · CM_gpp(X_i) + λ_in-domain · CM_in-domain(X_i) + λ_discourse · CM_discourse(X_i | X_i-1)
  where λ_gpp + λ_in-domain + λ_discourse = 1
• For utterance verification, compare CM(X_i) to a threshold (θ)
• Model weights (λ_gpp, λ_in-domain, λ_discourse) and threshold (θ) are trained on the development set
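The joint score is a convex combination of the three confidence measures, thresholded for the accept/reject decision. A minimal sketch, with illustrative function names and example weights (the real weights and θ are trained on the development set):

```python
def joint_confidence(cm_gpp, cm_indomain, cm_discourse,
                     w_gpp, w_indomain, w_discourse):
    """CM(X_i) as a weighted sum of the three measures; weights sum to 1."""
    assert abs(w_gpp + w_indomain + w_discourse - 1.0) < 1e-9
    return (w_gpp * cm_gpp
            + w_indomain * cm_indomain
            + w_discourse * cm_discourse)

def accept(cm, theta):
    """Utterance verification decision: accept iff CM(X_i) >= theta."""
    return cm >= theta
```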
14
Experimental Setup
• Training set: ATR BTEC (Basic Travel Expressions Corpus)
  – ~400k sentences (Japanese/English pairs)
  – 14 topic classes (accommodation, shopping, transit, …)
  – Used to train the topic-classification and in-domain verification models
• Evaluation data: ATR MAD (Machine Aided Dialogue)
  – Natural dialogue between English and Japanese speakers via the ATR speech-to-speech translation system
  – Dialogue data collected based on a set of pre-defined scenarios
  – Development set: 270 dialogues; Test set: 90 dialogues
• On the development set, train: CM sigmoid transforms, CM weights (λ_gpp, λ_in-domain, λ_discourse), and the verification threshold (θ)
15
Speech Recognition Performance
                 Development   Test
# dialogues      270           90
Japanese side
  # utterances   2674          1011
  WER            10.5%         10.7%
  SER            41.9%         42.3%
English side
  # utterances   3091          1006
  WER            17.0%         16.2%
  SER            63.5%         55.2%
• ASR performed with ATRASR; 2-gram LM applied during decoding, lattice rescored with a 3-gram LM
16
Evaluation Measure
• Utterance-based verification
  – No definite "keyword" set in speech-to-speech translation
  – If a recognition error occurs (one or more errors), prompt the user to rephrase the entire utterance
• CER (confidence error rate)
  – FA: false acceptance of an incorrectly recognized utterance
  – FR: false rejection of a correctly recognized utterance
  CER = (#FA + #FR) / #utterances
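The CER definition above translates directly into code. A sketch, with illustrative names:

```python
def confidence_error_rate(correct, accepted):
    """CER = (#FA + #FR) / #utterances.

    correct:  per-utterance flags, True if the recognition was correct
    accepted: per-utterance verification decisions, True if accepted
    """
    fa = sum(1 for ok, acc in zip(correct, accepted) if not ok and acc)
    fr = sum(1 for ok, acc in zip(correct, accepted) if ok and not acc)
    return (fa + fr) / len(correct)
```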
17
GPP-based Verification Performance
• Accept All: assume all utterances are correctly recognized
• GPP: generalized posterior probability
→ Large reduction in verification errors compared with the "Accept All" case: CER 17.3% (Japanese) and 15.3% (English)
[Bar chart: CER (%) for the Japanese and English sides, "Accept All" vs. "GPP"]
18
Incorporation of IC and DC Measures (Japanese)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
→ CER reduced by 5.7% and 4.6% for the "GPP+IC" and "GPP+DC" cases
→ CER 17.3% → 15.9% (8.0% relative) for the "GPP+IC+DC" case
[Bar chart: CER (%) for GPP, GPP+IC, GPP+DC, GPP+IC+DC; y-axis 12.0–18.0%]
19
Incorporation of IC and DC Measures (English)
GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence
→ Similar performance on the English side: CER 15.3% → 14.4% for the "GPP+IC+DC" case
[Bar chart: CER (%) for GPP, GPP+IC, GPP+DC, GPP+IC+DC; y-axis 12.0–18.0%]
20
Conclusions
• Proposed a novel utterance verification scheme incorporating "high-level" knowledge:
  – In-domain confidence: degree of match between utterance and application domain
  – Discourse coherence: consistency between consecutive utterances
• The two proposed measures are effective: relative reductions in CER of 8.0% and 6.1% (Japanese/English)
21
Future work
• "High-level" content-based verification
  – Ignore ASR errors that do not affect translation quality → further improvement in performance
• Topic switching
  – Determine when users switch task (currently a single task per dialogue session is assumed)