Learning Automata based Approach to Model Dialogue Strategy in Spoken Dialogue System: A Performance Evaluation
G.Kumaravelan Pondicherry University, Karaikal Centre, Karaikal.
R. SivaKumar
AVVM Sri Pushpam College, Thanjavur.
Dialogue System
A system that provides an interface between the user and a computer-based application, interacting on a turn-by-turn basis.
Dialogue manager:
- controls the flow of the dialogue
- gathers information from the user
- communicates with the external application
- communicates information back to the user
Three types of dialogue system (by initiative):
- finite state- (or graph-) based
- frame-based
- agent-based
Spoken Dialogue System Architecture
Pipeline (from the architecture figure):
Audio → Speech Recognition → Words → Spoken Language Understanding → Semantic representation → Dialogue Manager ↔ Backend
Dialogue Manager → Concepts → Language Generation → Words → Text-to-Speech Synthesis → Audio
Properties of RL: The Agent-Environment Interaction
Agent and environment interact at discrete time steps t = 0, 1, 2, …
- Agent observes state at step t: s_t ∈ S
- produces action at step t: a_t ∈ A(s_t)
- gets resulting reward: r_{t+1}
- and resulting next state: s_{t+1}
Cont…
What do we want to maximize?
Suppose the sequence of rewards after step t is r_{t+1}, r_{t+2}, r_{t+3}, …
In general, we want to maximize the expected return E[R_t] for each step t, where
R_t = r_{t+1} + r_{t+2} + … + r_T
(immediate reward and long-term reward).
The aim is to find the policy that leads to the highest total reward over T time steps (finite horizon) [Markov property].
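As a concrete illustration, the finite-horizon return defined above is just the sum of the rewards observed after step t; the reward sequence below is hypothetical:

```python
# Minimal sketch: the finite-horizon return R_t = r_{t+1} + ... + r_T
# is the sum of rewards collected after step t (values are illustrative).
def finite_horizon_return(rewards):
    """Return R_t for the reward sequence r_{t+1}, ..., r_T."""
    return sum(rewards)

print(finite_horizon_return([1.0, 0.5, -0.5, 2.0]))  # 3.0
```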
The formal decision problem - MDP
Given <S, A, P, R, T >
S is a finite state set (with start state s0)
A is a finite action set
P(s’ | s, a) is a table of transition probabilities
R(s, a, s’) is a reward function
Policy π(s, t) = a
Is there a policy that yields the maximal total reward over the finite horizon T?
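A finite-horizon MDP of this form can be solved by backward induction over the horizon (finite-horizon value iteration). The sketch below uses a hypothetical two-state, two-action MDP; all state, action, and reward values are illustrative, not taken from the slides:

```python
import numpy as np

def finite_horizon_value_iteration(n_states, n_actions, P, R, T):
    """Backward induction: V[t][s] = max_a sum_{s'} P(s'|s,a) * (R(s,a,s') + V[t+1][s'])."""
    V = np.zeros((T + 1, n_states))              # V[T] = 0: no reward after the horizon
    policy = np.zeros((T, n_states), dtype=int)  # policy(s, t) = a, as on the slide
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            q = [sum(p * (R[s][a][s2] + V[t + 1][s2]) for s2, p in P[s][a])
                 for a in range(n_actions)]
            policy[t][s] = int(np.argmax(q))
            V[t][s] = max(q)
    return V, policy

# Toy MDP: action 1 moves to / stays in state 1, which pays the higher reward.
P = [[[(0, 1.0)], [(1, 1.0)]],
     [[(0, 1.0)], [(1, 1.0)]]]
R = np.zeros((2, 2, 2))
R[0][1][1] = 1.0   # reward for moving from state 0 to state 1
R[1][1][1] = 2.0   # reward for staying in state 1
V, policy = finite_horizon_value_iteration(2, 2, P, R, T=3)
print(V[0])        # best achievable total reward from each start state
```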
Learning Automata Characteristics
Learning Automata (LA) are adaptive decision-making devices that can operate in environments where:
- they have no information about the effect of their actions at the start of operation (unknown environments)
- a given action does not necessarily produce the same response each time it is performed (non-deterministic environments)
A powerful property of LA is that they progressively improve their performance by means of a learning process, combining rapid and accurate convergence with low computational complexity.
Learning Automaton and its interaction with the environment
Set of Actions A = { a1, …, an }
Response
β = { 0, 1 }
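One standard update rule consistent with this action set and binary response β is the linear reward-inaction (L_R-I) scheme described in Narendra and Thathachar's textbook; the step size and action count below are illustrative:

```python
import random

# Sketch of a linear reward-inaction (L_R-I) learning automaton.
# On a favourable response (beta = 0) the chosen action's probability grows;
# on an unfavourable response (beta = 1) the probabilities are left unchanged.
class LearningAutomaton:
    def __init__(self, n_actions, a=0.1, seed=0):
        self.p = [1.0 / n_actions] * n_actions  # action probability vector
        self.a = a                              # reward step size (illustrative)
        self.rng = random.Random(seed)

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.p)), weights=self.p)[0]

    def update(self, action, beta):
        if beta == 0:  # reward: p_i <- p_i + a*(1 - p_i), others scaled by (1 - a)
            self.p = [(1 - self.a) * pj for pj in self.p]
            self.p[action] += self.a
        # beta = 1: inaction, no change

la = LearningAutomaton(2, a=0.1)
la.update(0, 0)
print(la.p)  # probability mass shifts toward action 0
```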
Methodology
Follows a frame-based approach which maintains task and attribute histories with respect to the domain in focus.
The state space is determined by the number of slots in focus.
The action space is narrowed to “greeting”, “request all”, “request n slot”, “verify all”, “verify n slot” and “close dialogue”.
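A minimal sketch of such frame-based state and action spaces, using three hypothetical slots (the slot names are illustrative, not taken from the paper):

```python
from itertools import product

SLOTS = ["origin", "destination", "date"]       # hypothetical slot names
SLOT_STATUS = ["empty", "filled", "confirmed"]  # per-slot dialogue status

# State space: one status per slot in focus.
STATES = list(product(SLOT_STATUS, repeat=len(SLOTS)))

# Action space as listed on the slide, with per-slot request/verify variants.
ACTIONS = (["greeting", "request_all", "verify_all", "close_dialogue"]
           + [f"request_{s}" for s in SLOTS]
           + [f"verify_{s}" for s in SLOTS])

print(len(STATES), len(ACTIONS))  # 27 states, 10 actions
```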
Experiments and Results
Our experiments were based on the travel planning domain.
The speech recognition and speech synthesis modules were implemented using the .NET SDK framework.
The DATE scheme was used as a dialogue act recognition agent.
Rewards in the range +10 to -5 were assigned for the best and worst action selections, respectively.
Evaluation Methodology
PARADISE framework
Task success: calculated with the help of the AVM (attribute-value matrix).
System performance: modeled as a combination of task success and dialogue costs.
Conclusions and Challenges
- LA are interesting building blocks for solving different types of RL problems
- Faster learning
- Knowledge transfer
- Lower computational complexity
- Different LA update schemes
- Influence of different state observations (POMDP setting)
References I
E. Levin, R. Pieraccini and W. Eckert.
A stochastic model of human-machine interaction for learning dialogue strategies. IEEE Trans. on Speech and Audio Processing, 8(1), pp. 11–23, 2000.
M. McTear.
Spoken dialogue technology: Toward the conversational user interface. Springer, 2004.
K. Narendra and M.A.L. Thathachar.
Learning Automata: An Introduction. Prentice-Hall International, Inc, 1989.
A. Nowe and K. Verbeeck.
Colonies of learning Automata. IEEE Trans. Syst. Man Cybern B, 32, pp.772-780, 2002.
T. Paek and R. Pieraccini. Automating spoken dialogue management design using machine learning: an industry perspective. Speech Communication, 50, pp. 716-729, 2008.
O. Pietquin and T. Dutoit. A probabilistic framework for dialogue simulation and optimal strategy learning. IEEE Transactions on Speech and Audio Processing, 14(2), pp. 589–599, 2006.
References II
K. Scheffler and S. Young.
Automatic learning of dialogue strategy using dialogue simulation and reinforcement learning.
Human Language Technology Conference (HLT), pp 12–19, 2002.
S. Singh, D. Litman, and M. Kearns.
Optimizing dialogue management with reinforcement learning: Experiments with the NJFun system.
Journal of Artificial Intelligence Research, 16, pp. 105–133, 2002.
M. A. L. Thathachar and P. S. Sastry.
Networks of Learning Automata: Techniques for Online Stochastic Optimization. Norwell, MA, Kluwer, 2004.
M. Walker and R. Passonneau.
DATE: A dialogue act tagging scheme for evaluation of spoken dialogue systems. Proceedings of the Human Language Technology Conference, pp. 1–8, 2001.
M. Walker, D. Litman and C. Kamm.
PARADISE: A framework for evaluating spoken dialogue agents. Proc. of the 35th Annual Meeting of the Association for Computational Linguistics, pp. 271–280, 1997.
Questions?