Student simulation and evaluation
DOD meeting
Hua Ai
03/03/2006
Outline
- Motivations
- Backgrounds
- Corpus
- Student Simulation Model
- Comparisons
- Conclusions & Future Work
Motivations
- For a larger corpus
  - Reinforcement Learning (RL) is used to learn the best policy for spoken dialogue systems automatically
  - The best strategy may not even be present in a small dataset
- For a cheaper corpus
  - Human subjects are expensive
[Figure: Strategy learning using a simulated user (Schatzmann et al., 2005). Components: dialog corpus, simulation models, simulated user, dialog manager, reinforcement learning, strategy.]
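The learning loop in the figure can be sketched under heavy simplifying assumptions: a single decision point, two hypothetical tutor actions (`give_hint`, `ask_again`), and a hypothetical simulated user whose success probabilities are invented here rather than estimated from a dialogue corpus. The learner is epsilon-greedy value estimation, the simplest form of reinforcement learning; a real dialogue manager would learn over many dialogue states.

```python
import random

# Hypothetical tutor (dialog manager) actions; not taken from the slides.
ACTIONS = ["give_hint", "ask_again"]

# Hypothetical simulated user: probability that the simulated student answers
# correctly after each tutor action. In practice a simulation model trained on
# a dialogue corpus would supply this behavior.
SUCCESS_PROB = {"give_hint": 0.8, "ask_again": 0.3}


def simulated_user(action, rng):
    """Return reward 1 if the simulated student answers correctly, else 0."""
    return 1 if rng.random() < SUCCESS_PROB[action] else 0


def learn_strategy(episodes=2000, epsilon=0.2, seed=0):
    """Epsilon-greedy value learning at a single decision point."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}  # estimated expected reward per action
    count = {a: 0 for a in ACTIONS}    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(ACTIONS)          # explore
        else:
            action = max(ACTIONS, key=value.get)  # exploit current estimates
        reward = simulated_user(action, rng)
        count[action] += 1
        # Incremental sample-average update of the action's value.
        value[action] += (reward - value[action]) / count[action]
    best = max(ACTIONS, key=value.get)
    return best, value


if __name__ == "__main__":
    best, value = learn_strategy()
    print(best, value)
```

Because the simulated user is cheap to query, the learner can try thousands of dialogues, which is the motivation for simulation given on the previous slide.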
Backgrounds (1)
- Education community
  - Focusing on changes in the student's internal knowledge representation
  - Usually not dialogue based
  - Simulated students (VanLehn et al., 1994) for
    - Tutor training
    - Collaborative learning
Backgrounds (2)
- Dialogue community
  - Focusing on interactions and dialogue behaviors
  - Simulated users have a limited set of actions (Schatzmann et al., 2005)
  - Simulating on the dialogue act (DA) level
Corpus (1)
- Spoken dialogue physics tutor (ITSPOKE)
Corpus (2)
- Tutoring procedure
[Figure: tutoring procedure, repeated for 5 problems: dialogue (tutor (T) question, student (S) answer, ...), essay revision, dialogue, ...]
Corpus (3)
- Tutor's behaviors
  - Defined in KCDs (Knowledge Construction Dialogues)
[Figure: KCD branches labeled Correct and Incorrect/Partially Correct]
Corpus (4)
Corpus                #dialogues  Statistic  stuWord   stuTurn   tutorWord  tutorTurn
f03 (synthesized)     100         avg        57.16     23.35     1256.92    29.64
                                  stdev      45.57638  17.44334  849.8195   19.76351
05syn (synthesized)   136         avg        91.0963   30.78519  1655.467   38.06667
                                  stdev      53.82931  14.42551  757.8744   16.32469
05pre (pre-recorded)  135         avg        87.34559  30.11765  1597.206   37.33088
                                  stdev      55.48004  16.96972  832.9845   18.20096
- f03 vs. s05: different groups of subjects
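The avg and stdev rows summarize per-dialogue counts over each corpus. A minimal sketch of that computation follows; the three dialogues are hypothetical placeholders, and whether the slides report the sample or the population standard deviation is not stated, so the use of the sample version (`statistics.stdev`) is an assumption.

```python
import statistics

# The four per-dialogue counts reported in the table.
FEATURES = ["stuWord", "stuTurn", "tutorWord", "tutorTurn"]

# Hypothetical per-dialogue counts; real values would come from the corpus logs.
dialogues = [
    {"stuWord": 40, "stuTurn": 20, "tutorWord": 1000, "tutorTurn": 25},
    {"stuWord": 60, "stuTurn": 24, "tutorWord": 1200, "tutorTurn": 30},
    {"stuWord": 80, "stuTurn": 28, "tutorWord": 1400, "tutorTurn": 35},
]


def summarize(dialogues):
    """Return {feature: (mean, sample stdev)} across a list of dialogues."""
    summary = {}
    for feature in FEATURES:
        values = [d[feature] for d in dialogues]
        summary[feature] = (statistics.mean(values), statistics.stdev(values))
    return summary


if __name__ == "__main__":
    for feature, (avg, sd) in summarize(dialogues).items():
        print(f"{feature}: avg={avg} stdev={sd}")
```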
Simulation Models (1)
- Simulating on the word level
  - Students have more complex behaviors
  - DA information alone isn't enough for the system
- Two models trained on two corpora
[Figure: ProbCorrect and Random models, each trained on f03 and on s05, giving 03ProbCorrect, 03Random, 05ProbCorrect and 05Random]
Simulation Models (2)
- ProbCorrect Model
  - Simulates the average knowledge level of real students
  - Simulates meaningful dialogue behaviors
- Random Model
  - Nonsense answers
  - Serves as a contrast
ProbCorrect Model
Real corpus:
- question1: Answer1_1 (c), Answer1_2 (ic), Answer1_3 (ic)
- question2: Answer2_1 (c), Answer2_2 (ic)
Candidate answers:
- For question1: c:ic = 1:2; c: Answer1_1; ic: Answer1_2, Answer1_3
- For question2: c:ic = 1:1; c: Answer2_1; ic: Answer2_2
ProbCorrect model, for each question:
1) Choose to give a c or ic answer with the same average probability as real students
2) Randomly choose one answer from the corresponding answer set
(c = correct, ic = incorrect)
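The two ProbCorrect steps can be sketched as follows. The function names and the `(question, answer, is_correct)` input format are hypothetical; following the slide's example, the probability of a correct answer is computed per question from the real c:ic counts (1:2 gives 1/3 for question1).

```python
import random
from collections import defaultdict


def build_candidates(corpus):
    """Group real student answers by question and by correctness.

    corpus: list of (question, answer, is_correct) triples from real dialogues.
    Returns {question: {"c": [...], "ic": [...]}}.
    """
    candidates = defaultdict(lambda: {"c": [], "ic": []})
    for question, answer, is_correct in corpus:
        candidates[question]["c" if is_correct else "ic"].append(answer)
    return candidates


def prob_correct_answer(question, candidates, rng):
    """ProbCorrect model: one simulated answer to a question seen in the corpus."""
    correct = candidates[question]["c"]
    incorrect = candidates[question]["ic"]
    # Step 1: decide c vs. ic with the real students' ratio c / (c + ic).
    p_correct = len(correct) / (len(correct) + len(incorrect))
    pool = correct if rng.random() < p_correct else incorrect
    # Step 2: pick one answer uniformly from the chosen set.
    return rng.choice(pool)


if __name__ == "__main__":
    corpus = [
        ("question1", "Answer1_1", True), ("question1", "Answer1_2", False),
        ("question1", "Answer1_3", False), ("question2", "Answer2_1", True),
        ("question2", "Answer2_2", False),
    ]
    rng = random.Random(0)
    print(prob_correct_answer("question1", build_candidates(corpus), rng))
```

If a question has no correct (or no incorrect) answers, `p_correct` is 0 (or 1) and the empty set is never sampled.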
Random Model
HC03&05:
- Question1: Answer1_1, Answer1_2, Answer1_3, Answer1_4
- Question2: Answer2_1, Answer2_2
Candidate answers: 1) Answer1_1 2) Answer1_2 3) Answer1_3 4) Answer1_4 5) Answer2_1 6) Answer2_2
Big random model, for question i:
- Answer: any of the 6 answers with the same probability (regardless of the question!)
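A minimal sketch of the Random model, assuming a hypothetical `(question, answer)` input format: all distinct answers from the corpus are pooled, and each simulated turn draws one uniformly, ignoring the question. Dropping duplicate answers before sampling is an assumption that matches the slide's list of six distinct candidates.

```python
import random


def build_answer_pool(corpus):
    """Collect every distinct student answer in the corpus, ignoring the question.

    corpus: list of (question, answer) pairs from the real dialogues.
    """
    pool = []
    for _question, answer in corpus:
        if answer not in pool:  # keep first-seen order, drop duplicates
            pool.append(answer)
    return pool


def random_answer(question, pool, rng):
    """Random model: any candidate answer with equal probability.

    The question argument is accepted but deliberately ignored.
    """
    return rng.choice(pool)


if __name__ == "__main__":
    corpus = [
        ("Question1", "Answer1_1"), ("Question1", "Answer1_2"),
        ("Question1", "Answer1_3"), ("Question1", "Answer1_4"),
        ("Question2", "Answer2_1"), ("Question2", "Answer2_2"),
    ]
    pool = build_answer_pool(corpus)
    rng = random.Random(0)
    # The answer to Question2 may well be one of the Question1 answers.
    print(random_answer("Question2", pool, rng))
```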
Experiments
- Comparisons between real corpora
- Comparisons between real & simulated corpora
- Comparisons between simulated corpora
Real Corpora Comparisons (1)
Evaluation metrics
- High-level dialog features
- Dialog style and cooperativeness
- Dialog success rate and efficiency
- Learning gains
Real corpora comparisons (2)
- High-level dialog features
Real corpora comparisons (3)
- Dialogue style features
Real corpora comparisons (4)
- Dialogue success rate
Real corpora comparisons (5)
- Learning gains features
Results
- The differences captured by these simple metrics are not enough to conclude whether a corpus is real or not (Schatzmann et al., 2005)
- The differences could be due to different user populations
Real vs. Simulated Corpora Comparisons
[Figure: bar chart (y-axis 0 to 2) of tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate and correctRate for f03, 03smooth, 03random, s05 and 05smooth]
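The slides name the high-level features in the chart but do not define them. The sketch below computes them for one dialogue under stated assumptions: words are whitespace tokens, tWordRate and sWordRate are words per turn for the tutor and the student, and correctRate is the fraction of student turns annotated as correct. The `Turn` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str           # "tutor" or "student"
    text: str
    correct: bool = False  # assumed correctness annotation on student turns


def dialogue_features(turns):
    """High-level features for one dialogue (definitions are assumptions)."""
    tutor = [t for t in turns if t.speaker == "tutor"]
    student = [t for t in turns if t.speaker == "student"]
    tutor_words = sum(len(t.text.split()) for t in tutor)
    stu_words = sum(len(t.text.split()) for t in student)
    return {
        "tutorTurn": len(tutor),
        "tutorWord": tutor_words,
        "tWordRate": tutor_words / len(tutor) if tutor else 0.0,
        "stuTurn": len(student),
        "stuWord": stu_words,
        "sWordRate": stu_words / len(student) if student else 0.0,
        "correctRate": (sum(t.correct for t in student) / len(student)
                        if student else 0.0),
    }


if __name__ == "__main__":
    turns = [
        Turn("tutor", "What is the net force"),
        Turn("student", "zero", correct=True),
        Turn("tutor", "Right. And the acceleration"),
        Turn("student", "it is nine point eight"),
    ]
    print(dialogue_features(turns))
```

Averaging these per-dialogue values over a corpus gives one bar per feature and corpus, which is how real and simulated corpora can be placed side by side.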
Results (1)
- Most of the measurements are able to distinguish between the Random and ProbCorrect models
  - The ProbCorrect model generates more realistic behaviors
- We cannot draw conclusions about the power of these metrics, since the two simulated corpora are very different
Results (2)
- Differences between the real corpora and the Random models are captured clearly, but differences between the real corpora and the ProbCorrect models are not
- We did not expect this simple model to produce a very realistic corpus; it is surprising that the differences are so small
Results (3)
- s05 variety > f03 variety
- 05ProbCorrect variety > 03ProbCorrect variety
- However, we don't get significantly more variety in the simulated corpora than in the real ones
  - Possibly because the computer tutor is simple (c/ic)
  - We are using the same candidate answer set
Results (4)
- ProbCorrect models trained on different real corpora are quite different
- Each ProbCorrect model is more similar to the real corpus it was trained on than to the other real corpus
Comparisons between simulated dialogues with different dialogue structures
[Figure: two bar charts, f03problem34 and f03problem7, of tutorTurn, tutorWord, tWordRate, stuTurn, stuWord, sWordRate and correctRate for 03prob, 03smoothed and 03random]
Results
- Larger differences between the simulated corpora in prob7 than in prob34
  - The dialogue structure of prob34 is more restricted
- The power of these simple metrics is limited by the dialogue structure
Conclusions
- The simple measurements can distinguish between
  - Real corpora
    - Different populations
  - Simulated and real corpora
    - To different extents
  - Simulated corpora
    - Different models
    - Trained on different corpora
    - Limited by different dialog structures
Future work
- Explore “deep” evaluation metrics
- Test simulated corpora on policy learning
- More simulation models
  - More human features
    - Emotion, learning
  - Special cases
    - Quick learners, slow learners