Temporal Models for Predicting Student Dropout in Massive Open Online Courses
Fei Mi, Dit-Yan Yeung
Hong Kong University of Science and Technology (HKUST)
[email protected] ([email protected])
November 14th, 2015
Fei Mi, Dit-Yan Yeung (HKUST) ICDM ASSESS 2015 November, 14th, 2015 1 / 17
Outline
1 Background and Motivation
2 Temporal Models
3 Experiments
4 Conclusion
Overview

1 What can we do?
  - Performance evaluation (Peer Grading)
  - Help students engage and perform better (Dropout Prediction)
  - Build a personalized platform (Recommendation)
Motivation of our work

1 High attrition rates are common on MOOC platforms (60%-80%)
2 Current methods: SVM, Logistic Regression
  - Activity features (lecture video, discussion forum)
  - Static models
Contribution of our work

1 A sequence labeling perspective

  Time        Week 1  Week 2  Week 3  Week 4  ...  Week t
  Activities  x_1     x_2     x_3     x_4     ...  x_t
  Labels      y_1     y_2     y_3     y_4     ...  y_t

2 Compare different temporal machine learning models
  - Input-output Hidden Markov Model (IOHMM)
  - Recurrent Neural Network (RNN)
  - RNN with long short-term memory (LSTM) cells
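The sequence labeling setup can be sketched as aligned per-week feature and label arrays. The feature count and the toy label rule below are illustrative assumptions, not the talk's actual feature set:

```python
import numpy as np

# Hypothetical example: one student's weekly activity counts over a
# 5-week course.  Each row x_t is the feature vector for week t; each
# y_t is that week's dropout label.  A sequence labeling model trains
# on such aligned (x_1..x_t, y_1..y_t) pairs rather than on one flat
# per-student feature vector.
weeks = 5
n_features = 7  # e.g. videos watched, forum posts, ... (assumed)

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(weeks, n_features))  # one row per week
y = (x.sum(axis=1) == 0).astype(int)               # toy label: 1 if no activity

assert x.shape == (weeks, n_features) and y.shape == (weeks,)
```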
How to capture temporal information?

Sliding window structures (NLP tasks):

1 Features aggregated using a sliding window structure
2 Temporal span fixed by the sliding window

Temporal models:

1 Learn from the previous inputs and the current input
2 A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3 Flexible temporal span, learned from the data
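The sliding-window alternative can be made concrete with a small sketch (the window function and sizes are illustrative assumptions): each week's features are summed with the previous k-1 weeks, so the temporal span is hard-coded at k, whereas a temporal model consumes the weeks one at a time and learns how far back to "remember".

```python
import numpy as np

def window_features(x, k):
    """Sum each week's features with the previous k-1 weeks (zero-padded)."""
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.array([padded[t:t + k].sum(axis=0) for t in range(x.shape[0])])

x = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])  # 3 weeks, 2 features
agg = window_features(x, k=2)
# week 2's aggregate is week 1 + week 2 = [3.0, 1.0]
assert np.allclose(agg[1], [3.0, 1.0])
```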
Input-output Hidden Markov Model (IOHMM)

- Originated from the HMM
- Learns to map input sequences to output sequences

  h_t = A h_{t-1} + B x_t + N(0, Q)
  y_t = C h_t + N(0, R)                    (1)

[Diagram: input features x_t, hidden states h_t, and dropout labels y_t, unfolded over time]
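A minimal forward simulation of the linear-Gaussian state-space form in Eq. (1) might look as follows; all dimensions and parameter matrices are illustrative assumptions, not the fitted model from the talk:

```python
import numpy as np

# Eq. (1): h_t = A h_{t-1} + B x_t + N(0, Q),  y_t = C h_t + N(0, R)
rng = np.random.default_rng(0)
d_h, d_x, d_y, T = 3, 7, 1, 5

A = 0.9 * np.eye(d_h)              # hidden-state transition
B = rng.normal(size=(d_h, d_x))    # input-to-state map
C = rng.normal(size=(d_y, d_h))    # state-to-output map
Q, R = 0.01 * np.eye(d_h), 0.01 * np.eye(d_y)

x = rng.normal(size=(T, d_x))      # weekly activity features (toy data)
h = np.zeros(d_h)
ys = []
for t in range(T):
    h = A @ h + B @ x[t] + rng.multivariate_normal(np.zeros(d_h), Q)
    y = C @ h + rng.multivariate_normal(np.zeros(d_y), R)
    ys.append(y)

assert len(ys) == T and ys[0].shape == (d_y,)
```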
Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles.

  h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
  y_t = F(W_3 h_t + b_y)                   (2)

Left: Vanilla RNN structure; Right: Vanilla RNN unfolded
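A forward pass of Eq. (2) can be sketched directly; here H = tanh and F = sigmoid are assumptions (a sigmoid output suits a binary dropout label), and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h, T = 7, 16, 5

W1 = rng.normal(scale=0.1, size=(d_h, d_x))
W2 = rng.normal(scale=0.1, size=(d_h, d_h))
W3 = rng.normal(scale=0.1, size=(1, d_h))
b_h, b_y = np.zeros(d_h), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(T, d_x))   # toy weekly feature vectors
h = np.zeros(d_h)
probs = []
for t in range(T):
    h = np.tanh(W1 @ x[t] + W2 @ h + b_h)   # recurrent state update
    probs.append(sigmoid(W3 @ h + b_y)[0])  # per-week dropout probability

assert len(probs) == T and all(0.0 < p < 1.0 for p in probs)
```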
Drawbacks of RNN

1 The influence of an input either decays or blows up as it cycles through the recurrent connections
2 Vanishing gradient problem
3 The range of temporality that can be accessed in practice is usually quite limited
4 The dynamic state of a regular RNN is short-term memory
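A tiny numeric illustration of points 1 and 2, under assumed weights: the gradient of a late state with respect to an early one is a product of per-step Jacobians, whose norm shrinks (or explodes) geometrically with the number of steps.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = 0.5 * rng.normal(size=(d, d)) / np.sqrt(d)  # small recurrent weights

h = np.zeros(d)
J = np.eye(d)                  # accumulates d h_T / d h_0 step by step
norms = []
for t in range(30):
    h = np.tanh(W @ h + rng.normal(size=d))
    J = np.diag(1.0 - h**2) @ W @ J    # chain rule through tanh and W
    norms.append(np.linalg.norm(J))

assert norms[-1] < norms[0]    # the influence of early inputs has decayed
```

With larger recurrent weights the same product would instead blow up; either way, the usable temporal span of a plain RNN is limited.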
Long Short-Term Memory Cell (LSTM)

1 Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.

[Diagram: LSTM memory cell with input, forget, and output gates]

  1 Information gets into the cell whenever the "input" gate is on
  2 Information stays in the cell as long as the "forget" gate is closed
  3 Information can be read from the cell by turning the "output" gate on
Update Functions of LSTM

  i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
  f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
  c_t = f_t ⊗ c_{t-1} + i_t ⊗ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
  o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
  h_t = o_t ⊗ tanh(c_t)                    (3)
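One step of Eq. (3) can be sketched in NumPy. The peephole terms W_ci, W_cf, W_co are taken as diagonal (elementwise) here, a common choice but an assumption, and the sizes are illustrative; ⊗ is elementwise multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 7, 16
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = {k: rng.normal(scale=0.1, size=(d_h, d_x)) for k in ("xi", "xf", "xc", "xo")}
U = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ("hi", "hf", "hc", "ho")}
p = {k: rng.normal(scale=0.1, size=d_h) for k in ("ci", "cf", "co")}  # peepholes
b = {k: np.zeros(d_h) for k in ("i", "f", "c", "o")}

def lstm_step(x_t, h_prev, c_prev):
    """One application of the Eq. (3) update functions."""
    i = sigmoid(W["xi"] @ x_t + U["hi"] @ h_prev + p["ci"] * c_prev + b["i"])
    f = sigmoid(W["xf"] @ x_t + U["hf"] @ h_prev + p["cf"] * c_prev + b["f"])
    c = f * c_prev + i * np.tanh(W["xc"] @ x_t + U["hc"] @ h_prev + b["c"])
    o = sigmoid(W["xo"] @ x_t + U["ho"] @ h_prev + p["co"] * c_prev + b["o"])
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h))
assert h.shape == (d_h,) and c.shape == (d_h,)
```

Because the cell state c_t is carried forward additively through the forget gate, gradients can survive over many more steps than in a vanilla RNN.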
Hybrid of LSTM Memory Cells and RNN (LSTM Network)

Left: Hybrid of LSTM and RNN (LSTM network); Right: LSTM network unfolded
Datasets for Dropout Prediction

1 "Science of Gastronomy", a six-week course (Coursera): 85,394 → 39,877
2 "Introduction to Java Programming", a ten-week course (edX): 46,972 → 27,629
Dropout Definitions

1 Three definitions capture different contexts of the student status in a course

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3 Participation in the next week: whether a student has activities in the coming week

Three dropout definitions

  Time      Week 1            Week 2   Week 3            Week 4   Week 5
  Features  [7,34,9,2,0,7,5]  Zeros    [6,3,12,4,1,8,3]  Zeros    Zeros
  DEF1      1                 1        1                 1        1
  DEF2      0                 0        1                 1        null
  DEF3      1                 0        1                 1        null

An illustrative example for DEF1-DEF3
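One possible reading of DEF1-DEF3 as label-generating rules, calibrated to the illustrative example (the talk's exact operationalization may differ; 1 = dropout, None where a definition is undefined):

```python
def dropout_labels(active):
    """active[t] is True if the student had any activity in week t."""
    n = len(active)
    last = max((t for t, a in enumerate(active) if a), default=-1)
    # DEF1: the same label every week -- did the student miss the final week?
    def1 = [0 if active[-1] else 1] * n
    # DEF2: is the current week at or past the last week of engagement?
    def2 = [1 if t >= last else 0 for t in range(n - 1)] + [None]
    # DEF3: is the student inactive in the coming week?
    def3 = [0 if active[t + 1] else 1 for t in range(n - 1)] + [None]
    return def1, def2, def3

# Activity in weeks 1 and 3 only, as in the table above.
d1, d2, d3 = dropout_labels([True, False, True, False, False])
assert d1 == [1, 1, 1, 1, 1]
assert d2 == [0, 0, 1, 1, None]
assert d3 == [1, 0, 1, 1, None]
```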
Model Performance Comparison

[Figure: AUC scores per week (weeks 1-5) of all models for the Coursera course, one panel per definition (DEF1-DEF3). Models: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression; y-axis from 0.5 to 1.]

[Figure: AUC scores per week (weeks 1-9) of all models for the edX course, same panels and models.]

1 The LSTM network performs consistently best
2 The IOHMMs perform worst
3 Baselines ≈ vanilla RNN; not consistent across the two datasets
Take-home Message

1 A temporal perspective on the dropout prediction problem
2 The effectiveness of the RNN and the LSTM network
3 Try not to "drop out" of the MOOC courses you are taking