Temporal Models for Predicting Student Dropout in Massive Open Online Courses
Fei Mi, Dit-Yan Yeung
Hong Kong University of Science and Technology (HKUST)
[email protected] ([email protected])
November 14th, 2015
Fei Mi, Dit-Yan Yeung (HKUST) ICDM ASSESS 2015 November, 14th, 2015 1 / 17
Outline
1 Background and Motivation
2 Temporal Models
3 Experiments
4 Conclusion
Overview

1 What can we do?
  - Performance evaluation (Peer Grading)
  - Help students engage and perform better (Dropout Prediction)
  - Build a personalized platform (Recommendation)
Motivation of our work

1 High attrition rates are common on MOOC platforms (60%-80%)
2 Current methods: SVM, Logistic Regression
  - Activity features (lecture video, discussion forum)
  - Static models
Contribution of our work

1 A sequence labeling perspective

  Time        Week 1  Week 2  Week 3  Week 4  ...  Week t
  Activities  x_1     x_2     x_3     x_4     ...  x_t
  Labels      y_1     y_2     y_3     y_4     ...  y_t

2 Compare different temporal machine learning models
  - Input-output Hidden Markov Model (IOHMM)
  - Recurrent Neural Network (RNN)
  - RNN with long short-term memory (LSTM) cells
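The sequence labeling setup can be sketched as aligned per-week feature and label arrays. The feature count and the toy label rule below are illustrative assumptions, not the talk's actual feature set:

```python
import numpy as np

# Hypothetical example: one student's weekly activity counts over a
# 5-week course.  Each row x_t is the feature vector for week t; each
# y_t is that week's dropout label.  A sequence labeling model trains
# on such aligned (x_1..x_t, y_1..y_t) pairs rather than on one flat
# per-student feature vector.
weeks = 5
n_features = 7  # e.g. videos watched, forum posts, ... (assumed)

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(weeks, n_features))  # one row per week
y = (x.sum(axis=1) == 0).astype(int)               # toy label: 1 if no activity

assert x.shape == (weeks, n_features) and y.shape == (weeks,)
```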
How to capture temporal information?

Sliding window structures (NLP tasks):

1 Features aggregated using a sliding window structure
2 Temporal span fixed by the sliding window

Temporal models:

1 Learn from the previous inputs and the current input
2 A temporal pathway allows a "memory" of the previous inputs to persist in the internal state
3 Flexible temporal span, learned from the data
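The sliding-window alternative can be made concrete with a small sketch (the window function and sizes are illustrative assumptions): each week's features are summed with the previous k-1 weeks, so the temporal span is hard-coded at k, whereas a temporal model consumes the weeks one at a time and learns how far back to "remember".

```python
import numpy as np

def window_features(x, k):
    """Sum each week's features with the previous k-1 weeks (zero-padded)."""
    padded = np.vstack([np.zeros((k - 1, x.shape[1])), x])
    return np.array([padded[t:t + k].sum(axis=0) for t in range(x.shape[0])])

x = np.array([[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]])  # 3 weeks, 2 features
agg = window_features(x, k=2)
# week 2's aggregate is week 1 + week 2 = [3.0, 1.0]
assert np.allclose(agg[1], [3.0, 1.0])
```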
Input-output Hidden Markov Model (IOHMM)

- Originated from the HMM
- Learns to map input sequences to output sequences

  h_t = A h_{t-1} + B x_t + N(0, Q)
  y_t = C h_t + N(0, R)                    (1)

[Diagram: input features x_t, hidden states h_t, and dropout labels y_t, unfolded over time]
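A minimal forward simulation of the linear-Gaussian state-space form in Eq. (1) might look as follows; all dimensions and parameter matrices are illustrative assumptions, not the fitted model from the talk:

```python
import numpy as np

# Eq. (1): h_t = A h_{t-1} + B x_t + N(0, Q),  y_t = C h_t + N(0, R)
rng = np.random.default_rng(0)
d_h, d_x, d_y, T = 3, 7, 1, 5

A = 0.9 * np.eye(d_h)              # hidden-state transition
B = rng.normal(size=(d_h, d_x))    # input-to-state map
C = rng.normal(size=(d_y, d_h))    # state-to-output map
Q, R = 0.01 * np.eye(d_h), 0.01 * np.eye(d_y)

x = rng.normal(size=(T, d_x))      # weekly activity features (toy data)
h = np.zeros(d_h)
ys = []
for t in range(T):
    h = A @ h + B @ x[t] + rng.multivariate_normal(np.zeros(d_h), Q)
    y = C @ h + rng.multivariate_normal(np.zeros(d_y), R)
    ys.append(y)

assert len(ys) == T and ys[0].shape == (d_y,)
```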
Vanilla Recurrent Neural Network (Vanilla RNN)

An RNN allows the network connections to form cycles.

  h_t = H(W_1 x_t + W_2 h_{t-1} + b_h)
  y_t = F(W_3 h_t + b_y)                   (2)

Left: Vanilla RNN structure; Right: Vanilla RNN unfolded
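A forward pass of Eq. (2) can be sketched directly; here H = tanh and F = sigmoid are assumptions (a sigmoid output suits a binary dropout label), and the sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h, T = 7, 16, 5

W1 = rng.normal(scale=0.1, size=(d_h, d_x))
W2 = rng.normal(scale=0.1, size=(d_h, d_h))
W3 = rng.normal(scale=0.1, size=(1, d_h))
b_h, b_y = np.zeros(d_h), np.zeros(1)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=(T, d_x))   # toy weekly feature vectors
h = np.zeros(d_h)
probs = []
for t in range(T):
    h = np.tanh(W1 @ x[t] + W2 @ h + b_h)   # recurrent state update
    probs.append(sigmoid(W3 @ h + b_y)[0])  # per-week dropout probability

assert len(probs) == T and all(0.0 < p < 1.0 for p in probs)
```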
Drawbacks of RNN

1 The influence of an input either decays or blows up as it cycles through the recurrent connections
2 Vanishing gradient problem
3 The range of temporality that can be accessed in practice is usually quite limited
4 The dynamic state of a regular RNN is short-term memory
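A tiny numeric illustration of points 1 and 2, under assumed weights: the gradient of a late state with respect to an early one is a product of per-step Jacobians, whose norm shrinks (or explodes) geometrically with the number of steps.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
W = 0.5 * rng.normal(size=(d, d)) / np.sqrt(d)  # small recurrent weights

h = np.zeros(d)
J = np.eye(d)                  # accumulates d h_T / d h_0 step by step
norms = []
for t in range(30):
    h = np.tanh(W @ h + rng.normal(size=d))
    J = np.diag(1.0 - h**2) @ W @ J    # chain rule through tanh and W
    norms.append(np.linalg.norm(J))

assert norms[-1] < norms[0]    # the influence of early inputs has decayed
```

With larger recurrent weights the same product would instead blow up; either way, the usable temporal span of a plain RNN is limited.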
Long Short-Term Memory Cell (LSTM)

1 Hochreiter & Schmidhuber (1997) solved the problem of getting an RNN to remember things for a long time.

[Diagram: LSTM memory cell with input, forget, and output gates]

  1 Information gets into the cell whenever the "input" gate is on
  2 Information stays in the cell as long as the "forget" gate is closed
  3 Information can be read from the cell by turning the "output" gate on
Update Functions of LSTM

  i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
  f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
  c_t = f_t ⊗ c_{t-1} + i_t ⊗ tanh(W_xc x_t + W_hc h_{t-1} + b_c)
  o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
  h_t = o_t ⊗ tanh(c_t)                    (3)
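One step of Eq. (3) can be sketched in NumPy. The peephole terms W_ci, W_cf, W_co are taken as diagonal (elementwise) here, a common choice but an assumption, and the sizes are illustrative; ⊗ is elementwise multiplication.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h = 7, 16
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

W = {k: rng.normal(scale=0.1, size=(d_h, d_x)) for k in ("xi", "xf", "xc", "xo")}
U = {k: rng.normal(scale=0.1, size=(d_h, d_h)) for k in ("hi", "hf", "hc", "ho")}
p = {k: rng.normal(scale=0.1, size=d_h) for k in ("ci", "cf", "co")}  # peepholes
b = {k: np.zeros(d_h) for k in ("i", "f", "c", "o")}

def lstm_step(x_t, h_prev, c_prev):
    """One application of the Eq. (3) update functions."""
    i = sigmoid(W["xi"] @ x_t + U["hi"] @ h_prev + p["ci"] * c_prev + b["i"])
    f = sigmoid(W["xf"] @ x_t + U["hf"] @ h_prev + p["cf"] * c_prev + b["f"])
    c = f * c_prev + i * np.tanh(W["xc"] @ x_t + U["hc"] @ h_prev + b["c"])
    o = sigmoid(W["xo"] @ x_t + U["ho"] @ h_prev + p["co"] * c_prev + b["o"])
    h = o * np.tanh(c)
    return h, c

h, c = lstm_step(rng.normal(size=d_x), np.zeros(d_h), np.zeros(d_h))
assert h.shape == (d_h,) and c.shape == (d_h,)
```

Because the cell state c_t is carried forward additively through the forget gate, gradients can survive over many more steps than in a vanilla RNN.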
Hybrid of LSTM Memory Cells and RNN (LSTM Network)

Left: Hybrid of LSTM and RNN (LSTM network); Right: LSTM network unfolded
Datasets for Dropout Prediction

1 "Science of Gastronomy", a six-week course (Coursera): 85,394 → 39,877
2 "Introduction to Java Programming", a ten-week course (edX): 46,972 → 27,629
Dropout Definitions

1 Three definitions capture different contexts of the student status in a course

DEF1 Participation in the final week: whether a student will stay to the end of the course [Yang et al. 2013, Ramesh et al. 2014, He et al. 2015]
DEF2 Last week of engagement: whether the current week is the last week the student has activities [Amnueypornsakul et al. 2014, Kloft et al. 2014, Sinha et al. 2014, Sharkey and Sanders 2014, Taylor et al. 2014]
DEF3 Participation in the next week: whether a student has activities in the coming week

Three dropout definitions

  Time      Week 1            Week 2   Week 3            Week 4   Week 5
  Features  [7,34,9,2,0,7,5]  Zeros    [6,3,12,4,1,8,3]  Zeros    Zeros
  DEF1      1                 1        1                 1        1
  DEF2      0                 0        1                 1        null
  DEF3      1                 0        1                 1        null

An illustrative example for DEF1-DEF3
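One possible reading of DEF1-DEF3 as label-generating rules, calibrated to the illustrative example (the talk's exact operationalization may differ; 1 = dropout, None where a definition is undefined):

```python
def dropout_labels(active):
    """active[t] is True if the student had any activity in week t."""
    n = len(active)
    last = max((t for t, a in enumerate(active) if a), default=-1)
    # DEF1: the same label every week -- did the student miss the final week?
    def1 = [0 if active[-1] else 1] * n
    # DEF2: is the current week at or past the last week of engagement?
    def2 = [1 if t >= last else 0 for t in range(n - 1)] + [None]
    # DEF3: is the student inactive in the coming week?
    def3 = [0 if active[t + 1] else 1 for t in range(n - 1)] + [None]
    return def1, def2, def3

# Activity in weeks 1 and 3 only, as in the table above.
d1, d2, d3 = dropout_labels([True, False, True, False, False])
assert d1 == [1, 1, 1, 1, 1]
assert d2 == [0, 0, 1, 1, None]
assert d3 == [1, 0, 1, 1, None]
```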
Model Performance Comparison

[Figure: AUC scores per week (weeks 1-5) of all models for the Coursera course, one panel per definition (DEF1-DEF3). Models: LSTM Network, Vanilla RNN, IOHMM 1, IOHMM 2, Nonlinear SVM, Logistic Regression; y-axis from 0.5 to 1.]

[Figure: AUC scores per week (weeks 1-9) of all models for the edX course, same panels and models.]

1 The LSTM network performs consistently best
2 The IOHMMs perform worst
3 Baselines ≈ vanilla RNN; not consistent across the two datasets
Take-home Message

1 A temporal perspective on the dropout prediction problem
2 The effectiveness of the RNN and the LSTM network
3 Try not to "drop out" of the MOOC courses you are taking