Predicting Performance on MOOC Assessments using Multi ... · Predicting Performance on MOOC...

Post on 04-Aug-2020

14 views 0 download

Transcript of Predicting Performance on MOOC Assessments using Multi ... · Predicting Performance on MOOC...

Predicting Performance on MOOC Assessments using Multi-Regression Models

Zhiyun Ren, Huzefa Rangwala, Aditya Johri

George Mason University4400 University Drive, Fairfax, Virginia 22030

Outline

q Background

q Personal Linear Multi-Regression Models

q Feature selection

q Experiments and discussion

q Conclusion and future work

Background

Background

Overview

q Information we have: MOOC server log

q Things we want to do: Predict student’s performance

Challenge

q Various kinds of participants

q High attrition rate

q Flexible timetable

q Baselines we have tried: Linear regression model, meanscore

Personal Linear Multi-Regression Models

!",$ = &" + &( + )"*+,"$ = &" + &( + ()",. ,"$,/0.,/

12

/34)

6

.34

𝑝𝑠 𝑊 𝑓𝑠𝑎

𝒍 --Numberof regressionmodels

𝒏𝑭 -- Numberof features

!"#"!"$%((, *, +)

12/ (01,2 − 01,2)4

5

678+ :( * ;4 + ( ;4) + <( * + ( )

Data structure

(a) Homework and quiz

(b) Video

(c) Study session

Feature selection

q quiz related features

q time related features

q interval-based features

q homework related features

Feature selection

q Video related features

q Session features

Experimental setup

q Different motivations part the data into two groups.

q Different models are applied for different data types.

Experimental protocol

q PreviousHW-based prediction

q PreviousOneHW-based prediction

HW1 HW2 HW3 HW4 …...

HW1 HW2 HW3 HW4 …...

Experimental baseline: KT-IDEM

K K K

Q Q Q

P(L0)P(T) P(T)

P(G1)P(S1)

P(G2)P(S2)

P(G3)P(S3)

I I I

……

Model parameters

P(L0) = Initial KnowledgeP(T) = Probability of learningP(G1…n) = Probability of guess per questionP(S1…n) = Probability of slip per question

n denotes the number of all questions.

Comparative Performance

q Prediction results with varying number of regression models for student group with continuous grade value

Comparative Performance

q Prediction results with varying number of regression models for student group with binary grade value

Comparative Performance

q The comparison of the accuracy and F1 scores with baseline approaches.

Feature Importance

Feature Importance

Conclusion and future work

q Predict algorithm: personalized multiple linear regression model.

q Experimental results: improved performance compared to baseline methods.

q Other contribution: analysis of feature importance.

q Future work: to set up an early warning system to help improve student’s performance

Thank you!