Personalized Course Sequence RecommendationsFig. 1. System Diagram for Course Sequence...

arX

iv:1

512.

0917

6v2

[cs.

CY

] 12

Jan

201

61

Personalized Course Sequence RecommendationsJie Xu, Member, IEEE,Tianwei Xing, Student Member, IEEE,and Mihaela van der Schaar,Fellow, IEEE

Abstract—Given the variability in student learning it is be-coming increasingly important to tailor courses as well ascourse sequences to student needs. This paper presents a sys-tematic methodology for offering personalized course sequencerecommendations to students. First, a forward-search backward-induction algorithm is developed that can optimally selectcoursesequences to decrease the time required for a student to graduate.The algorithm accounts for prerequisite requirements (typicallypresent in higher level education) and course availability. Second,using the tools of multi-armed bandits, an algorithm is developedthat can optimally recommend a course sequence that bothreduces the time to graduate while also increasing the overallGPA of the student. The algorithm dynamically learns howstudents with different contextual backgrounds perform forgiven course sequences and then recommends an optimal coursesequence for new students. Using real-world student data fromthe UCLA Mechanical and Aerospace Engineering department,we illustrate how the proposed algorithms outperform othermethods that do not include student contextual informationwhenmaking course sequence recommendations.

I. I NTRODUCTION

Recent studies [1][2] find that the vast majority of collegestudents in the United States do not complete college in fouryears and that fewer college students are today graduating ontime than a decade ago. Taking longer to graduate is not cheap- it costs $15,933 more in tuition, fees and living expenses forevery extra year at a public two-year college and $22,826 forevery added year at a public four-year college [1]. While manyfactors contribute to students taking longer to graduate, suchas credits lost in transfer, uninformed choices due to the lowadvisor-student ratios and poor preparation for college, theinability of students to take required courses when needed isamong the leading causes [1]. If courses are elected and takenmyopically, without a clear plan, a student may end up in anawkward situation in which required subsequent courses areoffered (much) later, thereby (significantly) prolonging grad-uation time. To reduce the time-to-graduation, it is thereforeof paramount importance for the student to elect courses in aforesighted way by taking into account the possible subsequentcourse sequences (including which courses are mandatory andwhich ones are not, and the course prerequisites) and whenthe various courses are offered. More importantly, becausethenumber and variety (in backgrounds, in knowledge, in goals)of students is expanding rapidly, it is more and more importantto tailor course sequences to students since the same learningpath is unlikely to best serve all students. Therefore, it isnecessary to develop an automated course sequence recom-mendation system that learns from the performance of previous

J. Xu is with the Department of Electrical and Computer Engineering,University of Miami, Coral Gables, FL 33146, USA. Email: [email protected].

T. Xing and M. van der Schaar are with the Department of ElectricalEngineering, University of California, Los Angeles, CA 90095, USA. Emails:[email protected], [email protected].

students in various courses/sequences and uses what it haslearned to adaptively recommend course sequences that arepersonalized for the current student, depending on the student’sbackground and his/her completion status of the program inorder to maximize any of a variety of objectives includingtime to graduation, grades and the trade-off between the two.Issuing personalized course sequences recommendations forstudents poses numerous challenges, some of which are uniqueto this type of recommendation system:

• SequencesUnlike most existing recommendation sys-tems (such as those used to recommend movies orproducts to purchase), course sequence recommendationsrequires issuing sequences of courses (items) rather thana single item at a time. Hence, such course sequence rec-ommendations require dealing with a large decision spacewhich grows combinatorially with the number of courses.Searching for the best sequence to make personalizedrecommendations to a student represents a challengingproblem even when the number of courses is moderate.When the number of offered courses is large (as it istypical in academic programs at the undergraduate orgraduate level), offering personalized recommendationsbecomes a very challenge problem.

• Flexibility and Constraints There is a great deal of flex-ibility in course sequence recommendation since multiplecourses can be taken simultaneously. At the same time,course sequence recommendation is also subject to manyconstraints - some courses are mandatory while some arenot, and some courses are prerequisite of others. Boththe flexibility and the constraints make course sequencerecommendation an extremely complex problem.

• Evolving RecommendationAny static course sequenceis sub-optimal since the knowledge, experience and per-formance of a student develops and evolves in the processof learning. Students may arrive at different completionstatus of the academic program at a given time dependingon their performance in the finished courses and hence,they should be given different recommendations of sub-sequent courses.

• PersonalizationAn even more difficult challenge is thatstudents vary tremendously in backgrounds, knowledgeand goals. As a result, the same recommendation policyis unlikely to best serve all students and provide thebest learning experience for every student. However,personalizing course sequence recommendation seems adaunting task since students attend college only once andhence, it is unrealistic to try different course sequenceson a student and learn the best course sequence for thisstudent.

http://arxiv.org/abs/1512.09176v2

2

!!"

!!"

#$%&'()*"

+ , - .

/

012314 5647 854963:1 ;<= 0>>2314 5647 854963:1 ;;;=

?@A@B@CD E@BDFGHFIJKLMKG@HA?KN@B@CD IJKLDO

Fig. 1. System Diagram for Course Sequence Recommendation

In this paper, we develop an automated course sequencerecommendation system that is able to provide personalizedand adaptive recommendation to students depending on theirbackground as well as their evolving performance in theprogram. In order to reduce complexity and enable tractablesolutions, we solve this problem in two steps. The first step(Section III) involves offline learning, in which a set of can-didate recommendation policies are determined to minimizethe expected time to graduation or maximize the on-timegraduation probability using an existing dataset of anonymizedstudent records. A dynamic programming based approachis adopted to solve the adaptive sequence recommendationwhich recommends subsequent course sequences to studentsdepending on their completion status of the academic programby taking into account the prerequisite relationship amongcourses and the course availability across academic terms(quarters/semesters). The second step (Section IV) involvesonline learning, in which for each new student, a suitablecourse sequence recommendation policy is selected dependingon this student’s background using the learned knowledgefrom the previous students. Online contextual multi-armedbandit techniques are used to develop policy selection algo-rithms to maximize the students’ grades given the time-to-completion constraints. To enable fast learning, the algorithmexploits the similarity among students and adaptively clustersstudents and refines the clustering as more students enter andfinish the program. See Figure 1 for a depiction of the system.

The main contributions of this paper are as follows:

• We formulate the personalized and adaptive course se-quence recommendation problem and develop systematicsolutions aimed at reducing time to graduation and max-imizing students’ grades upon graduation.

• We provide analytical characterizations on the impact ofcourse prerequisite dependency and the course availabil-ity on the possible emerging subsequent course sequencesas well the structure of the optimal course sequencerecommendation policy in certain specific scenarios.

• We rigorously analyze the performance of the onlinepersonalized policy selection algorithm and prove that theproposed algorithm converges fast and is able to select theoptimal personalized course sequence recommendationpolicy for students.

• Extensive simulations are carried out on a real-world stu-

dent record dataset to verify the efficacy of the proposedsystem. They also reveal how the time-to-graduation isaffected by the course prerequisite dependency as wellas the course availability across academic terms, therebyproviding important insights and guidelines on how toplan the curriculum and allocate teaching resources toimprove the on-time graduation ratio.

The rest of this paper is organized as follows. In SectionII, we review the literature and highlight our contribution. InSection III, we study the offline learning problem to determinea set of candidate course sequence recommendation policiesbased on dynamic programming. In Section IV, we studythe online learning problem to select personalized policiesfor students based on online contextual multi-armed banditstechniques. Section V provides simulation results. In SectionVI, we conclude the paper.

II. RELATED WORK

Machine learning for education has recently gained muchattention [3][4]. Previous research focuses on grade prediction[5], drop-out prediction [6], personalized teaching styles andmaterials [7], estimating learners’ knowledge of conceptsunderlying a domain [8], multimedia and cooperative learning[9] etc. This paper studies the important, yet much lessinvestigated problem of (personalized) course sequence recom-mendation. Solving this problem has the potentially significantimpact of shortening the time that students need to graduate.Methods solving this problem can then be combined withother methods to provide a comprehensive set of tools forpersonalizing education.

There is much work on recommending relevantcourses/learning materials to students according to students’types (e.g. interests, knowledge levels, learning styles andfeedback) [10] [11] [12] [13] [14] [15] [16]. Besidescourse recommendation, there is extensive work onrecommender systems for assisting users with findingdesirable products or services [17] [18] [19]. However,several unique features of course sequence recommendationmake these approaches unsuitable for the considered problem.First, while traditional recommendation systems deal withthe problem of recommending items or sets of items,most of them do not take into account prerequisites whilerecommending an item: a course can be taken only when allits prerequisite courses have been taken and passed. Thus, itdoes not make sense to recommend to a student a course ifthe prerequisite courses have not been completed. Secondly,there are complex constraints on the recommendation: thecourses taken by a student must satisfy requirements (e.g.take 10 mandatory courses and 5 out of 12 elective courses)in order for the student to graduate. Thirdly, courses are notavailable in all quarters due to teaching resource constraints.If the next available quarter of a required course is far awayfrom the current quarter, then it might be wiser to take thiscourse earlier rather than later.

Recommendation with prerequisites was studied in [20],in which the goal is to recommend the best set ofk itemswhen there is an inherent ordering between items. Various

3

prerequisite structures were studied and the complexity ofdetermining the best set is proven to be NP-Hard. Severalheuristic approximation algorithms were developed to solvethe recommendation problem. However, the problem is for-mulated as a set recommendation problem rather than asequential recommendation problem, which ignores the courseprerequites, the evolving knowledge of students (and gradesso far) as well as the course availability in different quarters.Recommendation with complex constraints was studied in [21]where increasingly expressive models were developed to checkif the requirements are satisfied and course recommendationswere made by taking into account these requirements. How-ever, the course prerequisites and the course availabilityare notconsidered. A Markov Decision Process based recommendersystem was developed in [22] to take into the long-term effectsof each recommendation. However, this approach is not ableto handle the course prerequisites or the course requirementconstraints.

Our algorithm for online personalized recommendation pol-icy selection builds on the contextual multi-armed banditsmethods [23] [24] [25] [26]. Most of the prior work oncontextual bandits is focused on an agent making single-stage decisions based on the provided context information.In contrast, in this paper, arms are the course sequencerecommendation policies which are selected depending onthe student’s background but the policy itself also inducesa sequence of decision making depending on the evolvingperformance of the student.

III. C OURSESEQUENCERECOMMENDATION: POLICY

CONSTRUCTION

We consider a curriculum consisting of a set of coursesN =1, 2, ..., N. Among these courses, there areM mandatorycourses andE = N − M elective courses. We consider adiscrete time system where a student takes courses quarter byquarter1 and can stay in the program for at mostT quarters.Quarters are indexed byt = 1, 2, ..., T . Let s(t) be a coursestate vector of sizeN which is used to indicate the courses thata student has already taken and passed by the end of quartert. The first M elements are with respect to the mandatorycourses and the remainingE elements are with respect to theelective courses. Each element ofs(t) takes a binary valuewheresn(t) = 1 means that the student has taken and passedcoursen andsn(t) = 0 otherwise. Initially each student passeszero courses such that each element ofs(0) satisfiessn(0) =0, ∀n.

In each quarter, the maximum number of courses a studentcan take isC. Let A(t) denote the elected courses in quartert. However, even though the student is free to elect courses,A(t) must come from a set of feasible coursesF(t, s(t− 1))depending on the indext of the quarter as well as the coursestates(t− 1) of the student. Firstly, courses that have alreadybeen taken and passed cannot be retaken. Secondly, the studentcan only take courses that are available in that quarter since acourse is usually not offered in every quarter. LetΓ(t) ⊆ N

1We use the quarter system for illustration but our approach also works forthe semester system.

PQRS

TUQ

PQRS

TVQ

PQRS

TUW

PQRS

TVW

PQRS

TTQ

PQRS

TTW

XSY

UW

XSY

UQ

PQZ

U[T

PQZ

U[V

PQZ

U[U

XSY

U\

PQZ

U[]Q

\SZ

V[W

XSY

^Q_

XSY

^W_

\`

TU

PQZ

a^

P`Z

U[^

\SZ

V[

Fig. 2. The prerequisite graph for the undergraduate program in the Mechan-ical and Aerospace Engineering department at UCLA.

denote the set of available courses offered in quartert. Thirdly,the student can only take a course when he/she has finishedall its prerequisite courses. We formalize course prerequisitein more detail as follows.

Course PrerequisiteCourses have prerequisite dependen-cies, namely courses can be elected only when certain pre-requisite courses have been taken and passed. In general, theprerequisite dependency can be described as a directed acyclicgraph (DAG), denoted byG = 〈N , E〉 whereN is the set ofcourses andE is the set of directed edges. A directed edgem→ n between two coursesm andn means that coursem is aprerequisite of coursen. LetP (n) = m : m→ n ∈ E be theset of prerequisite courses ofn. Only when all courses inP (n)have been taken can coursen be elected. Note that ifP (n) isan empty set, then coursen has no prerequisite courses andhence can be elected at any time (whenever available). More-over, any elective course cannot be the prerequisite courseofa mandatory course. Otherwise, the elective course effectivelybecomes mandatory. Nevertheless, an elective course can bethe prerequisite of another elective course. Figure 2 illustratespart of the prerequisite graph for the undergraduate programin the Mechanical and Aerospace Engineering department atUCLA.

Given these constraints, the feasible set of courses that astudent can take in quartert given his/her course states(t−1)can be computed as follows:

F(t, s(t− 1)) = n :sn(t− 1) = 0;n ∈ Γ(t);

∀m ∈ P (n), sm(t− 1) = 1 (1)

Since a student cannot take more thanC courses per quarter,the possible combinations of courses that can be elected by astudent is

A(t, s(t− 1)) = A : A ⊆ F(t, s(t− 1)); |A| ≤ C (2)

At the end of each quartert, the student either passes or failsthe course that he/she takes in this quarter. The probability thata student fails a course depends on the difficulty of the courseas well as how many courses that he/she is taking simulta-neously in the same quarter, which can be estimated fromthe student academic record dataset. Denote the probabilitythat the student fails a coursen by ǫn(k) where k is thenumber of simultaneous courses. Typically,ǫn(k) is a non-decreasing function ink to capture the fact that the student’seffort has to be distributed into multiple courses. Dependingon the course performance outcome in this quarter, the course

4

state will evolve froms(t− 1) to s(t). If the student passes acoursen ∈ A(t), thensn(t) = 1; otherwise,sn(t) remains 0.

A student graduates when he/she has taken and passed allmandatory courses and at leastE0 ≤ E elective coursesbefore the end ofT quarters whereE0 is a predefinednumber by the program. The course states in which thestudent can graduate are called terminal states, which must

satisfy ∀n = 1, ...,M, sn = 1 andN∑

n=M+1

sn ≥ E0. Let S

be the set of all terminal course states. There is a rewardfunction U : S × 1, ..., T → R for each terminal stateindicating the reward of reaching the terminal state by aspecific quarter. For example,U(s, t) = 1, ∀s ∈ S, ∀t assignsequal value to all terminal states if the system only caresabout whether the student can graduate on time. For anotherexample,U(s, t) = T − t + 1, ∀s ∈ S allows the system totake into account the exact time of graduation.

A course sequence recommendation policy specifies foreach course state in any quarter, the next courses that shouldbe taken. Letπ(s, t) denote the courses that are recommendedto take in quartert given the course states. Given a coursesequence recommendation policyπ, starting with any states inany quartert, the course states evolves stochastically (since astudent may pass or fail the course with probabilities), therebyinducing a probability distribution over the terminal state thatcan be reached. LetV (s, t) =

∑

s,τ≥t

pπs,τ (s, t)U(s, τ) denote

the value of states in quarter t when policy π is adoptedwhere pπs,τ (s, t) is the probability of reaching a terminalstate s in quarter τ ≥ t starting with states in quartert. The objective of the system is to determine the optimalpolicy that maximizes the value of the initial states(0), i.e.π∗ = argmaxπ V (s(0), 1).

Our solution to find the optimal policyπ∗ consists of twophases. In the first phase, we perform a forward search startingfrom quarter 1 through quarterT to determine all possiblecourse states that can emerge on the learning path. The purposeof this phase is to reduce the course state space in eachquarter that the system should look at. In the second phase, weperform a backward induction starting from quarterT throughquarter1 to compute the optimal set of courses that shouldbe taken in each possible course state. The purpose of thisphase is to determine the course sequence recommendationthat minimizes the graduation time (or ensure that studentsgraduate before a desired time).

A. Forward Search Phase

Since the number of possible course states grows exponen-tially with the number of courses in the curriculum, the coursestate space can be huge for even a moderate number of courses.However, thanks to the course prerequisite constraint and thecourse availability constraint, the number of course states thatcan emerge in a particular quarter can be significantly limited.The purpose of the forward search is to determine the possiblecourse states, thereby reducing the problem complexity.

Let L(t) denote the set of possible course states by theend of quartert andH(t) denote the set of state-course pairsin quartert. Initially L(0) = s(0). In each quartert, the

(0)L (1)L (2)L(1)H (2)H

Fig. 3. Illustration for Forward Search. Each course state (circle) representsthe completion status of the three courses. For instance, 100 means thatonly the first course is taken and passed. Each state-course pair (rectangle)represents the next courses elected in a given state. For instance, 100/2 meansthat course 2 is elected as the next course to take in a state 100.

algorithm examines each non-terminal course states(t− 1) ∈L(t− 1) and determines the feasible course setF(t, s(t− 1))for this course state hence the possible combinations of courseA(t, s(t − 1)) that can be elected in this quarter. For eachcombination of coursesA ∈ A(t, s(t − 1)), the state-coursepair (s(t− 1), A) is inserted intoH(t). Then all possible newcourse statess(t) with respect to(s(t− 1), A) is included inL(t). Moreover, the probability thats(t− 1) transits tos(t) iscomputed by

p(s(t)|s(t− 1), A)

=∏

n:n∈A,sn(t)=1

(1− ǫn(|A|))∏

n:n∈A,sn(t)=0

ǫn(|A|) (3)

This algorithm is summarized in Algorithm 1 and is illustratedin Figure 2.

Algorithm 1 Forward Search

1: Initialization : L(t) = ∅,H(t) = ∅, ∀t.2: Initial possible course statesL(0) = s(0) = sn = 0, ∀n3: For quartert = 1 To quarterT :4: For each course states ∈ L(t− 1)5: Determine feasible course setF(t, s) and feasi-

ble courses-to-take setA(t, s)6: For each feasible combinations of coursesA ∈A(t, s)

7: Update the current listH(t) ← H(t) ∪(s, A)

8: UpdateL(t) by adding all possible states9: End For

10: End For11: End For

Next, we analyze the property of the setL(t) of possiblecourse states.

Lemma 1. For any course prerequisite and course availabilityconstraints,L(t) is weakly expanding, i.e.L(0) ⊆ L(1)... ⊆L(T ).

5

Proof. Consider any course states ∈ L(t). Since the studentcan take no course in the subsequent quartert+1 and hence,the course state remains the same. Therefore,L(t + 1) mustat least includes and hence,L(t) ⊆ L(t+ 1).

Lemma 1 states an intuitive result that the possible coursestates grow over quarters. However, how fast this set growsdepend on the specific course prerequisite dependency as wellas the course availability. A formal characterization for ageneral DAG seems extremely complicated; in the propositionbelow, we determine the growth rate ofL(t) for two specificcases.

Proposition 1. Suppose all courses are offered in all quar-ters andC = 1. (1) If the course prerequisite DAG is aline, then |L(t)| = t. (2) If the course prerequisite DAGis an empty graph (i.e. no prerequisite dependency), then

|L(t)| =t∑

τ=0

(

Nτ

)

. (3) For any general prerequisite DAG,

t ≤ |L(t)| ≤t∑

τ=0

(

Nτ

)

.

Proof. (1) By any quartert, the student can take and passat most t courses. The only possible course states that canemerge in quartert is 1, 1, 2, ..., 1, 2, ..., t. Therefore,there are totallyt possible course states.

(2) In quarter 1, the student can take any one of theNcourses or does not take any course. The number of possiblestates is thusN + 1. If this student passes this course, he/shecan take any one of the remainingN − 1 courses or doesnot take any course in quarter 2. The possible states with twocourses passed is

(

N2

)

+ N + 1. Continuing in this way, weobtain the result.

(3) It is easy to see that (1) and (2) are two extreme casesof a general DAG, therefore, the number of possible states is

bounded byt andt∑

τ=0

(

Nτ

)

.

Proposition 1 shows that the possible course states highlydepend on the course prerequisite constraints. The number ofpossible course states grows linearly when the prerequisite isstrict while it grows exponentially when the prerequisite isloose. Note that in the above proposition we did not imposeany restriction on the course availability. In practice, only alimited number of courses are offered in each quarter, therebyfurther limiting the size of|L(t)|.

Remark: L(t) contains all possible course completion statesby the end of quartert if students randomly choose their coursesequences. Therefore,L(t) reflects how diverse the studentslearning experience can be by adopting a specific curriculum.The largerL(t), the more diverse learning experience thatstudents can have by the end of quartert. A more diversestudent learning experience (which is determined by the cur-riculum but not the course sequence recommendation) has twoimplications. On one hand, finding the best course sequence torecommend becomes more difficult since the searching spaceis larger. On the other hand, the best course sequence mayyield better learning outcome.

(0)L (1)L (2)L(1)H (2)H

Fig. 4. Illustration for Backward Induction. The red thick arrow representsthe next course elected for each course state.

B. Backward Induction Phase

The outcome of the Forward Search phase is actually anAND/OR graph where each course subsequence is a subgraphof the AND/OR graph. In this graph, each OR node representsa course states ∈ L(t) in which different possible combina-tions of subsequence courses can be elected. Each AND node(s, A) corresponds to electing coursesA in course states.The AND node also stores a probability distribution over thepossible next states by taking these courses, which is computedby (3). The value of the optimal course sequence recommenda-tion for this AND/OR graph can be computed by a bottom-upsweep through the graph. This computation can be viewed asa backward induction. First, the value of all OR nodes thatare non-terminal states and all AND nodes are initialized to0, and the value of all OR nodes that are terminal states areinitialized to the reward of the corresponding terminal states.Next, starting from quarterT , for each quartert, we updatethe value of the AND nodes inH(t) using the value of theOR nodes inL(t), i.e. ∀(s, A) ∈ H(t),

Q(s, t, A) =∑

s′∈L(t)

p(s′|s, A)V (s′, t) (4)

We then update the value of the OR nodes inL(t− 1) usingthe value of the AND nodes inH(t), i.e. ∀s ∈ L(t− 1),

V (s, t− 1) = maxA

Q(s, t, A) (5)

and we record the combinations of courses in the courserecommendation policy

π∗(s, t− 1) = argmaxA

Q(s, t, A) (6)

As mentioned, depending on the choice of the reward functionfor the terminal course states, different system objectives canbe achieved by solving the above problem. The algorithm issummarized in Algorithm 2 and illustrated in Figure 4.

Next, we analyze the property of the value functionV (s, t)and the structure of the optimal policyπ∗. We says ≺ s ifsn = 1 implies sn = 1. That is, all courses that are passed ins are also passed ins.

6

Algorithm 2 Backward Induction

1: Initialization : ∀s 6∈ s, V (s, t) = 0, Q(s, t, A) = 0; ∀s ∈s, V (s, t) = U(s, t).

2: For quartert = T To quarter13: For each state and courses-to-take pairs, A ∈ H(t)4: UpdateQ(s, t, A) =

∑

s′∈L(t)

p(s′|s, A)V (s′, t)

5: End For6: For each course states ∈ L(t− 1)7: Update value function V (s, t − 1) =

maxA Q(s, t, A)8: Update policyπ(s, t−1) = argmaxA Q(s, t, A)9: End For

10: End For

Proposition 2. For any t, s and s, if s ≺ s, thenV (s, t) ≥V (s, t).

Proof. Let A∗ be the optimal courses fors, i.e. V (s(t), t) =Q(s(t), t + 1, A∗). To proveV (s(t), t) ≥ V (s(t), t), we willprove for anyA∗, we can findA ∈ A(t + 1, s(t)) such thatQ(s(t), t + 1, A) ≥ Q(s(t), t + 1, A∗). SinceV (s(t), t) ≥Q(s(t), t+1, A), we will getV (s, t) ≥ V (s, t). The proof usesinduction. It is straightforward to see thatV (s, T ) ≥ V (s, T )since if s is a terminal state,s must also be a terminal state.Suppose the claim holds for allt+ 1, ..., T .

Case 1:A∗ ∈ A(t + 1, s(t)). In this case, we letA = A∗,namely we select the same set of courses for the student totake. Since the courses taken are the same, the probabilityto pass each of these courses is the same. For each possiblenext states′(t+1), there must exist a corresponding next states′(t+1) such thatp(s′(t+1)|s(t), A∗) = p(s′(t+1)|s(t), A∗).Moreover,s′(t+1) ≺ s′(t+1). By induction, we haveV (s′(t+1), t+ 1) ≤ V (s′(t+ 1), t+ 1). Therefore,

Q(s(t), t+ 1, A∗)

=∑

s′(t+1)

p(s′(t+ 1)|s(t+ 1), A∗)V (s′(t+ 1), t+ 1)

=∑

s′(t+1)

p(s′(t+ 1)|s(t), A∗)V (s′(t+ 1), t+ 1)

≤∑

s′(t+1)

p(s′(t+ 1)|s(t), A∗)V (s′(t+ 1), t+ 1)

=Q(s(t), t+ 1, A∗) (7)

Case 2:A∗ 6∈ A(t+1, s(t)). In this case, we letA to be thelargest subset ofA∗ that belongs toA(t+1, s(t)). Moreover,since s ≺ s, the remaining subset ofA∗ must be coursesthat have already been passed ins. Suppose that courses inA∗−A are passed with probability 1 starting froms(t) and theremaining courses are passed with probability1 − ǫn(|A|) >1 − ǫn(|A∗|). Due to induction, such relaxation provides anupper bound onQ(s(t), t + 1, A∗). Moreover, this relaxationis the same as

∑

s′(t+1)

p(s′(t+ 1)|s(t), A∗)V (s′(t+ 1), t+ 1)

=Q(s(t), t+ 1, A) (8)

Thus,Q(s(t), t+ 1, A∗) ≤ Q(s(t), t+ 1, A).

Proposition 2 implies that a course state where a larger setof courses have been passed has a higher value since there ismore flexibility in choosing subsequent course sequences. Aquestion naturally arises that is it always better to take asmanycourses as possible in any quarter? The answer turns out tobe correct only in certain scenarios. The following propositionidentifies one of such scenarios.

Proposition 3. SupposeE = ∅ (i.e. no prerequisites for allcourses) andΓ(t) = N , ∀t (i.e. each course is offered in allquarters), if ǫn(|A|) = ǫn is a constant, then at anyt andgiven states(t − 1), the optimal policy recommends that thestudent should take the maximum number ofC courses.

Proof. Consider any policy that selectsK < C courses inquartert and a policy that selects theseK courses plus onemore coursen. For any possible next course states′, byselecting one more course, we have

p(s′|s,K) = p(s′|s,K + 1) + p(s′|s,K + 1) (9)

where s′ ≺ s′ and s′n = 1. Due to the course failureprobability being independent,p(s′|s, C) = ǫnp(s

′|s,K)and p(s′|s, C) = (1 − ǫn)p(s

′|s,K). SinceV (s′) ≥ V (s)according to Proposition 2, we haveQ(s,K) ≤ Q(s,K + 1).Therefore, takingC courses yields higher value than takingfewer courses.

Proposition 3 states that it is always better to take manycourses as early as possible when there is no constraint oncourse prerequisite and course availability. In practice,coursesequence recommendations are subject to many constraints.We provide a counter-example below to show that the resultof proposition 3 does not hold in the general case.

Counter-Example: Consider a program consisting of twocourses and two quarters. ThusN = 1, 2 and T = 2.Assume that there is no prerequisite course dependency, i.e.E = ∅. The course availability isΓ(1) = 1, 2 andΓ(2) = 2. That is, course 1 is offered only in the firstquarter.C = 2 so students are allowed to take up to 2 courses.Let ǫ(K) denote the probability of failing a course whenKcourses are taken simultaneously.

• Option 1: Take 2 courses in quarter 1. The probability ofgraduation on time can be computed as(1 − ǫ(2))(1 −ǫ(2) + ǫ(2)(1− ǫ(1))) = (1− ǫ(2))(1 − ǫ(2)ǫ(1))

• Option 2: Take course 1 only in quarter 1. The probabilityof graduation on time can be computed as(1− ǫ(1))(1−ǫ(1)).

Thus if (1− ǫ(2))(1− ǫ(2)ǫ(1)) < (1− ǫ(1))(1− ǫ(1)), thentaking only 1 course in the first quarter leads to a higherprobability of graduation. For instance, takingǫ(1) = 0.1and ǫ(2) = 0.2 satisfies this condition. This counter-exampledemonstrates the need for carefully planning the course se-quence according to the course prerequisite and course avail-ability because myopic course selection may lead to lowerlearning reward if these constraints are ignored.

Remark: The complexity of the proposed forward-searchbackward-induction algorithm to determine the optimal pol-icy depends on, the number of courses, the specific course

7

prerequisite and availability constraints. As shown in Propo-sition 1, the set of possible course states grow at differentspeeds depending on these constraints. The time and memorycomplexity generally depend on the size ofL(t) andH(t),

namelyO(T∑

t=0(|L(t)|+ |H(t)|)), which can be large in certain

scenarios. However, since our algorithm in this section isan offline algorithm and only needs to be executed once,complexity is not a big concern.

C. Implications on Curriculum Planning

Teaching resources are limited. An important question forcurriculum planning is how to allocate the limited availableteaching resources to the courses to minimize students’ time-to-graduation. Our framework can be helpful in answering partof this question. The proposition below shows, in a simplifiedscenario, that courses with many dependent courses shouldreceive more teaching resource in order to minimize the time-to-graduation.

Proposition 4. ConsiderN + 1 courses and the prerequisiteDAG satisfiesP (n) = 1, ∀n > 1 (i.e. course 1 is the pre-requisite course of all other courses). Consider the followingtwo cases of course availability constraints

• Case 1: Course 1 is offered in each quarter with prob-ability p < 1 and all other courses is offered in eachquarter with probability 1.

• Case 2: Course 2 is offered in each quarter with prob-ability p < 1 and all other courses is offered in eachquarter with probability 1.

AssumeC = 1 and ifp < 1√N+1

, then the expected graduationtime in case 1 is larger than in case 2.

Proof. Since course 1 is the prerequisite of all the othercourses, it has to be passed before any other course can betaken.

Case 1: The expected time to finish course 1 is1(1−ǫ)p whereǫ is the probability of failing a course. The expected time tofinish the remainingN courses isN

1−ǫ . Thus, the total expectedtime to graduate is

τ1 =1

(1− ǫ)p+

N

1− ǫ(10)

Case 2: The expected time to finish course 1 is1(1−ǫ) .Computing the expected time to finish the remainingNcourses is more complicated since one of the courses hasdifferent course availability than others. We consider an upperbound on this time. In particular, this upper bound is

τ2 <1

1− ǫ+max

N − 1

(1− ǫ)(1− p),

1

(1 − ǫ)p , τ2 (11)

Depending on the values ofp andN , τ can be written as

τ2 =

11−ǫ +

1(1−ǫ)p , if p ≤ 1

N1

1−ǫ +N−1

(1−ǫ)(1−p) , if p > 1N

(12)

If p ≤ 1N , it is easy to see thatτ2 < τ1 and henceτ2 < τ1. If

p > 1N ,

τ1 − τ2 =1

1− ǫ

(

1

p+N − 1−

N − 1

1− p

)

=1

1− ǫ

(1− p)2 − p2N

p(1− p)(13)

Therefore, ifp < 1√N+1

, thenτ1 > τ2 > τ2.

It is intuitive to understand that courses that are prerequisitesfor many other courses are more “important”. Subsequentcourses can be taken only if the prerequisite course is passed.Therefore, much time will be wasted if students cannot takethe prerequisite courses. On the other hand, even if a studentneeds to but cannot take a course that is not a prerequisite forother courses, he/she can still take other courses while waitingfor this course to become available.

D. Joint Optimization of Time-to-Graduation and GPA Per-formance

So far, we focused on constructing course sequence rec-ommendation policies that minimize the time-to-graduation.However, the exact learning performance upon graduation(e.g. GPA) is neglected. Nevertheless, the above dynamicprogramming based framework can be easily extended to thecase of joint optimization of time-to-graduation and GPA per-formance, provided that a sufficiently large dataset is availableto estimate the various model parameters. We elaborate on thispoint below.

The grade that a student can receive in a course often de-pends on the grades that the student received in the prerequisitecourses and perhaps how long ago the prerequisite courseswere taken. Thus, the course sequence that the student is takingmay have a significant impact on the GPA that he/she canobtain. To account for this effect, we modify the problemformulation: instead of keeping a course completion state,which records the courses that have been taken and passed,we keep a course performance state, which records the gradesthat the student has received in the passed courses and whenthese courses were taken. Then for each performance state, thepolicy tries to find the set of next courses to take in order tomaximize an objective function that jointly considers the timeof graduation and the obtained final GPA. However, solvingthis problem requires addressing several key challenges. First,the course performance state space is significantly larger thanthe course completion state space and grows exponentiallywith the number of possible grades. In particular, suppose thenumber of grade levels isK, then the number of all possiblestates isKN . Second, a huge dataset of student records isneeded to estimate the conditional probabilities of gradesofeach course depending on all possible course performancestates. Therefore, solving the optimal course recommendationpolicy is extremely difficult.

To derive efficient solutions without a large initial dataset,we construct course sequence recommendation policies thatjointly consider the GPA and time to graduation in two steps.Firstly, we determine a set of course sequence recommendation

8

policies that satisfy desired time to graduation constraintsusing our method developed in this section. Since there isa lack of dataset to estimate the course failure probabilities,the course sequence recommendation policy can even beconstructed by ignoring the course failure probabilities (i.e.treating ǫn → 0, ∀n). In the second step, we maximize thestudent GPA by considering only the policies derived in thefirst step. In particular, we aim to select for each student apersonalized policy that most suits this student and resultsin the highest GPA. We formalize the personalized policyselection problem in the next section.

Remark: Readers may wonder why there are multiplesolutions of course sequence recommendation policies fromthe first step so that personalization is possible in the secondstep. A couple of reasons can lead to multiple solutions. First,multiple solutions may occur due to ties, which are more likelyto happen when the randomness disappears asǫn → 0. Forinstance, consider 4 courses1, 2, 3, 4 and course 1 is theprerequisite of the other three courses. AssumeC = 2 andall courses are offered in all quarters, then course sequences1 → 2, 3 → 4, 1 → 2, 4 → 3 and 1 → 3, 4 → 2 alllead to the same shortest time-to-graduation. Second, insteadof keeping only the course sequence recommendation policythat yields the shortest time-to-graduation, the first stepofour approach can also generate a set of course sequencerecommendation policies that result in on-time graduation.

IV. ONLINE RECOMMENDATION POLICY

PERSONALIZATION

A. Problem Formulation

We consider an online setting where students enter the pro-gram in sequence. The students are indexed by1, 2, ..., i, ....Students come with different background (e.g. schools fromwhich the students graduated, SAT scores). We use a contextvector θi ∈ Θ to denote the student background whereΘ = [0, 1]W is the normalized context space with dimensionW . We have a set ofZ course sequence recommendationpolicies constructed, denoted byZ, using our method proposedin Section III. These recommendation policies ensure thatstudents will graduate early with high probability. However,the impact of these recommendation policies on the students’GPA performance is unknown a priori and may be differentfor students with different backgrounds.

For each studenti, the system selects one of theZ policiesto recommended course sequence to this student. When thestudent completes the program by following the recommendedcourse sequence, the GPA that he/she obtains is revealedas ri. Let µz(θ) = Er|θ be the expected GPA for thestudent with backgroundθ if a recommendation policyz isadopted. Ifµz(θ) were known for each policyz, then thepolicy selection problem would have been simple - selectingz∗(θ) = argmaxz µz(θ) maximizes the expected GPA for thisstudent. However, since the effectiveness of the recommenda-tion policies is unknown a priori, the best policies must belearned for each student.

Let σ be an online learning algorithm for policy selectionandσ(i) ∈ Z denote the policy that is used on studenti. We

use learning regret as the performance metric for a learningalgorithm. The learning regret up to studentI is defined asthe aggregate GPA difference between our learning algorithmand the oracle solution that selects the best policyz∗(θi), ∀i,i.e.

Reg(I) := E[

I∑

i=1

µz∗(θi)(θi)−I

∑

i=1

ri(σ(i))] (14)

where the expectation is taken with respect to the randomnessin grade realization and the selected policies. The regretcharacterizes the loss incurred due to the unknown systemdynamics and gives convergence rate of the total expected GPAof the learning algorithm to the value of the oracle solution.The regret is non-decreasing in the total number of incomingstudents, but we want it to increase as slow as possible. Anyalgorithm whose regret is sublinear inI, i.e. Reg(I) = O(Iα)such thatα < 1, will converge to the optimal solution interms of the average reward, i.e.lim

I→∞Reg(I)

I = 0. The regret

of learning also gives a measure for the rate of learning. Asmallerα will result in a faster convergence to the optimalaverage reward and thus, learning the optimal course sequencerecommendation is faster ifα is smaller.

B. Context-Aware Adaptive Policy Selection

A natural way to learn a course sequence recommendationpolicy’s effectiveness is to record and update the sample meanGPA obtained as students arrive and complete the programby adopting this policy. Using such a sample mean-basedapproach for policy selection is the basic idea of our learningalgorithm. However, major challenges still remain. Withoutusing the context information, we have only learned theaverage performance of each recommendation policy and thus,a single policy will always be selected. On the other hand,personalizing the policy for each student according to his/herbackground can be very difficult since the students can havediverse background and hence the context spaceΘ can behuge. The sample mean reward approach can fail to work sincethere will be very limited number of students who have thesame background. Our method to overcome this difficulty is byexploiting the similarity of students based on the assumptionthat students with similar background will achieve similarexpectedGPA by following the same course sequence recom-mendation policy. Our learning algorithm starts with a largercontext space to learn the best recommendation policy for thisspace and then gradually refines the learning by partitioningthe context space into smaller spaces.

Before we describe the details of our algorithm, we intro-duce several useful concepts.

• Student Cluster. A student cluster is represented bythe range of context information that is associated withstudents in the cluster. In this paper, we consider studentclusters that are created by uniformly partitioning thecontext space on each dimension, which are enoughto guarantee sublinear learning regrets. Thus, eachstudent cluster is aW -dimensional hypercube withside length being2−l for some l. This hypercuberepresents a level-l student cluster. At any moment

9

Fig. 5. Illustration of 2-D Student Clusters.

in time when a recommendation policy is applied toa student i, the algorithm keeps a set of mutuallyexclusive student clusters that cover the entire studentpopulation. We call these student clusters the activestudent clusters, and denote this set byΩ. Sincethe active student clusters evolve (i.e. become morerefined) as more students are enrolled and graduate, theactive setΩi uses a superscripti, which is the studentindex, to represent its dynamic nature. For instance,in the one-dimensional case,[0, 1/2), [1/2, 1]is a feasible set of active student clusters and[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 7/8), [7/8, 1]is another feasible set of active student clusters. Figure5 illustrates a 2-dimensional student clustering.

• Counters. For each active student clusterC, the algorithmmaintainsZ counters: for each recommendation policyz ∈ Z, MC(z) records the number of students so far inwhich z is applied to.

• GPA Estimates. For each active student cluster, thealgorithm also maintains the sample mean GPA estimatesrC(z) for each policyz ∈ Z using the realized GPA ofstudents that belong toC so far.

The algorithm (see Algorithm 3) works as follows. For eachstudenti, the algorithm works in two steps.

• Policy Selection Step. When an student arrives, thealgorithm first checks which active student clusterC ∈ Ωi

it belongs to. Then it investigates counterMC(z) for allz ∈ Z to see if there exists any under-explored policysuch thatMC(z) ≤ γ(i, l) whereγ(i, l) is a deterministiccontrol function depending on the index of the studentiand the level of the student clusterl. If there exists suchan under-explored policyz, then the algorithm uses thispolicy to recommend course sequence for this studenti.If there does not exist any under-explored policy, thenthe algorithm selects the policy with the highest GPAestimate for the clusterC, i.e. argmaxz rC(z).

• Variable Update Step. After the student completes theprogram and the GPA is realized, the GPA estimate ofthe selected policy is updated. Moreover, if the number ofstudents in the student clusterC satisfies

∑

zMC(z) ≥ ζ(l)

whereζ(l) is a deterministic control function dependingon the level of student cluster, the current student clusterC is partitioned into2W level-(l + 1) smaller studentclusters. For the next student on,C is deactivated and

the new level-(l+ 1) student clusters are activated.

Remark: In the policy selection step, the algorithm mayselect an under-explored policy for a student. The purpose ofthis exploration is to learn the effectiveness of every policywith high confidence. However, this exploration does raisefairness issues for some students. There are a couple of solu-tions that can be used to address this “unfair” policy selectionissue. First, the policy selection is merely a recommendation,students can still freely choose whichever course sequencethey want to follow. Second, rewards mechanisms can bedesigned to incentivize students to follow the recommendedpolicy. For example, students who follow an under-exploredpolicy can enjoy a lower tuition or receive some form ofcompensation through some special fellowship.

Algorithm 3 Policy Selection and Adaptive Clustering

1: Initialize Ω = Θ, rΘ(z) = 0,MΘ(π) = 0, ∀z ∈ Z.2: for each studenti do3: Determine active clusterC ∈ Ωi such thatθi ∈ C4: Case 1: ∃z ∈ Z such thatMC(z) ≤ γ(i, l)5: Randomly select among such policiesσi = z6: Case 2: ∀z ∈ Z, MC(z) > γ(i, l)7: Selectσt = argmin

z∈ZrC(z).

8: SetMC(σi)←MC(σ

i) + 19: (The student GPAri is realized.)

10: UpdaterC(σt)11: if

∑

z MC(z) ≥ ζ(l) then12: Uniformly partitionC into 2W level-(l+1) student

clusters.13: Update the set of active clustersΩi.14: Update the counters and GPA estimates for all new

student clusters15: endif16: endfor

C. Control Function Determination

In this subsection, we determine the control functionγ(i, l)andζ(l) and evaluate the performance of the policy selectionalgorithm. The following assumption on student similarityisneeded for the regret analysis but not needed in the algorithm.

Assumption 1. (Student Similarity). For each policyz ∈ Z,there existsα > 0 such that for allθ, θ′ ∈ Θ, we have|µθ(z)−µθ′(z)| ≤ ‖θ, θ′‖α.

The above assumption states that if the student background(i.e. context) is similar, then the expected GPA by using thesame course sequence recommendation policy is also similar.

Proposition 5. By settingγ(i, l) = 22αl ln i and ζ(l) = A2pl,the learning regret for students up toI is

Reg(I) ≤lmax(I)∑

l=1

[Sl(I)Z22αl ln I

+Yl(I)(Z

∞∑

i=1

i−2 +A(2Wα/2 + 2)2(p−α)l)] (15)

10

whereYl(I) is the number of level-l student clusters that areactivated at studentI, Sl(I) is the number of level-l activestudent clusters at studentI andlmax(I) is the maximum levelof student clusters at studentI.

Proof. We break down the learning regret into three partsReg(I) = Reg1(I) + Reg2(I) + Reg3(I) that are respec-tive regrets due to exploration, selection of suboptimal poli-cies in exploitation and selection of near-optimal policiesin exploitation. For each of these three parts we can pro-vide a bound. In particular, Reg1(I) can be bounded bylmax(I)∑

l=1

[Sl(I)Z22αl ln I] since the number of exploration steps

increases sublinearly inI. Reg2(I) can be bounded by

Yl(I)Z∞∑

i=1

i−2 since the probability of choosing a suboptimal

policy in exploitation steps decreases sufficiently rapidly usinga Chernoff-Hoeffding bound. Reg3(I) can be bounded byYl(I)A(2W

α/2+2)2(p−α)l since the marginal cost of selectinga near-optimal policy decreases sufficiently rapidly.

Corollary 1. If the student context arrivals by studentI isuniformly distributed, we have

Reg(I) < I2α+W

3α+W [Z(ln I +

∞∑

i=1

i−2) + (Wα/2 + 2)22α+W ]

(16)

As we can see, the regret bound is sublinear in the numberof studentsI and hence, ifI is sufficiently large, then theaverage regret will be close to 0, which means that the optimalaverage GPA is achieved.

V. EXPERIMENTS

In this section, we apply the forward-search backward-induction method presented in Section III, and the regretminimization learning algorithm developed in Section IV toour dataset.

A. Dataset

Our experiments are based on a dataset from the undergrad-uate curriculum of the Mechanical and Aerospace Engineering(MAE) department at UCLA. The dataset contains the coursesequences and the course grades of 1444 anonymized studentswho graduated between the academic years 2013 and 2015.The course availability varies across years; typically, a courseis offered either once or twice every academic year but somecourses are offered once every two years. UCLA adopts thequarter system and in each academic year and courses aremostly offered in Fall, Winter, Spring quarters but not Summerquarters. Therefore, we will consider that one academic yearconsists of three quarters (hence, four academic years equal12 quarters).

The dataset also includes context information of the stu-dents, including their SAT scores and their high school GPAs.We observe that many students in the dataset take coursesoutside of the curriculum (such as the art courses). Our modelcan be extended to capture this possibility with added modelcomplexity but our experiment in this paper does not consider

Fig. 6. Graduation Time Distribution and the Impact of Math SAT score.

the recommendation of such courses. Some of the studentsin the dataset are transfer students, and they do not need totake the same number of courses as the regular students, sincethey may have fulfilled several requirements before coming toUCLA. Since the course information before the transferringis not included in this dataset, we exclude such transferstudents from our analysis. Figure 6 shows the graduationtime distribution of the students in the dataset. As we cansee, even though the majority of students graduate on timewithin the desired four years (12 quarters), many studentsdo not graduate on time and stay in the college for one oreven two more years. There is also a noticeable difference inthe graduation time distributions for students with differentcontext information: students with higher math SAT scoreshave a higher probability to graduate on time. This suggeststhat personalization based on the students’ context informationhas indeed the potential to provide better learning experienceand lead to better learning outcomes.

B. Impact of constraints

In this subsection we illustrate how course prerequisite andavailability reduce the number of possible course sequencerecommendations for students in MAE department at UCLA.Figure 2 in Sec. III depicts the prerequisite DAG of the 19courses used for analysis. Most of these courses are math,physics, and core MAE major courses. Generally speaking,math courses are prerequisites of physics courses which arefurther prerequisites of MAE major courses. We consider twocases of course availability constraints. In the first case,mostcourses are offered twice every academic year and in thesecond case, most courses are offered once every academicyear. This allows us to investigate the impact of courseavailability on course sequence recommendation. We focuson how to recommend course sequences to complete these19 courses as soon as possible.

First, we investigate the possible course completion statesthat can emerge. Although there are totally219 = 524288possible states, the course prerequisites and availability sig-nificantly limit the number of possible states that can emerge.Figure 7 illustrates the number of possible states (i.e. thesizeof the state setL(t) defined in Section III-A). Interestingly, thecourse prerequisite constraint limitslim

t→∞L(t) as the quarters

go by. In our experiment, there are totally 4880 possiblecompletion states depending on whether the student has taken,passed or failed the course, which are significantly fewerthan the possible states without any prerequisite constraints.

11

1 2 3 4 5 6 7 8 9 100

0.001

0.002

0.003

0.004

0.005

0.006

0.007

0.008

0.009

0.01

# of

em

erge

d st

ates

/ #

of a

ll po

ssib

le s

tate

s

Quarter

Case 1 Course AvailabilityCase 2 Course Availability

Fig. 7. Number of Course Completion States.

Moreover, the maximal number of possible states are reachedwithin a finite number of quarters. On the other hand, thecourse availability constraint affects how fast the setL(t)expands and reaches its maximal size. When courses areoffered less frequently,L(t) expands more slowly but willeventually reach the maximal size.

C. Course sequences

In this subsection, we apply the forward-search backward-induction algorithm presented in Sec. III to compute candi-date course sequence recommendation policies to students tocomplete the 19 courses presented in Sec. V-C. Although thecourse difficulty varies across courses, for illustrative purposes,we set the course failure ratio to be the sameǫ = 0.1 forall courses. Indeed, common practice tells us that teachersoften curve the students grades so that the passing ratios ofcourses do not vary significantly across courses. Table I andII show the best sequences that can be obtained for the twocourse availability cases when the student does not fail anycourse. It is worth noting that while some courses are takenafter some others in the first case, the order can be reversed inthe second case due to different course availability constraints.For instance, course CS 31 is taken in the first quarter beforecourses MATH 31B and MSE 104 in the first cases but it istaken in the third quarter after courses MATH 31B and MSE104 in the second case. The generated course sequence is alsouseful for the students to decide when is good time to takeextracurricular courses (e.g. the art courses). For instance, astudent should focus on the curriculum courses in quarter 1,3 and 5 since three courses need to be taken in each of thesequarters while quarters 4 and 8 are good time for the studentto take extracurricular courses that fall in the student’s interest.

Since students may fail the course, the best course sequencecan no longer be followed by a student. Once this happens,it becomes important to determine the subsequent coursesto recommend in any possible course completion state. Thecourse sequence recommendation policy generated by ouralgorithm can provide us with these answers for any course

TABLE IBEST RECOMMENDEDCOURSESEQUENCE(CASE 1 COURSE

AVAILABILITY )

Quarter Recommended Courses1 MATH 31A, CS 31, MAE 94, CHE 202 MATH 31B, MSE 1043 PHY 1A, MATH 32A, MATH 33B4 PHY 1B, PHY 4AL, MATH 32B5 PHY 1C, PHY 4BL, MATH 33A, MAE 1016 MAE 105A, MAE 102, MAE 103

TABLE IIBEST RECOMMENDEDCOURSESEQUENCE(CASE 2 COURSE

AVAILABILITY )

Quarter Recommended Courses1 MATH 31A, MAE 94, CHE 202 MATH 31B, MSE 1043 CS 31, MATH 32A, PHY 1B4 MATH 32B5 PHY 1A, MATH 33A, MAE 105A6 PHY 1B, PHY 4AL7 PHY 1C, PHY 4BL8 MAE 1019 MAE 102, MAE 103

failure ratio. Table III shows the results by running ouralgorithm aimed at maximizing the on-time graduation prob-ability or minimizing the graduation time for the two courseavailability cases. When courses are offered more frequently,the expected graduation time is significantly reduced and theon-time graduation probability is significantly improved.Theproposed framework can also be used to evaluate differentallocations of teaching resources by calculating their expectedon-time graduation probability and expected graduation time.

D. Personalized policy selection

As mentioned in Sec. IV, depending on the context informa-tion of the students, different course sequence recommendationpolicies may result in different learning experience. The firststep of our framework constructs course sequence recommen-dation policies to minimize time-to-graduation. The secondstep then personalizes the recommendation according to thestudents’ context to achieve a high GPA.

Due to the limited number of student records that wehave, we focus on the subsequence recommendation for asubset of 3 MAE major courses (i.e. MAE 101, MAE 103,MAE 105A). From our dataset, we observe that there aresix sequences that student use to take these 3 courses. Wewill use these six typical sequences to validate the proposedpersonalized policy selection algorithm. Note that we are notusing the policies generated in the first step to make coursesequence recommendations because the results on the students

TABLE IIICOMPLETION T IME RESULTS

Probability ofCompleting 19 Courses

in 10 Quarters

ExpectedCompletion

Time

BestSequence

TimeCase 1 39.70% 9.3 quarters 9 quartersCase 2 98.10% 6.4 quarters 6 quarters

12

TABLE IVGPA STATISTICS FOR THE SIX COURSE SEQUENCES

Sequence 1 2 3 4 5 6Number of students 89 75 72 51 31 18

SAT≤700mean 3.36 3.02 3.08 3.05 3.31 3.09count 28 20 28 13 21 4

700<SAT≤760mean 3.28 3.37 3.29 3.17 3.26 3.39count 38 22 22 16 5 5

760<SAT≤780 mean 3.61 3.13 3.22 3.25 NA 3.50count 14 19 14 10 0 8

SAT>780 mean 3.39 3.45 3.16 3.33 3.04 3.9count 9 14 8 12 5 1

in the dataset cannot be known if they are not aligned withthe actual sequences that the students take. Thus, instead ofevaluating the joint efficacy of the policy construction andpolicy selection, we only evaluate the efficacy of the policyselection algorithm in this subsection.

Table IV shows the statistics regarding these 6 subse-quences. As we can see, depending on the context informa-tion (i.e. math SAT score) of the students, different coursesequences yield different GPA performance. The average GPAof these students is 3.26. The evaluation of the proposedpersonalized policy selection will use the statistics given inTable IV. We compare the proposed algorithm with severalbenchmarks:

• Oracle: the oracle algorithm knows the GPA statisticsa priori and hence always recommends the best coursesequence to each student.

• Learning without Personalization: this algorithm doesnot know the GPA statistics. However, when recommend-ing course sequences, it ignores the context informationof the students.

• Random: this algorithm simply recommends course se-quence to students randomly.

Figure 8 shows the average GPA that the students can obtainas the proposed algorithm recommends course sequences tomore students. Initially, the achievable GPA is low sincethe algorithm does not have sufficient training samples. Asmore training samples are provided, the performance of thealgorithm improves, causing an increase in the simulated GPAfor the selected course sequence. Moreover, the algorithmadaptively clusters the students according to their contextinformation and significantly outperforms sequence recom-mendation that ignores the personalized context information.It is noteworthy that the Random scheme achieves a similarGPA as the actual average GPA in the dataset (i.e. 3.26). Thissuggests that the current practice of course sequence selectiondoes not recognize the difference in individual students andhence, there is much room to improve the students’ learningoutcomes by personalizing the course sequences recommendedto students.

VI. CONCLUSION

In this paper, we studied the problem of personalized coursesequence recommendation. The problem is solved in two steps.In the first step, we determine candidate course sequence rec-ommendation policies that result in short time-to-graduation

0 500 1000 1500 20003

3.1

3.2

3.3

3.4

3.5

3.6

Number of Students

Ave

rage

GP

A

Personalized (Proposed)OracleRandomLearning without Personalization

Fig. 8. GPA performance v.s. the number of students

using a Forward-Search Backward-Induction algorithm. Inthe second step, we develop an online regret minimizationlearning algorithm to select personalized course sequencerecommendation policies among the candidate policies forstudents aimed at maximizing students’ GPA performance.Our analysis and simulation results show that the proposedpersonalized course sequence recommendation method is ableto shorten the students’ graduation time and improve students’GPAs. Our framework also has important implications onhow the curriculum planner should design the curriculum andallocate teaching resources.

ACKNOWLEDGMENT

The authors are indebted to Mr. John S. Toledo, ResearchAssociate, Evaluation and Educational Assessment, UCLAOffice of Instructional Development for providing us the databased on which our methods were evaluated.

REFERENCES

[1] “Four-year myth,” 2014. [Online]. Available:http://completecollege.org/wp-content/uploads/2014/11/4-Year-Myth.pdf

[2] A. W. Astin and L. Oseguera, “Degree attainment rates at americancolleges and universities: Revised edition,”Higher Education ResearchInstitution, University of California Los Angeles, 2005.

[3] R. Baraniuk, “Open education: New opportunities for signal processing,”in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEEInternational Conference on.

[4] L. Cen, D. Ruta, and J. Ng, “Big education: Opportunitiesfor big dataanalytics,” inDigital Signal Processing (DSP), 2015 IEEE InternationalConference on, July 2015, pp. 502–506.

[5] Y. Meier, J. Xu, O. Atan, and M. van der Schaar, “Predicting grades,”Signal Processing, IEEE Transactions on, vol. PP, no. 99, pp. 1–1, 2015.

[6] “Kdd cup 2015: Predicting dropouts in mooc.” [Online]. Available:https://kddcup2015.com

[7] C. Tekin, J. Braun, and M. van der Schaar, “etutor: Onlinelearningfor personalized education,” inAcoustics, Speech and Signal Processing(ICASSP), 2015 IEEE International Conference on, April 2015, pp.5545–5549.

[8] A. S. Lan, A. E. Waters, C. Studer, and R. G. Baraniuk, “Sparse factoranalysis for learning and content analytics,”The Journal of MachineLearning Research, vol. 15, no. 1, pp. 1959–2008, 2014.

[9] A. Asif, “Multimedia and cooperative learning in signalprocessing tech-niques in communications,”Signal Processing Letters, IEEE, vol. 11,no. 2, pp. 278–281, 2004.

[10] A. Klasnja-Milicevic, B. Vesin, M. Ivanovic, and Z. Budimac, “E-learning personalization based on hybrid recommendation strategy andlearning style identification,”Computers & Education, vol. 56, no. 3,pp. 885–899, 2011.

http://completecollege.org/wp-content/uploads/2014/11/4-Year-Myth.pdf

https://kddcup2015.com

13

[11] C.-M. Chen, H.-M. Lee, and Y.-H. Chen, “Personalized e-learningsystem using item response theory,”Computers & Education, vol. 44,no. 3, pp. 237–255, 2005.

[12] D. Wen-Shung Tai, H.-J. Wu, and P.-H. Li, “Effective e-learningrecommendation system based on self-organizing maps and associationmining,” The Electronic Library, vol. 26, no. 3, pp. 329–344, 2008.

[13] R. Farzan and P. Brusilovsky, “Social navigation support in a courserecommendation system,” inAdaptive hypermedia and adaptive web-based systems. Springer, 2006, pp. 91–100.

[14] N. Bendakir and E. Aimeur, “Using association rules forcourse recom-mendation,” inProceedings of the AAAI Workshop on Educational DataMining, vol. 3, 2006.

[15] K. I. Ghauth and N. A. Abdullah, “Learning materials recommendationusing good learners ratings and content-based filtering,”EducationalTechnology Research and Development, vol. 58, no. 6, pp. 711–727,2010.

[16] S. Shishehchi, S. Y. Banihashem, N. A. M. Zin, and S. A. M.Noah,“Review of personalized recommendation techniques for learners in e-learning systems,” inSemantic Technology and Information Retrieval(STAIR), 2011 International Conference on. IEEE, 2011, pp. 277–281.

[17] P. Resnick and H. R. Varian, “Recommender systems,”Communicationsof the ACM, vol. 40, no. 3, pp. 56–58, 1997.

[18] M. Balabanovic and Y. Shoham, “Fab: content-based, collaborativerecommendation,”Communications of the ACM, vol. 40, no. 3, pp. 66–72, 1997.

[19] F. Ricci, L. Rokach, and B. Shapira,Introduction to recommendersystems handbook. Springer, 2011.

[20] A. G. Parameswaran, H. Garcia-Molina, and J. D. Ullman,“Evaluating,combining and generalizing recommendations with prerequisites,” inProceedings of the 19th ACM international conference on Informationand knowledge management. ACM, 2010, pp. 919–928.

[21] A. Parameswaran, P. Venetis, and H. Garcia-Molina, “Recommendationsystems with complex constraints: A course recommendationperspec-tive,” ACM Transactions on Information Systems (TOIS), vol. 29, no. 4,p. 20, 2011.

[22] G. Shani, R. I. Brafman, and D. Heckerman, “An mdp-basedrec-ommender system,” inProceedings of the Eighteenth conference onUncertainty in artificial intelligence. Morgan Kaufmann PublishersInc., 2002, pp. 453–460.

[23] A. Slivkins, “Contextual bandits with similarity information,” The Jour-nal of Machine Learning Research, vol. 15, no. 1, pp. 2533–2568, 2014.

[24] M. Dudik, D. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin,and T. Zhang, “Efficient optimal learning for contextual bandits,” arXivpreprint arXiv:1106.2369, 2011.

[25] J. Langford and T. Zhang, “The epoch-greedy algorithm for multi-armed bandits with side information,” inAdvances in neural informationprocessing systems, 2008, pp. 817–824.

[26] W. Chu, L. Li, L. Reyzin, and R. E. Schapire, “Contextualbanditswith linear payoff functions,” inInternational Conference on ArtificialIntelligence and Statistics, 2011, pp. 208–214.

Personalized Course Sequence RecommendationsFig. 1. System Diagram for Course Sequence...

Documents

Transcript of Personalized Course Sequence RecommendationsFig. 1. System Diagram for Course Sequence...