Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March...

37
Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012

Transcript of Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March...

Page 1: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Advanced Methods and Analysis for the Learning and Social Sciences

PSY505Spring term, 2012March 26, 2012

Page 2: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Today’s Class

• Sequential Pattern Mining

Page 3: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Related to

• Association Rule Mining• MOTIF Extraction

Page 4: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Similarities

• MOTIF Extraction can be seen as a type of sequential pattern mining– Though MOTIFs can also be non-sequential, like in

the Shananbrook et al paper

• Some SPM algorithms find simpler patterns than MOTIF, other algorithms find more complex patterns than MOTIF

Page 5: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Similarities

• Some algorithms for Sequential Pattern Mining similar to Association Rule Mining

Page 6: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Association Rule Mining

• Try to automatically find if-then rules within the data set

Page 7: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Sequential Pattern Mining

• Try to automatically find temporal patterns within the data set

Page 8: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

ARM Example

• If person X buys diapers,• Person X buys beer

• Purchases occur at the same time

Page 9: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

SPM Example

• If person X buys novel Foundation now,• Person X buys novel Second Foundation in a

later transaction

• Conclusion: recommend Second Foundation to people who have previously purchased Foundation

Page 10: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

SPM Example

• Many customers rent Star Wars, then the Empire Strikes Back, then Return of the Jedi

• Doesn’t matter if they rent other stuff in-between

Page 11: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

SPM Example

• Many customers buy flowers, and then buy diapers AND diaper cream several months later

Page 12: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

SPM Example

• Many learners become confused, then game the system, then become frustrated, then complete gaming the system, then become re-engaged

Page 13: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Different Constraints than ARM

• If-then elements do not need to occur in the same data point

• Instead– If-then elements should have same user (or other

organizing variable)– If elements can be within a certain time window

of each other– Then element time should be within a certain

window after if times

Page 14: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Sequential Pattern Mining

• Find all subsequences in data with high support

• Support calculated as number of sequences that contain subsequence, divided by total number of sequences

Page 15: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Sequential Pattern Mining

• What are some subsequences with high support? (What is their support?)

• Chuck: a, abc, ac, de, cef • Darlene: af, ab, acd, dabc, ef• Egoberto: aef, ab, aceh, d, ae• Francine: a, bc, acf, d, abeg

Page 16: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Questions? Comments?

Page 17: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Algorithms for SPM

Page 18: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

GSP (Generalized Sequential Pattern)

• Classic Algorithm• (Srikant & Agrawal, 1996)

Page 19: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Data pre-processing

• Data transformed from individual actions to sequences by user

• E.g.• Bob: {GAMING and BORED, OFF-TASK and

BORED, ON-TASK and BORED, GAMING and BORED, GAMING and FRUSTRATED, ON-TASK and BORED}

Page 20: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Data pre-processing

• In some cases, time also included

• E.g.• Bob: {GAMING and BORED 5:05:20, OFF-TASK

and BORED 5:05:40, ON-TASK and BORED 5:06:00, GAMING and BORED 5:06:20, GAMING and FRUSTRATED 5:06:40, ON-TASK and BORED 5:07:00}

Page 21: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Algorithm

• Take the whole set of sequences of length 1– May include “ANDed” combinations at same time

• Find which sequences of length 1 have support over pre-chosen threshold

• Compose potential sequences out of pairs of sequences of length 1 with acceptable support

• Find which sequences of length 2 have support over pre-chosen threshold

• Compose potential sequences out of triplets of sequences of length 1 and 2 with acceptable support

• Continue until no new sequences found

Page 22: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Let’s execute GPS algorithm

• With min support = 50%

Page 23: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Let’s execute GPS algorithm

• With min support = 50%

• Chuck: a, abc, ac, de, cef • Darlene: af, ab, acd, dabc, ef• Egoberto: aef, ab, aceh, d, ae• Francine: a, bc, acf, d, abeg

Page 24: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Other algorithms

• Free-Span• Prefix-Span

• Select sub-sets of data to search within

• Faster, but same basic idea as in GPS

Page 25: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Uses in educational domains

Page 26: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Perera et al. (2009)

• What were the three ways that Perera et al. (2009) used sequential pattern mining?

• What did they learn, and how did they use the information?

Page 27: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Perera et al. (2009)

1. Overall uses of collaborative tools by groups2. Sequences of collaborative tool use by different

group members3. Sequences of access of specific resources by different

group members

• In all cases, they found common patterns and then looked at how support differed for successful and unsuccessful groups

Page 28: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Perera et al. (2009):Important Findings

1. Overall uses of collaborative tools by groups– Successful groups used ticketing system more

than the wiki; weaker groups used wiki more– Patterns were particularly strong for group

leaders

Page 29: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Perera et al. (2009):Important Findings

2. Sequences of collaborative tool use by different group members– Successful groups characterized by leader

opening ticket and other student working on ticket

– Successful groups characterized by students other than leader opening ticket, and other students working on ticket

Page 30: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Perera et al. (2009):Important Findings

3. Sequences of access of specific resources by different group members– The best groups had interactions around the

same resource by multiple students – The poor groups did no work on tickets before

closing them

Page 31: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Zhang et al. (2005)Romero et al. (2008)

• Analyze students’ paths through learning resources in order to find and suggest resources for students

Page 32: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Robinet et al. (2007)

• Mine sequences of student actions in a system where students are allowed to skip steps

• In order to infer intermediate/implicit steps during algebraic manipulation

• In other words, if some students have A->B->C• Infer that A->C has B in the middle

• Aids with choosing remedial feedback

Page 33: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

What else?

• What else could sequential pattern mining be used for in education?

Page 34: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Asgn. 8

• Solutions

• Let’s look at solutions from– Sweet– Mike W.

Page 35: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Asgn. 9

• Questions?• Comments?

Page 36: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

Next Class

• Wednesday, March 28• 3pm-5pm• AK232

• Learning Curves

• Readings• Martin, B., Mitrovic, A., Koedinger, K.R., Mathan, S. (2011)

Evaluating and improving adaptive educational systems with learning curves. User Modeling and User-Adapted Interaction, 21 (3), 249-283.

• Assignments Due: None

Page 37: Advanced Methods and Analysis for the Learning and Social Sciences PSY505 Spring term, 2012 March 26, 2012.

The End