MyPlan - similarity metrics for matching lifelong learner timelines

02 December 2008 MyPlan - Similarity Metrics for Matching Lifelong Learner Timelines Nicolas Van Labeke


Page 1: MyPlan - similarity metrics for matching lifelong learner timelines

02 December 2008

MyPlan - Similarity Metrics for Matching Lifelong Learner Timelines

Nicolas Van Labeke

Page 2

Using Similarity Metrics for Matching Lifelong Learners 2

The Context

• Lifelong Learners?
  – Learning opportunities
  – All ages, all contexts

• Role of Technology?
  – Ubiquitous access to resources and facilities
  – Learner-centred models of organising and delivering educational resources

• Better support for planning?

Page 3

The MyPlan project

• funded by the JISC e-Learning Capital programme, 1/9/2006 – 30/11/2008 (RA: 1/4/2007 – 30/7/2008)

• developing, deploying and evaluating new techniques and tools that allow personalised planning of lifelong learning

• building on and extending the earlier L4All project and software prototype, funded by the JISC Distributed e-Learning Pilots programme 1/2/2005 – 31/10/2006

Page 4

Partners (MyPlan)

• Birkbeck College – 80% of students are part-time

• Institute of Education

• Community College Hackney
  – A Level, GCSE, adult learning courses, teacher training and vocational qualifications

• UCAS
  – UK central organisation through which applications for entry to HE are processed, providing information and services to prospective students and HE professionals

• Linking London Lifelong Learning Network (L4N)
  – Supports lifelong learners in the London region, providing them with access to information and resources that facilitate their progression from Secondary Education, through Further Education (FE) and on into Higher Education

Page 5

L4ALL – Approach

• Taking a holistic view of lifelong learners’ work and learning experience

• Based on the notion of learning pathways

• Sharing learning pathways with others:
  – identifying learning opportunities that may not otherwise have been considered
  – positioning successful learners “like me” as role models

Page 6

L4ALL – Methodology

• User requirements elicitation, via interviews with HE and FE students, focus groups (educators, recruitment & careers specialists), workshop events and consultation with advisors:
  – use cases
  – examples of learning pathways
  – identification of critical decision points

• Technical requirements elicitation
  – development of tools and standards
  – use of existing e-services where possible

• User-centred design
  – iterative & incremental prototyping
  – usability

Page 7

L4ALL – Supporting Engagement & Participation

1. Lifelong learners require support not only at the level of the individual user but also at the level of a group or team, and of the learning community as a whole

2. There are critical decision points or periods where lifelong learners need increased support

3. A partnership between the different stakeholders (e.g. lifelong learners themselves but also learning providers, career advisors, adult learning organisations) is an important element in offering a holistic approach to personal development.

Page 8

L4ALL – Personalising the pathway through lifelong learning

• Breaking the “one size fits all” mould

• Recognition of diversity

• Different interactions at different stages of the journey:
  – Motivation: Why should I learn? Personalised needs-benefits analysis; access to advice, guidance, learners’ case studies
  – Curriculum: What can I learn? Curriculum choice through HE partnerships; closer links to work and community
  – Logistics: How could I study? Flexible modes, locations etc.; mix of home, campus, overseas
  – Pedagogy: How will I learn? Adaptive, interactive learning; communication, collaboration
  – Assessment: How do I know I've learned? Assessment when ready; progress files, e-portfolios
  – Opportunity: Where will it take me? Access to information & guidance; qualifications and career options planner

Page 9

L4ALL – Lifelong Learning for All: The System

• Timeline: record of a user’s learning trail
  – Educational, professional and personal

• A web-based portal for lifelong learners
  – Access information about courses
  – Manage personal development plan
  – Annotate, reflect & share

• Pilot system
  – Incremental design
  – Simple service-oriented architecture
  – Ontology-based learner model (RDF - JENA)

• Skeleton of a social network platform?

Page 10

L4ALL – System Architecture

• Web Server (Apache Tomcat)
• L4All Portal (JSP): Graphical User Interface
• Web Services (Servlets): User Management, Timeline Search, Course Search
• Java Beans
• JENA Semantic Web Framework
• Databases (MySQL): User, Course
• LearnDirect API: Course Search
• GoogleMap API: Location Search

Page 11

Page 12

MyPlan - Introducing Personalised Functionalities

• To develop and evaluate user models that reflect the needs of the diverse population of lifelong learners
  – Lifelong learner ontology, interoperability (H. Baajour)

• To allow learners to role-play different learning and career progressions, by integrating game-based applications into the system
  – Second Life sessions (S. De Freitas)

• To enhance individual learners’ engagement with the lifelong learning process by developing, deploying and evaluating personalised functionalities for searching and recommendation of learning opportunities
  – Personalised search of timelines
  – Recommendations

Redesigning the GUI

Page 13

SIMILE JavaScript Timeline – http://simile.mit.edu/timeline/

Page 14

Searching the L4ALL User Model

• A three-part model
  – User Profile: identification, personal information, …
  – Learning Profile: learning goals, skills, qualifications, …
  – Timeline, as a set of episodes: description, title, classification, start date, duration, …

• Search by keywords

• Personalised search for “people like me”
  – Reflect the structure and semantics of timelines
  – Detect “similarities” between learners’ pathways
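To make the three-part model and the plain keyword search concrete, here is a minimal Python sketch. The class names, field names and the `keyword_search` helper are hypothetical (L4All itself stores the learner model as RDF through JENA); the search only matches a keyword against episode titles and descriptions, which is precisely what the personalised “people like me” search is meant to go beyond.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Episode:
    # One timeline entry (hypothetical field names).
    title: str
    description: str
    category: str          # two-letter code, e.g. "Wk", "Un", "Cl"
    start: date
    duration_months: int


@dataclass
class UserModel:
    # The three parts: user profile, learning profile, timeline.
    profile: dict = field(default_factory=dict)           # identification, personal information, ...
    learning_profile: dict = field(default_factory=dict)  # learning goals, skills, qualifications, ...
    timeline: List[Episode] = field(default_factory=list)


def keyword_search(users: List[UserModel], keyword: str) -> List[UserModel]:
    """Return users having at least one episode whose title or description contains the keyword."""
    kw = keyword.lower()
    return [
        u for u in users
        if any(kw in (e.title + " " + e.description).lower() for e in u.timeline)
    ]


ann = UserModel(timeline=[Episode("BSc", "Computer Science degree", "Un", date(2001, 10, 1), 36)])
bob = UserModel(timeline=[Episode("Web developer", "Small agency", "Wk", date(2005, 1, 1), 24)])
matches = keyword_search([ann, bob], "computer")
```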

Page 15

Similarity Metrics

• Text-based metrics, each giving an algorithm-specific measure of similarity between two strings
  – e.g. “SAM” / “SAMUEL”

• Levenshtein distance (edit distance)
  – minimum number of insertions, deletions and substitutions needed to transform one string into another

• Used in information integration & applied CS
  – bioinformatics, musicology, phonetics, etc.
  – ITS: sequences of instructional activities (
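A minimal sketch of the Levenshtein distance as a dynamic programme, written for any Python sequence so that it applies equally to characters and to lists of episode tokens. The normalisation `1 - distance / max(len(a), len(b))` is an assumed convention for turning the distance into a similarity in [0, 1]; individual libraries may normalise differently.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete x
                curr[j - 1] + 1,            # insert y
                prev[j - 1] + (x != y),     # substitute (free when x == y)
            ))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: Sequence, b: Sequence) -> float:
    """Assumed normalisation: 1 - distance / length of the longer input."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("SAM", "SAMUEL"))             # 3 (insert U, E, L)
print(levenshtein_similarity("SAM", "SAMUEL"))  # 0.5
```

Counting an adjacent transposition as a single edit is the Damerau–Levenshtein variant; plain Levenshtein would charge two substitutions for it.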

Page 16

Our approach

• Black-box
  – Reusing existing metrics
  – Identifying their behaviour in the context of timelines

• Different interpretations of “people like me”

• Focus on usability, not accuracy

Tokenisation of Timelines

Page 17

Hypothesis 1 & 2 : Time

• Timelines are (obviously) time-dependent
  – Essential for the user’s own pathway
  – No evidence of relevance for “people like me”

• Similar episode two years apart?

• Similar episode twice as long (part-time)?

Start dates and durations ignored

Gaps between episodes ignored

Relative position used to sort episodes
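Hypotheses 1 and 2 amount to keeping only the order of episodes. A minimal sketch, assuming a hypothetical `Episode` record with a start date and a duration: dates are used only to sort, then discarded together with durations and gaps, so a full-time and a part-time learner with the same sequence of episodes become indistinguishable.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Episode:
    category: str          # two-letter code, e.g. "Cl", "Un", "Wk"
    start: date
    duration_months: int


def episode_sequence(timeline: List[Episode]) -> List[str]:
    """Keep only the relative order of episodes.

    Start dates are used solely for sorting; the dates themselves, the
    durations and the gaps between episodes are then discarded.
    """
    ordered = sorted(timeline, key=lambda e: e.start)
    return [e.category for e in ordered]


full_time = [Episode("Cl", date(1990, 9, 1), 24), Episode("Un", date(1992, 9, 1), 36)]
part_time = [Episode("Un", date(2001, 1, 1), 72), Episode("Cl", date(1995, 9, 1), 24)]
print(episode_sequence(full_time) == episode_sequence(part_time))  # True
```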

Page 18

Hypothesis 3 : Category of episode

• Different categories of episodes
  – Educational
  – Occupational
  – Personal

• Important for one’s own pathway – critical turning points

• Irrelevant for “people like me”?

Categories can be filtered out by the user

Code Description
SC Attended school
CL Attended college
UN Attended university
DG Obtained a degree
CS Attended a particular course
WK Employed
VL Voluntary work in a charity/voluntary organisation
BS Started a business
ML Did military service
RE Retired
UE Unemployed
CR Home carer
MV Moved to a different location
TV Spent some time abroad
CH Birth in the family
AD Adopted a child
DE Death in the family
MA Got married
SE Divorced
DS Developed a (permanent) disability
IL Developed a (temporary) illness
OT Any user-defined episode not covered above
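Filtering categories out before matching is a simple pre-processing step on the token sequence. In this sketch the grouping of codes into a “personal” set is our own assumption for illustration; the slide leaves the choice of categories to the user. Tokens follow the `Category-Primary-Secondary` format used on the tokenisation slide, and `filter_categories` is a hypothetical helper.

```python
from typing import Iterable, List

# Assumed grouping of "personal" codes from the table above; the user picks what to exclude.
PERSONAL = {"MV", "TV", "CH", "AD", "DE", "MA", "SE", "DS", "IL"}


def filter_categories(tokens: Iterable[str], excluded: set) -> List[str]:
    """Drop episodes whose category code (the part before the first '-') is excluded."""
    return [t for t in tokens if t.split("-", 1)[0].upper() not in excluded]


timeline = ["Cl-10-3", "Dg-10-3", "Wk-4-7", "Mv-0-0", "Un-6-6"]
print(filter_categories(timeline, PERSONAL))
# ['Cl-10-3', 'Dg-10-3', 'Wk-4-7', 'Un-6-6']
```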

Page 19

Hypothesis 4 : Classification of episodes

Example token: WK-2.3.2.1-6.4.0.0
  – Episode category (e.g. work, college, military service, …): WK
  – Primary classification (e.g. qualification, occupation): 2.3.2.1
  – Secondary classification (e.g. discipline, activity sector): 6.4.0.0

Occupations (excerpt):
0.0.0.0 Unknown
1.0.0.0 Managers and Senior Officials
2.0.0.0 Professional Occupations
2.3.0.0 Teaching and Research Professionals
2.3.2.0 Research Professionals
2.3.2.1 Scientific Researchers
2.3.2.2 Social Science Researchers
2.3.2.9 Researchers N.E.C.

Subjects (excerpt):
0.0.0.0 Unknown
1.0.0.0 Medicine and Dentistry
6.0.0.0 Mathematical and Computer Sciences
6.4.0.0 Computer Science

• Category of episode alone is not sufficient

• Most important episodes have extra classifications

• But a fine-grained description may not be useful

User can vary the depth of classification

Page 20

Tokenisation of Timelines

Full classification depth:
Cl-10.1.0.0-3.1.0.0
Dg-10.1.0.0-3.1.0.0
Wk-4.0.0.0-7.2.1.2
Wk-11.0.0.0-3.1.3.2
Wk-3.0.0.0-4.1.3.6
Mv-0.0.0.0-0.0.0.0
Un-6.4.0.0-6.3.0.0

Depth 1:
Cl-10-3 Dg-10-3 Wk-4-7 Wk-11-3 Wk-3-4 Mv-0-0 Un-6-6

Depth 0 (category only):
Cl-- Dg-- Wk-- Wk-- Wk-- Mv-- Un--

Expressivity decreases from top to bottom.
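The three levels above can be produced by truncating each dotted classification code to a chosen depth before joining category, primary and secondary classifications with hyphens. A minimal sketch, assuming episodes are given as `(category, primary, secondary)` triples (a hypothetical input format); the `truncate` and `tokenise` names are illustrative.

```python
from typing import List, Tuple


def truncate(code: str, depth: int) -> str:
    """Keep the first `depth` levels of a dotted code, e.g. '10.1.0.0' -> '10' at depth 1."""
    return ".".join(code.split(".")[:depth])


def tokenise(episodes: List[Tuple[str, str, str]], depth: int) -> str:
    """Encode (category, primary, secondary) triples as space-separated tokens."""
    return " ".join(
        f"{cat}-{truncate(primary, depth)}-{truncate(secondary, depth)}"
        for cat, primary, secondary in episodes
    )


episodes = [
    ("Cl", "10.1.0.0", "3.1.0.0"),
    ("Dg", "10.1.0.0", "3.1.0.0"),
    ("Wk", "4.0.0.0", "7.2.1.2"),
    ("Mv", "0.0.0.0", "0.0.0.0"),
    ("Un", "6.4.0.0", "6.3.0.0"),
]
print(tokenise(episodes, 1))  # Cl-10-3 Dg-10-3 Wk-4-7 Mv-0-0 Un-6-6
print(tokenise(episodes, 0))  # Cl-- Dg-- Wk-- Mv-- Un--
```

Depth 0 drops every classification level and leaves only the category, which is why the last line reduces to bare `Xx--` tokens.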

Page 21

Similarity Metrics

SimMetrics JAVA package – http://www.dcs.shef.ac.uk/~sam/simmetrics.html

Levenshtein

Needleman – Wunsch

Jaro

Matching Coefficient

Euclidean Distance

Block Distance

Jaccard Similarity

Cosine Similarity

Dice Similarity

Overlap Coefficient

Page 22

Encoding of some timelines

ID | Description | Encoding
Source | The original timeline used as the source for the similarity measure | Cl-00 Un-00 Mv-00 Wk-00
ID | A timeline identical to the source | Cl-00 Un-00 Mv-00 Wk-00
RE | A timeline containing the same episodes as the source but in a totally different order (i.e. no episode is at the same position in the string) | Un-00 Wk-00 Cl-00 Mv-00
ADe | A new work episode (similar to an existing one) is added to the timeline | Cl-00 Un-00 Mv-00 Wk-00 Wk-00
ADn | A new episode (different from all existing ones) is added to the timeline | Cl-00 Un-00 Mv-00 Wk-00 Bs-00
RMw | The last episode is removed from the source timeline | Cl-00 Un-00 Mv-00
RMu | One of the episodes of the source timeline is removed | Cl-00 Mv-00 Wk-00
SBn | One of the episodes of the source timeline is substituted by a new one (different from all existing ones) | Cl-00 Un-00 Mv-00 Bs-00
SBe | One of the episodes of the source timeline is substituted by an existing episode | Cl-00 Un-00 Mv-00 Un-00
SBv | One of the episodes of the source timeline is substituted by a variant of an existing episode | Cl-00 Un-00 Mv-00 Wk-10

Page 23

Comparison of Metrics

Metric | ID | RE | ADe | ADn | RMw | RMu | SBn | SBe | SBv
Levenshtein | 1 | 0 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75
Needleman – Wunsch | 1 | 0 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.88
Jaro | 1 | 0.72 | 0.93 | 0.93 | 0.92 | 0.92 | 0.83 | 0.83 | 0.83
Matching Coefficient | 1 | 1 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75
Euclidean Distance | 1 | 1 | 0.84 | 0.84 | 0.8 | 0.8 | 0.75 | 0.75 | 0.75
Block Distance | 1 | 1 | 0.89 | 0.89 | 0.86 | 0.86 | 0.75 | 0.75 | 0.75
Jaccard Similarity | 1 | 1 | 1 | 0.8 | 0.75 | 0.75 | 0.6 | 0.75 | 0.6
Cosine Similarity | 1 | 1 | 1 | 0.89 | 0.87 | 0.87 | 0.75 | 0.87 | 0.75
Dice Similarity | 1 | 1 | 1 | 0.89 | 0.86 | 0.86 | 0.75 | 0.86 | 0.75
Overlap Coefficient | 1 | 1 | 1 | 1 | 1 | 1 | 0.75 | 1 | 0.75

User-defined cost functions
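The ID, RE and ADe columns show that Jaccard, Cosine, Dice and Overlap ignore both order and repeated tokens, which suggests they treat each timeline as a set of tokens. Under that assumption (our interpretation, not the SimMetrics source code), the standard set formulas below reproduce those four rows of the table when applied to the encodings of the previous slide.

```python
import math
from typing import Dict, List


def set_metrics(a: List[str], b: List[str]) -> Dict[str, float]:
    """Token-set similarities: order and duplicate tokens are ignored."""
    sa, sb = set(a), set(b)
    common = len(sa & sb)
    return {
        "jaccard": common / len(sa | sb),
        "dice": 2 * common / (len(sa) + len(sb)),
        "cosine": common / math.sqrt(len(sa) * len(sb)),
        "overlap": common / min(len(sa), len(sb)),
    }


source = "Cl-00 Un-00 Mv-00 Wk-00".split()
variants = {  # same column order as the table
    "ID": "Cl-00 Un-00 Mv-00 Wk-00",
    "RE": "Un-00 Wk-00 Cl-00 Mv-00",
    "ADe": "Cl-00 Un-00 Mv-00 Wk-00 Wk-00",
    "ADn": "Cl-00 Un-00 Mv-00 Wk-00 Bs-00",
    "RMw": "Cl-00 Un-00 Mv-00",
    "RMu": "Cl-00 Mv-00 Wk-00",
    "SBn": "Cl-00 Un-00 Mv-00 Bs-00",
    "SBe": "Cl-00 Un-00 Mv-00 Un-00",
    "SBv": "Cl-00 Un-00 Mv-00 Wk-10",
}
for vid, enc in variants.items():
    scores = set_metrics(source, enc.split())
    print(vid, {name: round(value, 2) for name, value in scores.items()})
```

Because SBv's `Wk-10` is simply a different set element, these metrics cannot see that it is a close variant of `Wk-00`; only an algorithm with a user-defined cost function, such as Needleman – Wunsch, can reward it.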

Page 24

Search for “People like me”

• “Existential” search

• Filtering by
  – User profile
  – Episode categories

• Tuning by
  – Classification depth
  – Similarity metrics

• Ranking by timeline similarity
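The search pipeline (filter by episode category, then rank by timeline similarity) can be sketched in a few lines. Here Python's standard `difflib.SequenceMatcher.ratio()` stands in for whichever metric the user selects; it is not one of the SimMetrics measures. The function name `rank_people_like_me` and the token format are assumptions for illustration.

```python
from difflib import SequenceMatcher
from typing import AbstractSet, Dict, List, Tuple


def rank_people_like_me(me: List[str], others: Dict[str, List[str]],
                        excluded: AbstractSet[str] = frozenset()) -> List[Tuple[str, float]]:
    """Filter by episode category, then rank other learners by timeline similarity."""

    def keep(tokens: List[str]) -> List[str]:
        # The category is the part of the token before the first "-".
        return [t for t in tokens if t.split("-", 1)[0] not in excluded]

    mine = keep(me)
    scores = [(name, SequenceMatcher(None, mine, keep(tokens)).ratio())
              for name, tokens in others.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


others = {"ann": ["Cl--", "Un--", "Wk--"], "bob": ["Wk--", "Bs--"]}
ranking = rank_people_like_me(["Cl--", "Un--", "Mv--", "Wk--"], others, excluded={"Mv"})
print(ranking)  # [('ann', 1.0), ('bob', 0.4)]
```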

Page 25

Page 26

Explaining Similarity Measures

• Needleman – Wunsch

• Computing the alignment of strings
  – Copying/substituting tokens
  – Insertion/deletion

• Optimal score for the alignment of the first i characters in T1 and the first j characters in T2

• Score indicates minimal edit distance

• Backtracking for alignment(s)

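[Figure: worked Needleman – Wunsch example: dynamic-programming matrix for two short strings and the alignment recovered by backtracking; G marks gap (insert/delete) steps, d substitution steps]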
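A minimal cost-minimising Needleman–Wunsch sketch, with the gap cost G (`gap`) and the substitution cost d (`dist`) as parameters. With the default unit costs it computes the Levenshtein distance; the backtracking loop returns one optimal alignment, using `None` to mark gaps. The function and parameter names are our own.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[object], Optional[object]]  # None marks a gap


def unit_cost(x: object, y: object) -> float:
    """Default substitution cost d: free copy, unit substitution (gives Levenshtein)."""
    return 0.0 if x == y else 1.0


def align(s1: Sequence, s2: Sequence, gap: float = 1.0,
          dist: Callable[[object, object], float] = unit_cost) -> Tuple[float, List[Pair]]:
    """Minimal-cost global alignment in the Needleman-Wunsch style.

    D[i][j] is the optimal cost of aligning the first i tokens of s1 with the
    first j tokens of s2:
        D[i][j] = min(D[i-1][j-1] + d(s1[i-1], s2[j-1]),  # copy / substitute
                      D[i-1][j] + G,                       # delete
                      D[i][j-1] + G)                       # insert
    """
    n, m = len(s1), len(s2)
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = i * gap
    for j in range(1, m + 1):
        D[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + dist(s1[i - 1], s2[j - 1]),
                          D[i - 1][j] + gap,
                          D[i][j - 1] + gap)

    # Backtrack from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(D[i][j], D[i - 1][j - 1] + dist(s1[i - 1], s2[j - 1])):
            pairs.append((s1[i - 1], s2[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(D[i][j], D[i - 1][j] + gap):
            pairs.append((s1[i - 1], None))
            i -= 1
        else:
            pairs.append((None, s2[j - 1]))
            j -= 1
    pairs.reverse()
    return D[n][m], pairs


cost, pairs = align("ABCD", "ACD")
print(cost)   # 1.0
print(pairs)  # [('A', 'A'), ('B', None), ('C', 'C'), ('D', 'D')]
```

With `gap=1.0` and a substitution cost of 2 for any two different tokens (values inferred from the matrix on the next slide rather than stated in the talk), aligning the two timelines S1 and S2 shown there gives 13, the value in that matrix's bottom-right cell.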

Page 27

[Figure: edit-distance matrix aligning the user’s timeline (S1: Cl-6.4-4.1, Wk-J.0-2.1, Un-9.1-6.3, Wk-R.0-2.4, Wk-C.0-3.5, Wk-P.0-2.3, Un-6.4-6.1, Wk-G.0-4.2, Wk-K.0-3.1) with a target’s timeline (S2: Cl-0.0-4.1, Wk-R.0-2.4, Un-6.4-6.3, Wk-N.0-9.2, Un-6.4-6.1, Wk-S.0-3.1, Wk-J.0-3.1, Wk-J.0-2.1); the bottom-right cell gives a minimal distance of 13]

Page 28

“What should I do next?”

• “Recommendation” is too strong a term
  – It suggests reliability & objectivity; expert pathways are difficult to obtain

• Role model
  – A source of inspiration
  – This is what people have done after following a pathway similar to yours; why not consider a similar future?

Exploiting string alignments

• Identifying common patterns & possible future pathways

• Naïve “rule of thumb” approach

• Lack of semantics BETWEEN episodes

[Figure: the user’s timeline (S1) and the target’s timeline (S2) from the previous slide, shown aligned]
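The naive “rule of thumb” can be sketched as: find the episodes two timelines have in common, and suggest what the other learner did after the last of them. For brevity this sketch uses `difflib.SequenceMatcher.get_matching_blocks()`, whose longest-common-block heuristic is not the Needleman – Wunsch alignment used in the talk; the `what_next` helper is hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def what_next(mine: List[str], theirs: List[str]) -> List[str]:
    """Naive rule of thumb: return the episodes the other learner went through
    after the last episode matched between the two timelines."""
    blocks = SequenceMatcher(None, mine, theirs).get_matching_blocks()
    matched = [b for b in blocks if b.size > 0]  # the final block is a zero-size sentinel
    if not matched:
        return []
    last = matched[-1]
    return list(theirs[last.b + last.size:])


mine = ["Cl--", "Un--", "Wk--"]
theirs = ["Cl--", "Un--", "Wk--", "Dg--", "Bs--"]
print(what_next(mine, theirs))  # ['Dg--', 'Bs--']
```

The suggestion carries no reasoning about why one episode followed another, which is exactly the lack of semantics BETWEEN episodes noted above.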

Page 29

Page 30

Page 31

Page 32

Conclusions

• Different metrics capture different aspects of string comparison
  – No single one is particularly adequate or “better”
  – Context of use is important: what does “people like me” mean?

• What are they good for?
  – Separation between encoding and matching
  – Encoding does not depend on context and embeds some (not all) of the timeline’s semantics
  – Persistent storage, indexing, RSS feeds, alerts

• What are they not so good for?
  – Discrepancy between string similarity and timeline similarity
  – Lack of explanation of the reasons for similarity

• The way forward?
  – Identifying contexts of use and deploying tailored mechanisms
  – User-defined mechanisms

Page 33

Which Measure of (Dis)similarity?

• Needleman – Wunsch
  – Distance between tokens?
  – Cost functions: G: gap (insert/delete); d: distance (substitute)

• Normalised similarity?
  – algorithm-specific

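[Figure: example string alignments contrasting normalised similarity and dissimilarity (66% (4/6), 50% (2/4)), and two work tokens, WK-2.3.2.1-6.4.0.0 and WK-1.0.0.0-4.2.0.0, illustrating the question of distance between tokens]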
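One possible answer to “distance between tokens?” is a substitution cost d that is cheaper when two episodes share a category and the leading levels of their classifications. The formula below is a hypothetical choice for illustration, not the cost function used in MyPlan; it could be supplied as the substitution cost of a Needleman – Wunsch alignment.

```python
from typing import List, Tuple


def parse(token: str) -> Tuple[str, List[str], List[str]]:
    """Split a token such as 'WK-2.3.2.1-6.4.0.0' into category and two classification paths."""
    category, primary, secondary = token.split("-")
    return category.upper(), primary.split("."), secondary.split(".")


def shared_levels(p: List[str], q: List[str]) -> int:
    """Number of leading classification levels two paths have in common."""
    count = 0
    for x, y in zip(p, q):
        if x != y:
            break
        count += 1
    return count


def token_distance(t1: str, t2: str) -> float:
    """Hypothetical substitution cost d in [0, 1].

    Different categories cost 1; otherwise the cost is the fraction of
    classification levels (primary plus secondary) that the tokens do not share.
    """
    cat1, prim1, sec1 = parse(t1)
    cat2, prim2, sec2 = parse(t2)
    if cat1 != cat2:
        return 1.0
    levels = max(len(prim1), len(prim2)) + max(len(sec1), len(sec2))
    shared = shared_levels(prim1, prim2) + shared_levels(sec1, sec2)
    return 1.0 - shared / levels


print(token_distance("WK-2.3.2.1-6.4.0.0", "WK-2.3.2.2-6.4.0.0"))  # 0.125
print(token_distance("WK-2.3.2.1-6.4.0.0", "WK-1.0.0.0-4.2.0.0"))  # 1.0
```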

Page 34

A Holistic Approach to Timeline Matching

Page 35

Multiple String Alignments

[Figure: edit-distance matrix between the user’s timeline (S1: Cl-6.4-4.1, Wk-J.0-2.1, Un-9.1-6.3, Wk-R.0-2.4, Wk-C.0-3.5, Wk-P.0-2.3, Un-6.4-6.1, Wk-G.0-4.2, Wk-K.0-3.1) and a target’s timeline (S2: Wk-K.0-3.1, Cl-0.0-4.1, Wk-G.0-3.1, Un-6.4-6.1, Wk-I.0-3.1), with two alternative alignments (Alignment 1 and Alignment 2)]
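Several alignments can share the same optimal cost, as Alignment 1 and Alignment 2 illustrate. A minimal sketch that follows every cost-preserving backtracking path and collects the distinct sets of matched episodes; the gap cost of 1 and mismatch cost of 2 are assumptions inferred from the printed matrices, and the exhaustive walk is exponential in the worst case, so it only suits short timelines. The function name is our own.

```python
import math
from typing import List, Sequence, Set, Tuple

Match = Tuple[int, int]  # (index in s1, index in s2), 0-based


def optimal_match_sets(s1: Sequence[str], s2: Sequence[str], gap: float = 1.0,
                       mismatch: float = 2.0) -> Tuple[float, List[Tuple[Match, ...]]]:
    """Return the optimal cost and every distinct set of matched episodes
    lying on at least one optimal backtracking path."""
    n, m = len(s1), len(s2)

    def sub(i: int, j: int) -> float:
        return 0.0 if s1[i - 1] == s2[j - 1] else mismatch

    D = [[(i + j) * gap if i == 0 or j == 0 else 0.0 for j in range(m + 1)] for i in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(D[i - 1][j - 1] + sub(i, j), D[i - 1][j] + gap, D[i][j - 1] + gap)

    found: Set[Tuple[Match, ...]] = set()

    def walk(i: int, j: int, matches: Tuple[Match, ...]) -> None:
        # Explore every predecessor cell that preserves the optimal cost.
        if i == 0 and j == 0:
            found.add(tuple(reversed(matches)))
            return
        if i > 0 and j > 0 and math.isclose(D[i][j], D[i - 1][j - 1] + sub(i, j)):
            is_match = s1[i - 1] == s2[j - 1]
            walk(i - 1, j - 1, (matches + ((i - 1, j - 1),)) if is_match else matches)
        if i > 0 and math.isclose(D[i][j], D[i - 1][j] + gap):
            walk(i - 1, j, matches)
        if j > 0 and math.isclose(D[i][j], D[i][j - 1] + gap):
            walk(i, j - 1, matches)

    walk(n, m, ())
    return D[n][m], sorted(found)


print(optimal_match_sets(["A", "B"], ["B", "A"]))  # (2.0, [((0, 1),), ((1, 0),)])
```

On the two timelines of this slide it returns a cost of 12 and two match sets: one pairing the Un-6.4-6.1 episodes, the other pairing the Wk-K.0-3.1 episodes.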

Page 36

Future Work (?)

• (Multiple) external representations of timelines AND similarities

• Full-fledged social network functionalities
  – Reflection
  – Help & advice seeking, interventions (peers, institutions, …)

• “Recommendation”
  – Dependencies BETWEEN episodes
  – Domain knowledge (e.g. course entry profile, alternatives to top-down taxonomies)