Building a Predictive Model: A Behind-the-Scenes Look
BUILDING A PREDICTIVE MODEL: A BEHIND-THE-SCENES LOOK
Mike Sharkey, Director of Academic Analytics, The Apollo Group
January 9, 2012
THE 50,000 FT. VIEW
We have lots of data; we need to set a good foundation… so we can extract information that will help our students succeed
OUR DATA FOUNDATION
Diagram: INTEGRATED DATA WAREHOUSE. Source systems (LMS, SIS, CMS, Applications, Applicant) feed an Integrated Data Repository (databases), which serves Reporting Tools, Analytics Tools and Business Intelligence.
HOW IS IT WORKING?
Advantages
• Continuous flow of integrated data
• Can drill down to the transaction level
Disadvantages
• New data flows require in-demand resources
• Need skilled staff to understand the data model
BUILDING A PREDICTIVE MODEL
PREDICTING SUCCESS… BUT WHAT IS SUCCESS?
• Learning: did the students learn what they were supposed to learn?
• Program persistence: does the student drop out?
• Course completion: does the student pass the class?
THE PLAN
• Use available data (demographics, schedule, course history, assignments) to build a model (logistic regression)
• Develop a model to predict course pass/fail on a scale, e.g. 1-10, where 10 = will likely pass the course and 1 = will most likely fail the course
• Feed the score to academic counselors who can intervene (e.g., phone at-risk students)
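A minimal Python sketch of this plan, under stated assumptions: the two features (an assignment average between 0 and 1 and a count of previously failed courses), the synthetic toy rows, the plain gradient-descent fit and the decile bucketing onto the 1-10 scale are all illustrative choices, not the Apollo Group's actual model or data.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_pass_probability(w: Sequence[float], row: Sequence[float]) -> float:
    """w[0] is the intercept; w[1:] are the feature coefficients."""
    return sigmoid(w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)))


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 3000) -> List[float]:
    """Fit logistic regression weights by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, label in zip(X, y):
            err = predict_pass_probability(w, row) - label  # dLoss/dz for log-loss
            grad[0] += err
            for j, xj in enumerate(row):
                grad[j + 1] += err * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
    return w


def score_1_to_10(p: float) -> int:
    """Bucket a pass probability into deciles: 1 = most likely fail, 10 = likely pass."""
    return min(10, int(p * 10) + 1)


# Synthetic toy rows (hypothetical): [assignment average, previously failed courses]
X = [[0.90, 0], [0.80, 0], [0.85, 1], [0.30, 2], [0.20, 1], [0.40, 3]]
y = [1, 1, 1, 0, 0, 0]  # 1 = passed the course, 0 = failed
weights = fit_logistic(X, y)
for student in ([0.95, 0], [0.10, 3]):
    print(student, score_1_to_10(predict_pass_probability(weights, student)))
```

In practice a statistics package would fit the coefficients; the hand-rolled loop only makes the logistic regression step visible. Decile bucketing is one simple way to turn a probability into a score a counselor can act on.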
THE MODEL
• Built different models for Associate's, Bachelor's and Master's programs, predicting at Week 0, Week 1, … through the last week
Strongest predictive coefficients
• Course assignment scores (stronger as the course goes on)
• Financial status (mostly at Week 0)
• Whether the student failed courses in the past
• Credits earned in the program (tenure)
WHERE WE ARE TODAY
Validation
• The statistics are sound, but we need to field-test the intervention plan to validate the model scores
What we learned
• The strongest parameters are the most obvious (assignments)
• Weak parameters: gender, age, weekly attendance
• Add future parameters as they become available: class activity, participation, faculty alerts, inactive time between courses, interaction with faculty, orientation participation, late assignments
5 CHALLENGES IN BUILDING & DEPLOYING LEARNING ANALYTICS SOLUTIONS
Christopher Brooks
MY BIASES
• The domain of higher education
• Scalable and broad solutions
• The grey areas between research and production
QUESTION: YOUR BIASES: WHAT DO YOU THINK THE PRINCIPAL GOAL OF LEARNING ANALYTICS SHOULD BE?
• Enabling human intervention
• Computer-assisted instruction (dynamic content recommendation, tutoring, quizzing)
• Conducting educational research
• Administrative intelligence, transparency, competitiveness
• Other (write in chat)
CHALLENGE 1: WHAT ARE YOU BUILDING?
Exploring data
• Intuition and domain expertise are useful
• Multiple perspectives from people familiar with the data
• More data types (diversity) are better; smaller datasets (fewer instances) are OK
• Imprecision in data is OK
• Visualization techniques
Answering a question
• Data should be clean and rigorous, with error recognized explicitly
• The quantity of data in the datasets (instances) strengthens the result
• Decision makers must guide the process (are the questions worth answering?)
• Statistical techniques
CASE 1: HOW HEALTHY IS YOUR CLASSROOM COMMUNITY? (SOCIAL NETWORK ANALYSIS)
CASE 2: APPLYING UNSUPERVISED LEARNING TECHNIQUES (CLUSTERING)
RESULTS VALIDATED, QUANTIFIED, AND ENCOURAGED MORE INVESTIGATION
Hypotheses
• H1: There will be a group of minimal activity learners...
• H2: There will be a group of high activity learners...
• H3: There will be a group of disillusioned learners...
• H4: There will be a group of deferred learners...
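One way to test hypotheses like these is k-means clustering on per-learner activity. The sketch below assumes two hypothetical features (activity early in the term and late in the term, each scaled 0-1) and seeds four centroids at the corners that H1-H4 would predict; the features, the synthetic learners and the seeding are illustrative assumptions, not the method used in the case study.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (early-term activity, late-term activity), scaled 0-1


def squared_distance(a: Point, b: Point) -> float:
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points: Sequence[Point], centroids: List[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: assign points to the nearest centroid, then recompute centroids."""
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
                  for p in points]
        new_centroids: List[Point] = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid in place
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels


# Synthetic learners (hypothetical data, for illustration only)
learners = [(0.10, 0.10), (0.05, 0.15), (0.90, 0.85), (0.95, 0.90),
            (0.85, 0.10), (0.90, 0.20), (0.10, 0.80), (0.20, 0.90)]
# Seeds where H1-H4 would expect each group to sit
seeds = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
names = ["minimal", "high", "disillusioned", "deferred"]
_, labels = kmeans(learners, seeds)
print([names[k] for k in labels])
```

On this synthetic data each learner lands in the group its activity pattern suggests; on real data the number of clusters and the seeding would need to be justified rather than assumed.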
CHALLENGE 2: WHAT TO COLLECT
• Too much versus too little
• Make a choice based on end goals
• Think in terms of events instead of the “click stream”
• Collecting “everything” comes with upfront development and analysis costs; the risk is that the project never gets off the ground
• Make hypotheses explicit in your team so they can decide how best to collect that data
• Follow agile software development techniques (iterate & get constant feedback)
• Build institutional will with small targeted gains
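To make “events instead of the click stream” concrete, here is a small Python sketch. The request paths, event names and the EVENT_MAP table are hypothetical; the point is that only clicks that map onto a named event tied to a hypothesis are kept, and everything else is never stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class LearningEvent:
    student_id: str
    event_type: str  # a named, meaningful action rather than a raw URL hit
    timestamp: datetime


# Hypothetical mapping from raw request paths to the events a hypothesis needs.
# Anything not listed here is deliberately not collected.
EVENT_MAP: Dict[str, str] = {
    "/assignments/submit": "assignment_submitted",
    "/lectures/play": "lecture_viewed",
    "/forum/post": "forum_post_created",
}


def to_events(clicks: Iterable[Tuple[str, str, datetime]]) -> List[LearningEvent]:
    """Convert (student_id, path, timestamp) clicks into events, dropping unmapped clicks."""
    events: List[LearningEvent] = []
    for student_id, path, ts in clicks:
        event_type: Optional[str] = EVENT_MAP.get(path)
        if event_type is not None:
            events.append(LearningEvent(student_id, event_type, ts))
    return events


clicks = [("s1", "/assignments/submit", datetime(2012, 1, 9, 10, 0)),
          ("s1", "/css/site.css", datetime(2012, 1, 9, 10, 0)),  # noise: dropped
          ("s2", "/forum/post", datetime(2012, 1, 9, 11, 30))]
for event in to_events(clicks):
    print(event)
```

Adding a new hypothesis then means adding a row to the mapping, which fits the small, iterative gains the slide recommends.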
CHALLENGE 3: UNDERSTAND YOUR USER
Breadth of context
• Administrator: rates for degree completion, retention rate, re-enrolment rate, number of active students... (abbreviated statistics)
• Instructional design/researcher: educational research, what works and what doesn't, how tools and processes should change... (sophisticated statistics & visualizations)
• Instructor: evaluation of students and of a cohort of students, identifying immediate remediation... (visualization, abbreviated statistics)
• Student: evaluation, evaluation, evaluation... (visualization)
WITH GREAT POWER COMES GREAT RESPONSIBILITY...
Some potential abuses of student tracking data:
• Changing pedagogical technique to the detriment of some students
• Denying help to those who “aren't really trying”
• A failure of instructors to acknowledge the challenges that face students
Is it ethical to give instructors access to student analytics data?
• Yes
• No
• Sometimes
(write your thoughts in the chat)
CHALLENGE 4: ACKNOWLEDGE CAVEATS
Analytics shows you only part of the picture:
• Dead-tree learning, in-person social constructivism, shoulder surfing/account sharing
• Anonymization tools, JavaScript/Flash blockers
• False positives (incorrect Amazon recommendations)
• Misleading actions (incorrect self-assessment, or gaming the system (Baker))
Solutions
• Aggregation & anonymization
• Make error values explicit
• Use broad categories for actionable analytics
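The first and third solutions can be sketched together: bucket precise scores into broad categories, then report counts per category only when a group is large enough to avoid singling anyone out. The 0-100 score scale, the band cut-offs and the minimum cell size of 5 are hypothetical values chosen for illustration, not a recommended policy.

```python
from collections import Counter
from typing import Dict, Iterable, Optional

MIN_CELL_SIZE = 5  # hypothetical threshold; set it from your institution's policy


def broad_category(score: float) -> str:
    """Collapse a precise 0-100 score into a coarse, actionable band (bands are illustrative)."""
    if score < 50:
        return "needs attention"
    if score < 75:
        return "on track"
    return "doing well"


def aggregate_with_suppression(groups: Iterable[str],
                               min_cell: int = MIN_CELL_SIZE) -> Dict[str, Optional[int]]:
    """Count records per group; report None for groups too small to show safely."""
    counts = Counter(groups)
    return {g: (n if n >= min_cell else None) for g, n in counts.items()}


scores = [42, 48, 30, 45, 49, 60, 70, 90]  # synthetic example scores
print(aggregate_with_suppression(broad_category(s) for s in scores))
```

Reporting None rather than a small number makes the suppression explicit, which is in the spirit of making error values visible instead of hiding them.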
DOES LEARNER MODELLING OFFER SOLUTIONS?
The learner modelling community blends with analytics.
Open learner modelling (students can see their completed model)
Scrutable learner modelling (students can see how the system model of them is formed)
Question: I believe the student should have the right to view where analytics data about themselves has come from and who it has been made available to.
• Yes
• No
• Sometimes
(And what are the implications of doing this? Write in chat.)
CHALLENGE 5: CROSS APPLICATION BOUNDARIES
Data from different applications (clickers, LCMS, lecture capture, SIS/CIS, publisher quizzes, etc.) doesn't play well together:
• Requires cleaning
• Requires normalizing on semantics
• Requires access
Data warehousing activities
Is there a light on the horizon?
(Photo: http://www.flickr.com/photos/malikdhadha/5105818154/)
QUICK CONCLUSIONS
Thus far I've learned it's important to:
• Know your goals
• Know your user
• Capture what you know you need and don't worry about the rest
• Acknowledge the limitations of your approach
• Iterate, iterate, iterate
Christopher Brooks
Department of Computer Science
University of
LEARNING ANALYTICS FOR C21 DISPOSITIONS & SKILLS
Simon Buckingham Shum
Knowledge Media Institute, Open U. UK
simon.buckinghamshum.net
@sbskmi
L.A. FRAMEWORK TO THINK WITH…
Discipline knowledge
Educator owns and manages a single dataset
Educator owns and manages multiple datasets
Learners add their own datasets
Hybrid closed + open datasets
Hybrid closed + open analytics
Focus of most LA effort, beginning to move towards these more complex spaces
http://solaresearch.org/OpenLearningAnalytics.pdf
L.A. FRAMEWORK TO THINK WITH…
Discipline knowledge / C21 Learning Capacities
Educator owns and manages a single dataset
Educator owns and manages multiple datasets
Learners add their own datasets
Hybrid closed + open datasets
Hybrid closed + open analytics
Critical for learner engagement and authentic learning
Focus of most LA effort, beginning to move towards these more complex spaces
“We are preparing students for jobs that do not exist yet, that will use technologies that have not been invented yet, in order to solve problems that are not even problems yet.”
“Shift Happens”: http://shifthappens.wikispaces.com
LEARNING ANALYTICS FOR THIS?
“The test of successful education is not the amount of knowledge that pupils take away from school, but their appetite to know and their capacity to learn.”
Sir Richard Livingstone, 1941
ANALYTICS FOR… C21 SKILLS?
LEARNING HOW TO LEARN? AUTHENTIC ENQUIRY?
social capital, critical questioning, argumentation, citizenship, habits of mind, resilience, collaboration, creativity, metacognition, identity, readiness, sensemaking, engagement, motivation, emotional intelligence
L.A. FRAMEWORK TO THINK WITH…
Discipline knowledge / C21 Learning Capacities
Educator owns and manages a single dataset
Educator owns and manages multiple datasets
Learners add their own datasets
Hybrid closed + open datasets
Hybrid closed + open analytics
More LA effort needed, e.g.:
1. Disposition Analytics
2. Discourse Analytics
Focus of most LA effort, beginning to move towards these more complex spaces
ANALYTICS FOR LEARNING DISPOSITIONS
ELLI: EFFECTIVE LIFELONG LEARNING INVENTORY
WEB QUESTIONNAIRE, 72 ITEMS (CHILDREN AND ADULT VERSIONS; USED IN SCHOOLS, UNIVERSITIES AND THE WORKPLACE)
Buckingham Shum, S. and Deakin Crick, R. (2012). Learning Dispositions and Transferable Competencies: Pedagogy, Modelling, and Learning Analytics. Accepted to 2nd International Conference on Learning Analytics & Knowledge (Vancouver, 29 Apr – 2 May, 2012).
VALIDATED AS LOADING ONTO 7 DIMENSIONS OF “LEARNING POWER”
• Changing & Learning vs. Being Stuck & Static
• Meaning Making vs. Data Accumulation
• Critical Curiosity vs. Passivity
• Creativity vs. Being Rule Bound
• Learning Relationships vs. Isolation & Dependence
• Strategic Awareness vs. Being Robotic
• Resilience vs. Fragility & Dependence
ELLI GENERATES A 7-DIMENSIONAL SPIDER DIAGRAM OF HOW THE LEARNER SEES THEMSELF
Bristol and Open University are now embedding ELLI in learning software.
Basis for a mentored discussion on how the learner sees him/herself, and strategies for strengthening the profile
ADDING IMAGERY TO ELLI DIMENSIONS TO CONNECT WITH LEARNER IDENTITY
ELLI GENERATES COHORT DATA FOR EACH DIMENSION
…DRILLING DOWN ON A SPECIFIC DIMENSION
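Cohort views and drill-downs like these reduce to simple aggregation over learner profiles. The sketch assumes each learner has one numeric score per ELLI dimension on a hypothetical 0-100 scale; the actual ELLI scoring scheme is not described here, so treat the scale, the threshold and the learner names as placeholders.

```python
from statistics import mean
from typing import Dict, List

DIMENSIONS = [
    "Changing & Learning", "Meaning Making", "Critical Curiosity", "Creativity",
    "Learning Relationships", "Strategic Awareness", "Resilience",
]

Profile = Dict[str, float]  # dimension name -> score (0-100 scale assumed)


def cohort_means(profiles: Dict[str, Profile]) -> Dict[str, float]:
    """Average each dimension across the cohort (the data behind a cohort spider diagram)."""
    return {d: mean(p[d] for p in profiles.values()) for d in DIMENSIONS}


def drill_down(profiles: Dict[str, Profile], dimension: str, below: float) -> List[str]:
    """List learners whose self-rating on one dimension falls below a threshold."""
    return sorted(name for name, p in profiles.items() if p[dimension] < below)


# Hypothetical cohort of two learners
cohort = {
    "learner_a": {d: 60.0 for d in DIMENSIONS},
    "learner_b": {**{d: 70.0 for d in DIMENSIONS}, "Resilience": 35.0},
}
print(cohort_means(cohort)["Resilience"])
print(drill_down(cohort, "Resilience", below=50))
```

A drill-down list like this is best read as a prompt for the mentored discussion described above, not as a verdict on the learner.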
Plugin visualizes blog categories, mirroring the ELLI spider
ENQUIRYBLOGGER: TUNING WORDPRESS AS AN ELLI-BASED LEARNING JOURNAL
Standard Wordpress editor
Categories from ELLI
ENQUIRYBLOGGER: COHORT DASHBOARD
LEARNINGEMERGENCE.NET: more on analytics for learning to learn and authentic enquiry
ANALYTICS FOR LEARNING CONVERSATIONS
DISCOURSE LEARNING ANALYTICS
Effective learning conversations display some typical characteristics which learners can and should be helped to master
Learners’ written, online conversations can be analysed computationally for patterns signifying weaker and stronger forms of contribution
SOCIO-CULTURAL DISCOURSE ANALYSIS (MERCER ET AL, OU)
• Disputational talk, characterised by disagreement and individualised decision making.
• Cumulative talk, in which speakers build positively but uncritically on what the others have said.
• Exploratory talk, in which partners engage critically but constructively with each other's ideas.
Mercer, N. (2004). Sociocultural discourse analysis: analysing classroom talk as a social mode of thinking. Journal of Applied Linguistics, 1(2), 137-168.
Characteristics of Exploratory talk:
• Statements and suggestions are offered for joint consideration.
• These may be challenged and counter-challenged, but challenges are justified and alternative hypotheses are offered.
• Partners all actively participate and opinions are sought and considered before decisions are jointly made.
• Compared with the other two types, in Exploratory talk knowledge is made more publicly accountable and reasoning is more visible in the talk.
ANALYTICS FOR IDENTIFYING EXPLORATORY TALK
Elluminate sessions can be very long – lasting for hours or even covering days of a conference
It would be useful if we could identify where quality learning conversations seem to be taking place, so we can recommend those sessions, and not have to sit through online chat about virtual biscuits
Ferguson, R. and Buckingham Shum, S. Learning analytics to identify exploratory dialogue within synchronous text chat. 1st International Conference on Learning Analytics & Knowledge (Banff, Canada, 27 Mar-1 Apr, 2011)
De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M. and Cannavacciuolo, L. Discourse-centric learning analytics. 1st International Conference on Learning Analytics & Knowledge (Banff, 27 Mar-1 Apr, 2011)
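A crude way to approximate this idea is to flag chat messages containing cue phrases associated with Exploratory talk, then rank sessions by the share of flagged messages. The phrase list below is hypothetical and is not the indicator list from the cited study; it only illustrates the ranking step.

```python
import re
from typing import Dict, List

# Hypothetical cue phrases suggestive of challenge, justification or alternatives.
CUE_PHRASES = ["but if", "i think", "because", "what if", "my view",
               "do you agree", "on the other hand", "have to respond"]
CUE_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(c) for c in CUE_PHRASES) + r")\b", re.IGNORECASE)


def is_exploratory(message: str) -> bool:
    return CUE_PATTERN.search(message) is not None


def exploratory_ratio(messages: List[str]) -> float:
    """Share of messages in a session that contain at least one cue phrase."""
    if not messages:
        return 0.0
    return sum(is_exploratory(m) for m in messages) / len(messages)


def rank_sessions(sessions: Dict[str, List[str]]) -> List[str]:
    """Order sessions so those with the most exploratory-looking chat come first."""
    return sorted(sessions, key=lambda s: exploratory_ratio(sessions[s]), reverse=True)


sessions = {
    "biscuits": ["anyone want a virtual biscuit?", "yes please", "lol"],
    "debate": ["I think the model overfits because the sample is small",
               "but if we add more weeks?", "ok"],
}
print(rank_sessions(sessions))
```

Surface matching like this will produce false positives and misses, so its output is better used to suggest which sessions to look at than to grade contributions.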
KMI’S COHERE: A WEB DELIBERATION PLATFORM ENABLING SEMANTIC SOCIAL NETWORK AND DISCOURSE NETWORK ANALYTICS
Rebecca is playing the role of broker, connecting 2 peers’ contributions in meaningful ways
DISCOURSE ANALYSIS
BACKGROUND KNOWLEDGE:
Recent studies indicate …
… the previously proposed …
… is universally accepted ...
NOVELTY:
... new insights provide direct evidence ...
... we suggest a new ... approach ...
... results define a novel role ...
OPEN QUESTION:
… little is known …
… role … has been elusive
Current data is insufficient …
GENERALIZING:
... emerging as a promising approach
Our understanding ... has grown exponentially ...
... growing recognition of the importance ...
CONTRASTING IDEAS:
… unorthodox view resolves … paradoxes …
In contrast with previous hypotheses ...
... inconsistent with past findings ...
SIGNIFICANCE:
studies ... have provided important advances
Knowledge ... is crucial for ... understanding
valuable information ... from studies
SURPRISE:
We have recently observed ... surprisingly
We have identified ... unusual
The recent discovery ... suggests intriguing roles
SUMMARIZING:
The goal of this study ...
Here, we show ...
Altogether, our results ... indicate
Xerox’s parser can detect the presence of ‘knowledge-level’ moves in text:
Ágnes Sándor & OLnet Project: http://olnet.org/node/512
De Liddo, A., Sándor, Á. and Buckingham Shum, S. (In Press). Contested Collective Intelligence: Rationale, Technologies, and a Human-Machine Annotation Study. Computer Supported Cooperative Work Journal
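As a rough illustration only, some of the example phrases above can be turned into surface patterns that tag sentences with rhetorical moves. This is a swapped-in technique (plain regular expressions), not the Xerox parser; the pattern lists are hypothetical and cover only a few phrases per category.

```python
import re
from typing import Dict, List

# Hypothetical surface patterns derived from the example phrases on the slide.
MOVE_PATTERNS: Dict[str, List[re.Pattern]] = {
    "BACKGROUND KNOWLEDGE": [re.compile(r"\brecent studies indicate\b", re.I),
                             re.compile(r"\bis universally accepted\b", re.I)],
    "NOVELTY": [re.compile(r"\bnew insights\b", re.I),
                re.compile(r"\ba novel role\b", re.I)],
    "OPEN QUESTION": [re.compile(r"\blittle is known\b", re.I),
                      re.compile(r"\bhas been elusive\b", re.I)],
    "CONTRASTING IDEAS": [re.compile(r"\bin contrast with\b", re.I),
                          re.compile(r"\binconsistent with\b", re.I)],
    "SURPRISE": [re.compile(r"\bsurprisingly\b", re.I),
                 re.compile(r"\bunusual\b", re.I)],
    "SUMMARIZING": [re.compile(r"\bhere,? we show\b", re.I),
                    re.compile(r"\bthe goal of this study\b", re.I)],
}


def detect_moves(sentence: str) -> List[str]:
    """Return every rhetorical move whose patterns match the sentence."""
    return [move for move, patterns in MOVE_PATTERNS.items()
            if any(p.search(sentence) for p in patterns)]


print(detect_moves("Little is known about dropout; in contrast with previous "
                   "hypotheses, we find a pattern."))
```

A sentence can carry more than one move, which is why the function returns a list rather than a single label.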
NEXT STEPS
SOCIAL LEARNING ANALYTICS: Develop this framework to integrate social, discourse, disposition and other process-centric analytics
DISPOSITION ANALYTICS: Extend the capabilities of the ELLI ‘learning power’ platform using real-time analytics data from online learner activity
DISCOURSE ANALYTICS: human+machine annotation of written discourse and argument maps
IN MORE DETAIL…
Social Learning Analytics
Buckingham Shum, S. and Ferguson, R. (2011). Social Learning Analytics. Available as: Technical Report KMI-11-01, Knowledge Media Institute, The Open University, UK. http://kmi.open.ac.uk/publications/techreport/kmi-11-01
Discourse Analytics
De Liddo, A., Buckingham Shum, S., Quinto, I., Bachler, M. and Cannavacciuolo, L. (2011). Discourse-Centric Learning Analytics. 1st International Conference on Learning Analytics & Knowledge (Banff, 27 Mar-1 Apr, 2011). Eprint: http://oro.open.ac.uk/25829
Ferguson, R. and Buckingham Shum, S. (2011). Learning Analytics to Identify Exploratory Dialogue Within Synchronous Text Chat. 1st International Conference on Learning Analytics & Knowledge (Banff, Canada, 27 Mar-1 Apr, 2011). Eprint: http://oro.open.ac.uk/28955
De Liddo, A., Sándor, Á. and Buckingham Shum, S. (2012, In Press). Contested Collective Intelligence: Rationale, Technologies, and a Human-Machine Annotation Study. Computer Supported Cooperative Work. DOI: 10.1007/s10606-011-9155-x. http://www.springerlink.com/content/23n1408l9g06v062
Disposition Analytics
Ferguson, R., Buckingham Shum, S. and Deakin Crick, R. (2011). EnquiryBlogger: Using Widgets to Support Awareness and Reflection in a PLE Setting. 1st Workshop on Awareness and Reflection in Personal Learning Environments, PLE Conference 2011, 11-13 July 2011, Southampton, UK. Eprint: http://oro.open.ac.uk/30598
Buckingham Shum, S. and Deakin Crick, R. (2012). Learning Dispositions and Transferable Competencies: Pedagogy, Modelling, and Learning Analytics. Accepted to 2nd International Conference on Learning Analytics & Knowledge (Vancouver, 29 Apr – 2 May, 2012). Working draft under revision: http://projects.kmi.open.ac.uk/hyperdiscourse/docs/SBS-RDC-review.pdf
SUMMARY
Discipline knowledge / C21 Learning Capacities
Educator owns and manages a single dataset
Educator owns and manages multiple datasets
Learners add their own datasets
Hybrid closed + open datasets
Hybrid closed + open analytics
More LA effort needed
We need analytics tuned to generic capacities which equip learners for novel challenges
Focus of most LA effort
Mastery of core knowledge and skills in training is vital, but no longer sufficient