Big Data Meets Learning Science: Keynote by Al Essa
Transcript of Big Data Meets Learning Science: Keynote by Al Essa
Big Data Meets Learning Science
Apache Spark Summit East 2017
Alfred Essa
VP, Research and Data Science
McGraw-Hill Education
@malpaso
Our Journey from Print to Digital
1 Innovation Pipeline
2 McGraw-Hill Learning Science
3 Spark, Databricks
Speed of innovation, not data, is the differentiator.
Technology Time to Market
Spark Factor
People Process
Apache Spark
Databricks
Innovation Pipeline
Research
Product Validation
Product Development
Databricks underpins our innovation pipeline and workflow.
2 McGraw-Hill Learning Science
From Print to Digital: 128-year Journey
K-12, Higher Ed & Professional businesses
~4,800 employees
Adaptive Platform Leverages MHE Reach and Scale
How do we learn and how can we learn better?
May 2013: Introduction of SmartBook
Now: 1,500+ adaptive products available
Learners who have used MHE Adaptive: ~5,500,000
Student interactions: ~10,000,000,000
Authors trained to use MHE Adaptive: ~4,000
Research Phase
Research Step: Build Models and Algorithms
Model Iteration: Ship “Data Instrumented” product to improve and tune models
Product Focus: Iterate prototypes with business and customers
Research Question: How do we learn and how can we learn better?
Stacked Algorithm
Learning Tool for Optimizing Acquisition and Recall
Learning Science Principles
Effortful Recall
Spaced Practice
Interleaving
Cognitive Science Model
Mobile App
3 Spark, Databricks
The Problem
1 Students drop out or fail their course
2 At-risk students can be difficult for instructors to identify
Identify at-risk students pre-emptively
The Solution
A classifier to predict abandonment
Jacqueline Feild, Data Scientist
Nicholas Lewkow, Data Scientist
Solution: A Classifier to Predict Abandonment
[Figure: labeled feature vectors (F1 to F6, each with a label of 1 or 0) are fed to a classification algorithm, which outputs a label (here 0) for a new feature vector.]
• Logistic Regression used for initial classification algorithm
• Simple algorithm to interpret
• Provides probability estimates instead of hard classification label
• Allows for simple interpretation of feature importance
• One classifier works for all disciplines
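As a rough illustration of the bullets above, the following sketch trains a tiny logistic regression by gradient descent and returns a probability rather than a hard label. The six feature values, the toy rows, the learning rate, and all function names are invented for illustration; the talk does not describe the actual features or training setup, and the production model ran on Spark rather than in plain Python.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    n = len(rows)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log loss with respect to z
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, x):
    """Probability that the student abandons (label 1)."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical toy data: six engagement-style features F1..F6 scaled to [0, 1].
# Label 1 = abandoned the course, label 0 = did not.
ROWS = [
    [0.1, 0.2, 0.1, 0.0, 0.3, 0.2],
    [0.2, 0.1, 0.0, 0.1, 0.2, 0.1],
    [0.9, 0.8, 0.7, 0.9, 0.8, 1.0],
    [0.8, 0.9, 1.0, 0.7, 0.9, 0.8],
]
LABELS = [1, 1, 0, 0]

weights, bias = train_logistic_regression(ROWS, LABELS)
print("weights:", [round(w, 2) for w in weights])
print("P(abandon):", round(predict_proba(weights, bias, [0.2] * 6), 3))
```

The sign and magnitude of each learned weight give a simple reading of feature importance: in this toy data every weight comes out negative, meaning higher values of a feature lower the predicted probability of abandonment.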
Parallel Pipeline for Creating Classifier
Models and Algorithms
Ship “Data Instrumented” Product
Iterate Prototypes with Customers
Spark Transformation
Speedup with Spark
Sn: Speedup from n cores
t1: Time to run on 1 core
tn: Time to run on n cores
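The three symbols above fit the standard definition of parallel speedup, Sn = t1 / tn. The slide's own formula did not survive extraction, so treating this as the intended one is an assumption. A minimal sketch, with invented timings:

```python
def speedup(t1, tn):
    """Return Sn = t1 / tn, the speedup of an n-core run over a 1-core run."""
    if t1 <= 0 or tn <= 0:
        raise ValueError("run times must be positive")
    return t1 / tn


# Hypothetical wall-clock times in seconds, keyed by number of cores.
# These numbers are made up and are not measurements from the talk.
TIMINGS = {1: 120.0, 2: 64.0, 4: 30.0, 8: 20.0}

for cores, seconds in sorted(TIMINGS.items()):
    print(cores, "cores:", round(speedup(TIMINGS[1], seconds), 2))
```

A speedup close to n indicates near-linear scaling; values that flatten as n grows suggest overhead or a serial bottleneck.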
Evaluate Model Accuracy
• Use area under the receiver operating characteristic curve (AUC-ROC) as another measure of model accuracy
• 0.9 - 1.0 = excellent
• 0.8 - 0.9 = good
• 0.7 - 0.8 = fair
• 0.6 - 0.7 = poor
• 0.5 - 0.6 = fail
• Look at how the AUC-ROC for a model changes throughout the semester
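To make the scale above concrete, here is a small sketch that computes AUC-ROC directly from its pairwise definition (the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as half) and maps the result onto the bands listed. The function names and the sample scores are hypothetical. Treating values below 0.5 as "fail" is also an assumption, since the slide does not cover that range.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def rate_auc(auc):
    """Map an AUC value onto the bands from the slide."""
    if auc >= 0.9:
        return "excellent"
    if auc >= 0.8:
        return "good"
    if auc >= 0.7:
        return "fair"
    if auc >= 0.6:
        return "poor"
    return "fail"  # assumption: anything below 0.6, including below 0.5


# Hypothetical predictions: label 1 = abandoned, score = predicted probability
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = auc_roc(labels, scores)
print(auc, rate_auc(auc))
```

Running the same calculation on predictions made at successive points in the term would show how the AUC-ROC changes throughout the semester.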
Evaluate Intervention Window
Intervention Window:
How much time in advance can we provide for an intervention to occur prior to abandonment?
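One possible way to quantify the intervention window, assuming the model emits a risk probability per student each week and that a student counts as flagged the first time that probability reaches a threshold. The weekly granularity, the 0.5 threshold, and the function name are assumptions for illustration, not the method described in the talk.

```python
def intervention_window(weekly_probs, abandonment_week, threshold=0.5):
    """Weeks between the first at-risk flag and abandonment.

    weekly_probs[0] is the predicted probability for week 1.
    Returns None if the student is never flagged before abandoning.
    """
    for week, prob in enumerate(weekly_probs, start=1):
        if week >= abandonment_week:
            break  # a flag at or after abandonment leaves no time to act
        if prob >= threshold:
            return abandonment_week - week
    return None


# Hypothetical student: risk rises over time, abandonment occurs in week 6
print(intervention_window([0.1, 0.3, 0.6, 0.8, 0.9], abandonment_week=6))
```

Lowering the threshold would tend to lengthen the window at the cost of more false alarms, which is the trade-off an evaluation of this kind has to weigh.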
Conclusions
Technology is important, but build an agile innovation workflow with Databricks.