Developing standards and systems for MOOC data...
Transcript of Developing standards and systems for MOOC data...
![Page 1: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/1.jpg)
Developing standards and systems for MOOC data
science
Kalyan Veeramachaneni, Franck DernoncourtColin Taylor, Zach Pardos, Una-May O’Reilly
Any Scale learning for All CSAIL, MIT
Tuesday, July 9, 13
![Page 2: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/2.jpg)
Overview
• Background and motivation
• What did the data looked like?
• Where are the bottlenecks?
• Data science @ scale
• Organize- MOOCdb
• Create multiple views- MOOC En Images
• Provide APIs- MOOCdb Access
• Why standardize?
Tuesday, July 9, 13
![Page 3: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/3.jpg)
What did the data look like?
MOOC
Tuesday, July 9, 13
![Page 4: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/4.jpg)
What did the data look like?
MOOC
Server logsJSON lines
Browser logsJSON lines
Tuesday, July 9, 13
![Page 5: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/5.jpg)
What did the data look like?
MOOC
Server logsJSON lines
Browser logsJSON lines
SQL DumpProduction server
Forums data
Tuesday, July 9, 13
![Page 6: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/6.jpg)
What did the data look like?
MOOC
Server logsJSON lines
Browser logsJSON lines
Course information xml files
Course website
SQL DumpProduction server
Forums data
Tuesday, July 9, 13
![Page 7: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/7.jpg)
What did the data look like?
MOOC
Server logsJSON lines
Browser logsJSON lines
Course information xml files
Course website
Emails Surveys
SQL DumpProduction server
Forums data
Tuesday, July 9, 13
![Page 8: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/8.jpg)
Example Research Question RQ1: How does amount of time spent on the video correlate to performance on the homework?
What is the time period between two consecutive homeworks?
Identify what are the video urls that correspond to these homeworks?
Tuesday, July 9, 13
![Page 9: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/9.jpg)
Example Research Question RQ1: How does amount of time spent on the video correlate to performance on the homework?
Identify what are the video urls that correspond to these homeworks?
What is the time period between two consecutive homeworks?
Calculate the time spent on these urls by the user?
Tuesday, July 9, 13
![Page 10: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/10.jpg)
Example Research Question RQ1: How does amount of time spent on the video correlate to performance on the homework?
Identify what are the video urls that correspond to these homeworks?
What is the time period between two consecutive homeworks?
Calculate the time spent on these urls by the user?
What answers did the user submit during this period?
What are the correct answers for those problems?
Tuesday, July 9, 13
![Page 11: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/11.jpg)
Example Research Question RQ1: How does amount of time spent on the video correlate to performance on the homework?
Identify what are the video urls that correspond to these homeworks?
What is the time period between two consecutive homeworks?
Calculate the time spent on these urls by the user?
What answers did the user submit during this period?
What are the correct answers for those problems?
How many did the student get right?
Tuesday, July 9, 13
![Page 12: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/12.jpg)
Example Research Question RQ1: How does amount of time spent on the video correlate to performance on the homework?
Identify what are the video urls that correspond to these homeworks?
What is the time period between two consecutive homeworks?
Calculate the time spent on these urls by the user?
What answers did the user submit during this period?
What are the correct answers for those problems?
How many did the student get right?
Run corr function in MATLAB
Tuesday, July 9, 13
![Page 13: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/13.jpg)
Where are the bottlenecks?
Preprocess Assemble covariates
Build models Analyze
A typical process
Tuesday, July 9, 13
![Page 14: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/14.jpg)
Where are the bottlenecks?
Preprocess Assemble covariates
Build models Analyze
Preprocess Assemble covariates
Build models Analyze
sizes represent the time spent in doing the activity
What we desire ?
What is now ?
Tuesday, July 9, 13
![Page 15: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/15.jpg)
How about we organize the data using a concise schema that we will always adhere to ?
Then all the scripts we write can be reused
Tuesday, July 9, 13
![Page 16: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/16.jpg)
How about we organize the data using a concise schema that we will always adhere to ?
Then all the scripts we write can be reused
Challenge: Generalizable, loss-less
Tuesday, July 9, 13
![Page 17: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/17.jpg)
Steps to Data science @ scale
Step 1: Develop a generalizable, loss-less schema for the data
Step 2: Define and design APIs to extract multiple views of data
Step 4: Design a platform where users can contribute and share the scripts
Step 3: Design user friendly APIs
Tuesday, July 9, 13
![Page 18: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/18.jpg)
Step 1: Develop a generalizable, loss-less schema
• Students interact with the system in the following ways
• Observe
• Submit
• Collaborate
• Give feedback
Tuesday, July 9, 13
![Page 19: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/19.jpg)
Step 1: Develop a generalizable, loss-less schema
The observing mode
Tuesday, July 9, 13
![Page 20: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/20.jpg)
Step 1: Develop a generalizable, loss-less schema
Is this generalizable?
The observing mode
Tuesday, July 9, 13
![Page 21: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/21.jpg)
Step 1: Develop a generalizable, loss-less schema
Is this generalizable? yes
The observing mode
Tuesday, July 9, 13
![Page 22: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/22.jpg)
Step 1: Develop a generalizable, loss-less schema
Is this generalizable? yes
Is this loss -less?
The observing mode
Tuesday, July 9, 13
![Page 23: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/23.jpg)
Step 1: Develop a generalizable, loss-less schema
Is this generalizable? yes
Is this loss -less? yes
The observing mode
Tuesday, July 9, 13
![Page 24: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/24.jpg)
Step 1: Develop a generalizable, loss-less schema
MOOCdb
Tuesday, July 9, 13
![Page 25: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/25.jpg)
Step 2: Define and design APIs to extract multiple views of data
TimeSpace
Student cohorts
by grade by access patternsby certificate winnerscustom cohort
by week by deadlineby midterm by homework by modules custom time points
by country by IP by city
MOOC En Images
Tuesday, July 9, 13
![Page 26: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/26.jpg)
Step 2: Define and design APIs to extract multiple views of data
MOOC En Images
Tuesday, July 9, 13
![Page 27: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/27.jpg)
Step 2: Define and design APIs to extract multiple views of data
MOOC En Images 6/30/13 francky.me/mit/moocdb/all/submissions_per_day.html
francky.me/mit/moocdb/all/submissions_per_day.html 1/1
Number of submissions per day:
Number of submissions per day
Number ofsubmissions
1 10 19 28 37 46 55 64 73 82 91 100 109 118 127 136 145 1540
200,000
400,000
600,000
800,000
Day
Tuesday, July 9, 13
![Page 28: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/28.jpg)
Step 3: Design user friendly APIs
MOOCdb - Access
MATLAB Python
Tuesday, July 9, 13
![Page 29: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/29.jpg)
Step 4: Design a platform where community can share scripts
MOOC En Images
Multiple views of data
IRT
BKT
Mobility
Who talked to who?
User friendly APIs
MATLABPythonR
Query Optimizers
Scri
pts A
rchi
ve
MOOCdb
Scientific Community
Analysts
Database experts
Privacy experts
Tuesday, July 9, 13
![Page 30: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/30.jpg)
Leading to standardization
Shared Standardized
schema
MOOC Curators Database
Insert logs into database
Scientific Community
Feedback to MOOC Organizers, sharing of results, predictions descriptive analytics.
Analysts Database experts
Public access Platform
Crowd
Scripts archive
Privacy experts
Database Standardized
schema
Transform Database into
standard format
Sql scripts
Sql scripts ="+"
MOOCdb
Tuesday, July 9, 13
![Page 31: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/31.jpg)
Benefits of standardization
• Reduction in time to analyze (TTA)
• Publicly available scripts to create data views
• Unified representation to DB/Privacy experts
• Crowd sourcing of feature engineering
• Crowd sourcing of data analytics
• Elimination of entry barriers
Tuesday, July 9, 13
![Page 32: Developing standards and systems for MOOC data sciencepeople.csail.mit.edu/zp/moocshop2013/presentation_14.pdf · Developing standards and systems for MOOC data science Kalyan Veeramachaneni,](https://reader033.fdocuments.net/reader033/viewer/2022051908/5ffbc6e8ad39031baf64c195/html5/thumbnails/32.jpg)
Thank you!
Look for MOOC En Images, updates on MOOCdb and open release of all the tools.
Tuesday, July 9, 13