Carnegie Mellon Project LISTEN, 7/22/2004: Some Useful Design Tactics for Mining ITS Data (Jack Mostow)
1 7/22/2004
Carnegie Mellon
Project LISTEN
Some Useful Design Tactics for Mining ITS Data
Jack Mostow
Project LISTEN (www.cs.cmu.edu/~listen)
Carnegie Mellon University
Funding: National Science Foundation
ITS 04 Workshop on Analyzing Student-Tutor Interaction Logs to Improve Educational Outcomes, Maceió, Brazil
Outline
1. Project LISTEN’s Reading Tutor
2. Modify tutor to get mineable data
3. Map data stream to analyzable data set
4. Mine data set to discover insights
Project LISTEN’s Reading Tutor (video)
John Rubin (2002). The Sounds of Speech (Show 3). On Reading Rockets (Public Television series commissioned by U.S. Department of Education). Washington, DC: WETA.
Available at www.cs.cmu.edu/~listen.
Thanks to fellow LISTENers
  Tutoring:
    Dr. Joseph Beck, mining tutorial data
    Prof. Albert Corbett, cognitive tutors
    Prof. Rollanda O’Connor, reading
    Prof. Kathy Ayres, stories for children
    Joe Valeri, activities and interventions
    Becky Kennedy, linguist
  Listening:
    Dr. Mosur Ravishankar, recognizer
    Dr. Evandro Gouvea, acoustic training
    John Helman, transcriber
  Programmers:
    Andrew Cuneo, application
    Karen Wong, Teacher Tool
  Field staff:
    Dr. Roy Taylor
    Kristin Bagwell
    Julie Sleasman
  Grad students:
    Hao Cen, HCI
    Cecily Heiner, MCALL
    Peter Kant, Education
    Shanna Tellerman, ETC
  Plus:
    Advisory board
    Research partners: DePaul, UBC, U. Toronto
    Schools
Project LISTEN’s Reading Tutor: A rich source of experimental data
  2003-2004 database:
    9 schools
    > 200 computers
    > 50,000 sessions
    > 1.5M tutor responses
    > 10M words recognized
  Embedded experiments
  Randomized trials
Modify tutor to get mineable data
  Log operations at the grain size and level of interest
    Click <x, y> at time t: motor control
    Click “Goldilocks”: item selection
  Reify operations to log them analyzably
    Handwriting or speech → typed input
    Freehand drawing → graphical palette (Geometry Tutor)
    Free-form responses → menu selection (Self 88)
    Natural language → sentence starters (Goodman 03)
  Time student and tutor actions
    Time allocation reflects motivation (ITS 02)
    Hasty responses indicate guessing (TICL 04)
    Latency reflects automaticity (TICL 04)
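A minimal sketch of the logging and timing tactics above, not the Reading Tutor's actual logging code. The Event fields, the event kinds, and the 1-second HASTY threshold are all hypothetical names and values chosen for illustration. It records the same click at two grain sizes and derives response latency from timestamps.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    time: float          # seconds since session start (assumed unit)
    actor: str           # "student" or "tutor"
    kind: str            # hypothetical event kind, e.g. "click", "select_item"
    detail: dict = field(default_factory=dict)


def response_latencies(events):
    """Delay from each tutor action to the first student action that follows it."""
    latencies = []
    pending_tutor_time = None
    for ev in sorted(events, key=lambda e: e.time):
        if ev.actor == "tutor":
            pending_tutor_time = ev.time
        elif pending_tutor_time is not None:
            latencies.append(ev.time - pending_tutor_time)
            pending_tutor_time = None  # later student events are not responses
    return latencies


log = [
    Event(0.0, "tutor", "prompt", {"text": "Pick a story"}),
    Event(2.5, "student", "click", {"x": 120, "y": 48}),           # motor-control grain
    Event(2.5, "student", "select_item", {"item": "Goldilocks"}),  # item-selection grain
    Event(3.0, "tutor", "prompt", {"text": "Read the sentence"}),
    Event(3.4, "student", "speech", {"word": "people"}),
]

latencies = response_latencies(log)

HASTY = 1.0  # hypothetical cutoff in seconds for flagging possible guessing
hasty = [d for d in latencies if d < HASTY]
```

Logging the raw click and the reified item selection as separate events keeps both levels available, so the choice of grain size can be deferred to analysis time.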
Modify tutor: add relevant data
  Randomize tutorial decisions
    What skill to test, what help to give
  Probe skills
    Assess cognitive development (Arroyo 00)
    Test vocabulary words (IJAIE 01)
    Insert automated comprehension questions (TICL 04)
  Import student data
    Gender, age, IQ (Shute 96)
    Prior knowledge (Corbett 00)
    Pretest scores (TICL 04)
  Hand-label when appropriate
    Transcribe (some) spoken input (FLET 04)
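One way to randomize a tutorial decision and record it for later analysis is sketched below. The help-type names come from elsewhere in these slides; the function name, log format, student ID, and uniform choice are assumptions, not the tutor's actual policy.

```python
import random

# Help types named in the slides; the real tutor offers more.
HELP_TYPES = ["SayWord", "RhymesWith", "OnsetRime", "Recue"]


def choose_help(rng, student_id, word, log):
    """Pick a help type uniformly at random and log the decision."""
    help_type = rng.choice(HELP_TYPES)
    log.append({"student": student_id, "word": word, "help": help_type})
    return help_type


rng = random.Random(0)   # seeded only to make this sketch reproducible
decision_log = []
for word in ["sink", "gnaw", "dirt"]:
    choose_help(rng, "s01", word, decision_log)
```

Because the choice does not depend on the student or the word, differences in later outcomes between help types can be attributed to the help itself rather than to who received it.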
Modify tutor: an example
  Randomize: explain some new words but not others.
  Probe: test each new word the next day.
  Did kids do better on explained vs. unexplained words?
    Overall: NO; 38% vs. 36%, N = 3,171 trials (IJAIE 01).
    Rare, 1-sense words tested 1-2 days later: YES! 44% >> 26%, N = 189.
Map data stream to data set: structure data into a single type
  Data stream: heterogeneous events over time
  Data set: elements with the same features
  Segment into shorter episodes
    Tutorial action(s) + student response (Beck 00)
  Slice into narrower strands
    Successive encounters of a specific word (AMLDP 98)
    Successive instances of a specific skill (learning curves)
  Measure aggregated events
    Allocation of time among activities (ITS 02)
  Formulate data as experimental trials
    Context where the trial occurred
    Decision made in this trial
    Outcome based on subsequent events
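The segment and slice tactics above can be sketched as follows. The tuple layout (time, actor, action, word) and the sample events are hypothetical; the episode rule used here (a new episode starts when the tutor acts after a student response) is one plausible reading of "tutorial action(s) + student response".

```python
from collections import defaultdict

# Hypothetical event stream: (time, actor, action, word)
stream = [
    (1.0, "tutor", "show_sentence", None),
    (2.0, "student", "read_word", "people"),
    (3.0, "student", "click_word", "sit"),
    (3.5, "tutor", "give_help", "sit"),
    (5.0, "student", "read_word", "sit"),
    (9.0, "tutor", "show_sentence", None),
    (10.0, "student", "read_word", "people"),
]


def segment(events):
    """Segment: split the stream into episodes of tutor action(s) + student response(s)."""
    episodes, current = [], []
    for ev in events:
        if ev[1] == "tutor" and current and current[-1][1] == "student":
            episodes.append(current)
            current = []
        current.append(ev)
    if current:
        episodes.append(current)
    return episodes


def slice_by_word(events):
    """Slice: collect the successive events that involve each specific word."""
    strands = defaultdict(list)
    for ev in events:
        if ev[3] is not None:
            strands[ev[3]].append(ev)
    return dict(strands)


episodes = segment(stream)
strands = slice_by_word(stream)
```

Both functions turn a heterogeneous stream into collections of same-shaped elements, which is what makes counting or model fitting possible afterwards.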
Map data stream to data set: Formulate data as experimental trials
  [Figure: timeline of one trial in the data stream]
  Context: Student is reading a story (‘People sit down and … read a book.’). Student needs help on a word: student clicks ‘read.’
  Decision (randomized): Tutor chooses what help to give.
  Student continues reading. Time passes…
  Outcome: Student sees word in a later sentence (‘I love to read stories.’). Read fluently?
Map data stream to data set: trials
  Context:                        Decision:      Outcome:
  Student_ID         Target_Word  Help_Type      Fluent  …
  mwb6-5-1996-05-02  sink         RhymesWith     no
  fJH8-4-1994-11-01  gnaw         StartsLike     yes
  mDA5-5-1996-04-24  dirt         Autophonics    yes
  mST6-6-1994-01-25  people       WordInContext  yes
  mGH6-6-1990-10-01  breakfast    SayWord        no
  mJK4-5-1995-12-16  YOU          Autophonics    no
  fGA4-3-1995-10-25  home         RhymesWith     yes
  mBD7-9-1994-12-29  finally      Recue          yes
  mCD4-8-1996-03-06  Three        OnsetRime      yes
  fso5-8-1994-06-29  Stars        OnsetRime      yes
  (191,487 more trials)
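A rough sketch of how rows like these might be derived from an event stream. The event dictionaries, the Trial class, and the rule "outcome = the student's next encounter of the same word" are assumptions for illustration; the slides do not give the actual extraction procedure or how fluency was judged.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    student_id: str   # context
    target_word: str  # context
    help_type: str    # decision (randomized by the tutor)
    fluent: bool      # outcome, taken from a later encounter


def build_trials(events):
    """events: time-ordered dicts with keys student, kind ('help' or 'read'), word."""
    trials = []
    pending = {}  # (student, word) -> help type still awaiting an outcome
    for ev in events:
        key = (ev["student"], ev["word"].lower())
        if ev["kind"] == "help":
            pending[key] = ev["help_type"]
        elif ev["kind"] == "read" and key in pending:
            trials.append(Trial(ev["student"], ev["word"], pending.pop(key), ev["fluent"]))
    return trials


# Hypothetical events; student IDs are placeholders.
events = [
    {"student": "s1", "kind": "help", "word": "read", "help_type": "RhymesWith"},
    {"student": "s1", "kind": "read", "word": "book", "fluent": True},
    {"student": "s2", "kind": "help", "word": "sink", "help_type": "SayWord"},
    {"student": "s1", "kind": "read", "word": "read", "fluent": True},
]

trials = build_trials(events)
```

Help events with no later encounter of the word (student s2 here) produce no trial, since their outcome is unobserved.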
Mine data set to make discoveries
  Count outcome frequency
    Success rate of each help type (ICALL 04)
  Fit a parametric model
    Knowledge tracing (Corbett 95)
  Train a model
    Statistics, e.g. regression (TICL 04)
    Machine learning, e.g. decision trees (AIED 01)
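As an illustration of the "fit a parametric model" option, here is the standard knowledge tracing update in its usual Bayesian form. The parameter values are invented for the example and are not fitted to Reading Tutor data; fitting them is the part this sketch leaves out.

```python
def kt_update(p_known, correct, guess, slip, learn):
    """One knowledge tracing step: Bayesian update on the observed response,
    followed by the chance of learning the skill at this opportunity."""
    if correct:
        numerator = p_known * (1 - slip)
        denominator = numerator + (1 - p_known) * guess
    else:
        numerator = p_known * slip
        denominator = numerator + (1 - p_known) * (1 - guess)
    posterior = numerator / denominator
    return posterior + (1 - posterior) * learn


# Illustrative parameters only (assumed, not estimated from data).
GUESS, SLIP, LEARN = 0.2, 0.1, 0.15

p_after_correct = kt_update(0.3, True, GUESS, SLIP, LEARN)
p_after_wrong = kt_update(0.3, False, GUESS, SLIP, LEARN)

# Trace the estimate across a short sequence of responses on one skill.
p = 0.3
trace = []
for response in [True, False, True, True]:
    p = kt_update(p, response, GUESS, SLIP, LEARN)
    trace.append(p)
```

A correct response raises the estimate that the skill is known and an incorrect one lowers it, with guess and slip controlling how much each observation is trusted.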
Count outcome frequency: which help types worked best?
                   Same day:                     Later day:
  Grade 1 words:   Say In Context, Onset Rime    Onset Rime
  Grade 2 words:   Say In Context, Rhymes With   Rhymes With
  Grade 3 words:   Say In Context                Rhymes With, One Grapheme
  Best: Rhymes With 69.2% ± 0.4%
  Worst: Recue 55.6% ± 0.4%
  Compare within level to control for word difficulty.
  Supplying the word helped best in the short term… But rhyming hints had longer lasting benefits.
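Counting outcome frequency within each word level can be sketched as below. The toy trials are made up to exercise the code and are not results from the study; the tuple layout (level, help type, fluent) is likewise an assumption.

```python
from collections import defaultdict

# Made-up trials: (word grade level, help type, read fluently later?)
toy_trials = [
    (1, "SayInContext", True),
    (1, "SayInContext", True),
    (1, "Recue", False),
    (1, "Recue", True),
    (2, "RhymesWith", True),
    (2, "SayInContext", False),
    (2, "SayInContext", True),
    (2, "RhymesWith", True),
]


def success_rates(trials):
    """Success rate for each (level, help type) cell."""
    counts = defaultdict(lambda: [0, 0])  # cell -> [successes, trials]
    for level, help_type, fluent in trials:
        cell = counts[(level, help_type)]
        cell[0] += int(fluent)
        cell[1] += 1
    return {key: wins / n for key, (wins, n) in counts.items()}


def best_per_level(rates):
    """Compare help types only against others at the same word level."""
    best = {}
    for (level, help_type), rate in rates.items():
        if level not in best or rate > best[level][1]:
            best[level] = (help_type, rate)
    return best


rates = success_rates(toy_trials)
best = best_per_level(rates)
```

Grouping by level before ranking is what keeps a help type from looking good merely because it happened to be given on easier words.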
Summary: modify, map, mine.
1. Modify tutor to make data mineable. Log, reify, time, hand-label, import, probe, randomize.
2. Map data streams to data sets. Segment, slice, measure.
3. Mine data set to make discoveries. Count, fit, train.
See videos, papers, etc. at www.cs.cmu.edu/~listen.
Thank you! Questions?
Structure of Reading Tutor database
                       Student:          Reading Tutor:
  Session              Login             List readers
  Story Encounter      Pick stories      List stories
  Sentence Encounter   Read sentence     Show one sentence at a time
  Word Encounter       Read each word    Listens and helps
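The nesting of sessions, stories, sentences, and words could be modeled along these lines. Only the four level names come from the slide; every field name here is hypothetical and the real database schema is not shown in this deck.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WordEncounter:
    word: str
    accepted: bool  # hypothetical field: did the tutor accept the reading?


@dataclass
class SentenceEncounter:
    text: str
    words: List[WordEncounter] = field(default_factory=list)


@dataclass
class StoryEncounter:
    title: str
    sentences: List[SentenceEncounter] = field(default_factory=list)


@dataclass
class Session:
    student_id: str
    stories: List[StoryEncounter] = field(default_factory=list)


def word_rows(session):
    """Flatten the hierarchy into one row per word encounter."""
    return [
        (session.student_id, story.title, w.word, w.accepted)
        for story in session.stories
        for sentence in story.sentences
        for w in sentence.words
    ]


demo = Session("s1", [StoryEncounter("Goldilocks", [
    SentenceEncounter("People sit", [WordEncounter("People", True),
                                     WordEncounter("sit", False)]),
])])
rows = word_rows(demo)
```

Flattening to word-level rows while carrying the session and story along is one way to keep the context each analysis needs.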
Map data stream to data set: formulate data as experimental trials
  Context             Decision             Outcome                   Source
  Student is stuck    Prompt or cough?     Next event in dialog      FF 2000
  Before a new word   Explain it or not?   Test word next day        IJAIE 01
  Click on word       What help to give?   Word read OK next time?   SSSR 04
  Context: where the trial occurred
  Decision: made in this trial
  Outcome: based on subsequent events
Learning curves for students’ help requests
  Try to predict subset
    Grade 1-2 level
    1-6 prior encounters
  Selected data
    53 students
    175,961 words
    29,278 help requests
  Train predictive model
    Count help requests 5x
    Predict other kids’ data
    71% accuracy
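A learning curve for help requests can be tabulated as the help-request rate at each number of prior encounters of a word. The sketch below shows only that tabulation, on invented data; it is not the predictive model from the slide, whose details are not given here.

```python
from collections import defaultdict


def help_rate_by_prior_encounters(encounters):
    """encounters: time-ordered (student, word, asked_for_help) tuples.
    Returns {number of prior encounters: fraction of encounters with a help request}."""
    seen = defaultdict(int)               # (student, word) -> encounters so far
    tally = defaultdict(lambda: [0, 0])   # prior count -> [help requests, encounters]
    for student, word, asked in encounters:
        prior = seen[(student, word)]
        tally[prior][0] += int(asked)
        tally[prior][1] += 1
        seen[(student, word)] += 1
    return {prior: helps / n for prior, (helps, n) in sorted(tally.items())}


# Invented encounters for two students and one word.
toy_encounters = [
    ("a", "sit", True),
    ("a", "sit", True),
    ("b", "sit", True),
    ("a", "sit", False),
    ("b", "sit", False),
]

curve = help_rate_by_prior_encounters(toy_encounters)
```

A falling rate across successive encounters is the pattern a learning curve is meant to expose.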
Count outcome frequency (average success rate 66.1%)
  Whole word:
    24,841 Say In Context
    56,791 Say Word
  Decomposition:
    6,280 Syllabify
    14,223 Onset Rime
    19,677 Sound Out
    22,933 One Grapheme
  Analogy:
    13,165 Rhymes With
    13,671 Starts Like
  Semantic:
    14,685 Recue
    2,285 Show Picture
    488 Sound Effect
  Which types stood out?
    Best: Rhymes With 69.2% ± 0.4%
    Worst: Recue 55.6% ± 0.4%
Example: ‘People sit down and read a book.’
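The ± 0.4% figures are consistent with one binomial standard error computed from the trial counts listed for Rhymes With and Recue. The slides do not say how the margins were computed, so that interpretation is an assumption; the check below is a worked calculation only.

```python
import math


def binomial_se(rate, n):
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(rate * (1 - rate) / n)


# Rates and counts as given on the slide.
se_rhymes = binomial_se(0.692, 13165)   # Rhymes With
se_recue = binomial_se(0.556, 14685)    # Recue

# Size of the best-vs-worst gap in units of its own standard error,
# treating the two samples as independent (an assumption).
z = (0.692 - 0.556) / math.sqrt(se_rhymes ** 2 + se_recue ** 2)

print(f"Rhymes With SE: {se_rhymes:.4f}")
print(f"Recue SE:       {se_recue:.4f}")
print(f"z for the gap:  {z:.1f}")
```

Under that reading, the gap between the best and worst help types is many standard errors wide, which is why those two stand out from the 66.1% average.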