Latent Factor Models
![Page 1: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/1.jpg)
Latent Factor Models
Geoff Gordon
Joint work w/ Ajit Singh, Byron Boots, Sajid Siddiqi, Nick Roy
![Page 2: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/2.jpg)
Motivation
A key component of a cognitive tutor: student cognitive model
Tracks what skills student currently knows—latent factors
circle-area
rectangle-area
decompose-area
right-answer
![Page 3: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/3.jpg)
Motivation
Student models are a key bottleneck in cognitive tutor authoring and performance
rough estimate: 20-80 hrs to hand-code model for 1 hr of content
result may be too simple, not rigorously verified
But, demonstrated improvements in learning from better models
E.g., Cen et al. [2007]: 12% less time to learn 6 geometry units (same retention) using a tutor w/ a more accurate model
This talk: automatic discovery of new models and data-driven revision of existing models via (latent) factor analysis
![Page 4: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/4.jpg)
Score of student i on item j
Simple case: snapshot, no side information

Students (rows) × items (columns):
      1  2  3  4  5  6  …
  A   1  1  0  0  1  0  …
  B   0  1  1  0  0  0  …
  C   1  1  0  1  1  0  …
  D   1  0  0  1  1  0  …
  …
![Page 5: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/5.jpg)
Missing data
Students (rows) × items (columns); ? marks an unobserved response:
      1  2  3  4  5  6  …
  A   1  ?  ?  ?  1  0  …
  B   0  ?  1  0  ?  ?  …
  C   1  1  ?  ?  ?  0  …
  D   1  0  0  1  ?  ?  …
  …
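A minimal sketch (numpy, with made-up toy values, not the slide's data) of how such a score matrix with missing responses can be represented: NaN marks an unanswered item, and a boolean mask picks out the observed entries that would enter any model fit.

```python
import numpy as np

# Toy student-by-item score matrix; np.nan marks unobserved responses.
X = np.array([
    [1,      np.nan, np.nan, np.nan, 1,      0],
    [0,      np.nan, 1,      0,      np.nan, np.nan],
    [1,      1,      np.nan, np.nan, np.nan, 0],
    [1,      0,      0,      1,      np.nan, np.nan],
], dtype=float)

observed = ~np.isnan(X)            # mask of graded (student, item) pairs
print("fraction observed:", observed.mean())
# Any likelihood below would be summed only over X[observed].
```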
![Page 6: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/6.jpg)
Data matrix X
Rows x1, x2, …, xn (one per student); columns indexed by items.
![Page 7: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/7.jpg)
Simple case: model
Graphical model with three nodes: U, V, X.
U: student latent factors (n students × k latent factors)
V: item latent factors (m items × k latent factors)
X: observed performance (n students × m items)
X is observed; U and V are unobserved.
![Page 8: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/8.jpg)
Linear-Gaussian version
Same graphical model (U, V, X); each entry of X depends on a student factor and an item factor.
U: Gaussian (0 mean, fixed variance), n students × k latent factors
V: Gaussian (0 mean, fixed variance), m items × k latent factors
X: Gaussian (fixed variance, mean = student factor ⋅ item factor)
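A small generative sketch of this linear-Gaussian model (numpy; the dimensions and variances are illustrative assumptions, not values from the talk): draw U and V from zero-mean Gaussians, then draw each Xij around the inner product Ui ⋅ Vj.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 50, 30, 3          # students, items, latent factors (illustrative)
sigma_u = sigma_v = 1.0      # fixed prior variances
sigma_x = 0.1                # fixed observation variance

U = rng.normal(0.0, sigma_u, size=(n, k))           # student latent factors
V = rng.normal(0.0, sigma_v, size=(m, k))           # item latent factors
mean = U @ V.T                                      # E[X_ij] = U_i . V_j
X = mean + rng.normal(0.0, sigma_x, size=(n, m))    # observed performance
```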
![Page 9: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/9.jpg)
Matrix form: Principal Components Analysis
Data matrix X (rows x1, …, xn) ≈ compressed matrix U (rows u1, …, un) × basis matrix Vᵀ (rows v1, …, vk).
![Page 10: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/10.jpg)
PCA: the picture
![Page 11: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/11.jpg)
PCA: matrix form
Data matrix X (rows x1, …, xn) ≈ compressed matrix U (rows u1, …, un) × basis matrix Vᵀ (rows v1, …, vk).
Columns of V span the low-rank space.
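One common way to compute this factorization on a complete data matrix is a truncated SVD; a sketch (numpy, with a random stand-in matrix, and without the mean-centering a full PCA would usually include):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 30))      # stand-in data matrix (students x items)
k = 3                              # number of latent factors

# Thin SVD, then keep the top-k singular directions.
U_full, s, Vt = np.linalg.svd(X, full_matrices=False)
U = U_full[:, :k] * s[:k]          # compressed matrix (basis weights)
V = Vt[:k, :].T                    # basis matrix; its columns span the low-rank space
X_hat = U @ V.T                    # rank-k approximation of X
print("reconstruction error:", np.linalg.norm(X - X_hat))
```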
![Page 12: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/12.jpg)
Interpretation of factors
Compressed matrix U (rows u1, …, un, one per student) holds basis weights; basis matrix Vᵀ (rows v1, …, vk) holds basis vectors over items.
Basis vectors are candidate “skills” or “knowledge components”.
Weights are students’ knowledge levels.
![Page 13: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/13.jpg)
PCA is a widely successful model
FACE IMAGES FROM Groundhog Day, EXTRACTED BY CAMBRIDGE FACE DB PROJECT
![Page 14: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/14.jpg)
Data matrix: face images
Rows x1, x2, …, xn (one per image); columns indexed by pixels.
![Page 15: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/15.jpg)
Result of factoring
Compressed matrix U (rows u1, …, un, one per image) holds basis weights; basis matrix Vᵀ (rows v1, …, vk) holds basis vectors over pixels.
Basis vectors are often called “eigenfaces”.
![Page 16: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/16.jpg)
Eigenfaces
IMAGE CREDIT: AT&T LABS CAMBRIDGE
![Page 17: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/17.jpg)
PCA: the good
Unsupervised: need no human labels of latent state!
No worry about “expert blind spot”
Of course, labels helpful if available
Post-hoc human interpretation of latents is nice too—e.g., intervention design
![Page 18: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/18.jpg)
PCA: the bad
Linear, Gaussian
PCA assumes E(X) is linear in UV
PCA assumes (X–E(X)) is i.i.d. Gaussian
![Page 19: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/19.jpg)
Nonlinearity: conjunctive skills
(surface plot: P(correct) as a function of skill 1 and skill 2)
![Page 20: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/20.jpg)
Nonlinearity: disjunctive skills
(surface plot: P(correct) as a function of skill 1 and skill 2)
![Page 21: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/21.jpg)
Nonlinearity: “other”
(surface plot: P(correct) as a function of skill 1 and skill 2)
![Page 22: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/22.jpg)
Non-Gaussianity
Typical hand-developed skill-by-item matrix
Skills (rows) × items (columns):
            1  2  3  4  5  6  …
  skill 1   1  1  0  0  1  1  …
  skill 2   0  0  1  1  0  1  …
![Page 23: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/23.jpg)
Result of Gaussian assumption
(plot: rows of the true and recovered V matrices)
![Page 24: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/24.jpg)
Result of Gaussian assumption
(plot: rows of the true and recovered V matrices)
![Page 25: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/25.jpg)
The ugly: MLE only
PCA yields maximum-likelihood estimate
Good, right?
sadly, the usual reasons to want the MLE don’t apply here
e.g., consistency: variance and bias of estimates of U and V do not approach 0 (unless #items/student and #students/item → ∞)
Result: MLE is typically far too confident of itself
![Page 26: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/26.jpg)
Too certain: example
(plots: learned coefficients, e.g. a row of U, and the resulting predictions)
![Page 27: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/27.jpg)
Result: “fold-in problem”
Nonsensical results when trying to apply learned model to a new student or item
Similar to overfitting problem in supervised learning: confident-but-wrong parameters do not generalize to new examples
Unlike overfitting, fold-in problem doesn’t necessarily go away with more data
![Page 28: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/28.jpg)
Summary: 3 problems w/ PCA
Can’t handle nonlinearity
Can’t handle non-Gaussian distributions
Uses MLE only (⇒ fold-in problem)
Let’s look at each problem in turn
![Page 29: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/29.jpg)
Nonlinearity
In PCA, had Xij ≈ Ui ⋅ Vj
What if
Xij ≈ exp(Ui ⋅ Vj)
Xij ≈ sigmoid(Ui ⋅ Vj)
…
![Page 30: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/30.jpg)
Non-Gaussianity
In PCA, had Xij ∼ Normal(μ), μ = Ui ⋅ Vj
What if
Xij ∼ Poisson(μ)
Xij ∼ Binomial(p)
…
![Page 31: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/31.jpg)
Exponential family review
Exponential family of distributions:
P(X | θ) = P0(X) exp(X⋅θ – G(θ))
G(θ) is always strictly convex, differentiable on interior of domain
• means G’ is strictly monotone (strictly generalized monotone in 2D or higher)
![Page 32: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/32.jpg)
Exponential family review
Exponential family PDF:
P(X | θ) = P0(X) exp(X⋅θ – G(θ))
• Surprising result: G’(θ) = g(θ) = E(X | θ)
• g & g–1 = “link function”
• θ = “natural parameter”
• E(X | θ) = “expectation parameter”
![Page 33: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/33.jpg)
Examples
Normal(mean)
g = identity
Poisson(log rate)
g = exp
Binomial(log odds)
g = sigmoid
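A quick numerical illustration (a numpy sketch, not from the slides) of the link functions just listed: the mean map g takes the natural parameter θ to E(X | θ), and for the Bernoulli/binomial case g is the derivative of the log-partition function G(θ) = log(1 + exp(θ)).

```python
import numpy as np

theta = np.linspace(-3, 3, 7)

g_normal   = theta                         # Normal(mean): g = identity
g_poisson  = np.exp(theta)                 # Poisson(log rate): g = exp
g_binomial = 1.0 / (1.0 + np.exp(-theta))  # Binomial(log odds): g = sigmoid

# Check g = G' numerically for the Bernoulli case, G(theta) = log(1 + e^theta).
G = lambda t: np.log1p(np.exp(t))
eps = 1e-6
G_prime = (G(theta + eps) - G(theta - eps)) / (2 * eps)
print(np.max(np.abs(G_prime - g_binomial)))   # close to 0: G'(theta) = E(X | theta)
```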
![Page 34: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/34.jpg)
Nonlinear & non-Gaussian
Let P(X | θ) be an exponential family with natural parameter θ
Predict Xij ∼ P(X | θij), where θij = Ui ⋅ Vj
e.g., in Poisson, E(Xij) = exp(θij)
e.g., in Binomial, E(Xij) = sigmoid(θij)
![Page 35: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/35.jpg)
Optimization problem
max over U, V:  ∑ log P(Xij | θij) + log P(U) + log P(V)
s.t. θij = Ui ⋅ Vj
• “Generalized linear” or “exponential family” PCA
• all P(…) terms are exponential families
• analogy to GLMs
[Collins et al., 2001] [Gordon, 2002] [Roy & Gordon, 2005]
![Page 36: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/36.jpg)
Special cases
PCA, probabilistic PCA
Poisson PCA
k-means clustering
Max-margin matrix factorization (MMMF)
Almost: pLSI, pHITS, NMF
![Page 37: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/37.jpg)
Comparison to AFM
p = probability correct
θ = student overall performance
β = skill difficulty
Q = item x skill matrix
γ = skill practice slope
T = number of practice opportunities
AFM: logit(pij) = θi + Σk Qjk (βk + γk Tik)
![Page 38: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/38.jpg)
Theorem
• In GL PCA, finding U which maximizes likelihood (holding V fixed) is a convex optimization problem
• And, finding best V (holding U fixed) is a convex problem
• Further, Hessian is block diagonal
So, an efficient and effective optimization algorithm: alternately improve U and V
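A minimal sketch of this alternating scheme for the Bernoulli (logistic) case on binary scores with missing entries. This is an illustrative gradient-step version written for this transcript (the function name and hyperparameters are made up, not the authors' implementation); a full implementation would solve each convex subproblem to convergence, e.g. with Newton steps, before switching between U and V.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_epca_bernoulli(X, k=3, iters=200, lr=0.05, lam=0.1, seed=0):
    """Alternating gradient ascent on the GL-PCA objective
    sum_observed log P(X_ij | theta_ij) + log P(U) + log P(V),
    with theta = U V^T, Bernoulli likelihood, Gaussian priors."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = ~np.isnan(X)                       # observed-entry mask
    Xf = np.where(W, X, 0.0)               # fill NaNs; the mask removes them below
    U = 0.1 * rng.normal(size=(n, k))
    V = 0.1 * rng.normal(size=(m, k))
    for _ in range(iters):
        R = W * (Xf - sigmoid(U @ V.T))    # residual X - E[X | theta] on observed entries
        U += lr * (R @ V - lam * U)        # ascent step in U (V held fixed)
        R = W * (Xf - sigmoid(U @ V.T))
        V += lr * (R.T @ U - lam * V)      # ascent step in V (U held fixed)
    return U, V

# Tiny synthetic check: binary matrix with 30% of entries hidden.
rng = np.random.default_rng(1)
true_theta = rng.normal(size=(40, 1)) @ rng.normal(size=(1, 25))
X = (rng.random((40, 25)) < sigmoid(true_theta)).astype(float)
X[rng.random(X.shape) < 0.3] = np.nan
U, V = fit_epca_bernoulli(X, k=2)
```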
![Page 39: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/39.jpg)
Example: compressing histograms w/ Poisson PCA
Points: observed frequencies in ℝ3
Hidden manifold: a 1-parameter family of multinomials
(figure: probability simplex with vertices A, B, C)
![Page 40: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/40.jpg)
Example
ITERATION 1
![Page 41: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/41.jpg)
Example
ITERATION 2
![Page 42: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/42.jpg)
Example
ITERATION 3
![Page 43: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/43.jpg)
Example
ITERATION 4
![Page 44: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/44.jpg)
Example
ITERATION 5
![Page 45: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/45.jpg)
Example
ITERATION 9
![Page 46: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/46.jpg)
Remaining problem: MLE
Well-known rule of thumb: if MLE gets you in trouble due to overfitting, move to fully-Bayesian inference
Typical problem: computation
In our case, the computation is just fine if we’re a little clever
Additional wrinkle: switch to hierarchical model
![Page 47: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/47.jpg)
Bayesian hierarchical exponential-family PCA
Graphical model: nodes U, V, X plus shared priors R and S; each entry of X depends on a student factor and an item factor.
U: student latent factors (n students × k latent factors)
V: item latent factors (m items × k latent factors)
X: observed performance
R: shared prior for student latents
S: shared prior for item latents
X is observed; U, V, R, S are unobserved.
![Page 48: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/48.jpg)
A little clever: MCMC
(figure: latent variables Z and the marginal likelihood P(X))
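A heavily simplified sketch (not the sampler from the talk; all names and hyperparameters here are illustrative assumptions) of what MCMC can look like for the hierarchical model: random-walk Metropolis updates for the student factor rows of U, with the shared prior variance (the R node) resampled from its inverse-gamma conditional. Item factors V are taken as given for brevity; in practice V and its prior S are updated the same way, and predictions are averaged over the posterior samples rather than taken from a single MLE point.

```python
import numpy as np

def log_lik_row(u, V, x_row, mask_row):
    """Bernoulli log-likelihood of one student's observed responses."""
    theta = V[mask_row] @ u
    x = x_row[mask_row]
    return np.sum(x * theta - np.log1p(np.exp(theta)))

def mcmc_student_factors(X, V, n_samples=500, step=0.1, a0=2.0, b0=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n, k = X.shape[0], V.shape[1]
    mask = ~np.isnan(X)
    U = np.zeros((n, k))
    r = 1.0                                    # shared prior variance (the "R" node)
    samples = []
    for _ in range(n_samples):
        # Metropolis update for each student's latent factor row.
        for i in range(n):
            prop = U[i] + step * rng.normal(size=k)
            cur = log_lik_row(U[i], V, X[i], mask[i]) - U[i] @ U[i] / (2 * r)
            new = log_lik_row(prop, V, X[i], mask[i]) - prop @ prop / (2 * r)
            if np.log(rng.random()) < new - cur:
                U[i] = prop
        # Gibbs update for the shared prior variance: inverse-gamma conditional.
        a_post = a0 + U.size / 2
        b_post = b0 + np.sum(U ** 2) / 2
        r = 1.0 / rng.gamma(a_post, 1.0 / b_post)
        samples.append(U.copy())
        # (Item factors V and their shared prior S would be updated analogously.)
    return samples
```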
![Page 49: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/49.jpg)
Experimental comparison: Geometry Area 1996-1997 data
Geometry tutor: 139 items presented to 59 students
On average, each student tested on 60 items
![Page 50: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/50.jpg)
Results: hold-out error
Embedding dimension for *EPCA is K = 15
credit: Ajit Singh
![Page 51: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/51.jpg)
Extensions
Relational models
Temporal models
![Page 52: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/52.jpg)
Relational models
Students (rows) × items (columns):
         1  2  3  4  5  6
  john   1  1  0  0  1  0
  sue    0  1  1  0  0  0
  tom    1  1  0  1  1  0

Tags (rows) × items (columns):
         1  2  3  4  5  6
  trig   1  1  0  0  1  0
  story  0  1  1  0  0  0
  hard   1  1  0  1  1  0
![Page 53: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/53.jpg)
Relational hierarchical Bayesian exponential-family PCA
Graphical model: student factors U (n students × k latent factors), item factors V (m items × k latent factors), tag factors Z (p tags × k latent factors), observed matrices X and Y, and shared priors R, S, T.
X, Y: observed data; U, V, Z and the priors R, S, T are unobserved.
X ≈ f(UVᵀ)   Y ≈ g(VZᵀ)
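A rough sketch of the relational idea X ≈ f(UVᵀ), Y ≈ g(VZᵀ): the item factors V are shared between the two factorizations, so tag information about items shapes the same V that predicts student scores. The code below (function name is hypothetical) uses squared loss and plain gradient descent purely for illustration, standing in for the exponential-family links f and g.

```python
import numpy as np

def collective_factorization(X, Y, k=3, iters=300, lr=0.01, lam=0.1, seed=0):
    """Jointly factor X (students x items) and Y (items x tags) with shared
    item factors V:  X ~ U V^T,  Y ~ V Z^T  (squared loss for simplicity)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    p = Y.shape[1]
    U = 0.1 * rng.normal(size=(n, k))   # student factors
    V = 0.1 * rng.normal(size=(m, k))   # item factors, shared across both matrices
    Z = 0.1 * rng.normal(size=(p, k))   # tag factors
    for _ in range(iters):
        Rx = X - U @ V.T                 # residual of the student-item matrix
        Ry = Y - V @ Z.T                 # residual of the item-tag matrix
        U += lr * (Rx @ V - lam * U)
        Z += lr * (Ry.T @ V - lam * Z)
        V += lr * (Rx.T @ U + Ry @ Z - lam * V)   # V gets gradient from both losses
    return U, V, Z
```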
![Page 54: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/54.jpg)
Example: brain imaging
2000 dictionary words
60 stimulus words
500 brain voxels
X = co-occurrence of (dictionary word, stimulus word) on web
Y = activation of voxel when presented with stimulus
Task: predict X
(bar chart: mean squared error for EPCA, H-EPCA, and HB-EPCA, and for their relational versions)
credit: Ajit Singh
![Page 55: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/55.jpg)
Temporal models
So far: latent factors of students and content
e.g., knowledge components
for student: skill at KC
for problem: need for KC
e.g., student affect
But limited idea of evolution through time
e.g., fixed-structure models: proficiency = a + b x, where x = # practice opportunities, a = initial skill level, b = skill learning rate
![Page 56: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/56.jpg)
Temporal models
For evolving factors, we expect far better results if we learn about time explicitly
learning curves, gaming state, affective state, motivational state, self-efficacy, …
(dynamic model over transactions 1–3: latent states X1, X2, X3; observed properties of each transaction Y1, Y2, Y3; instructional decisions U1, U2, U3 as inputs)
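A toy simulation of the temporal structure sketched above (dimensions, dynamics matrices, and noise levels are all illustrative assumptions, not the tutor's model): a latent state x that evolves from transaction to transaction, driven by instructional decisions u, and emitting observed transaction properties y.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d_u, d_y, T = 2, 1, 3, 10      # latent dim, input dim, observation dim, transactions

A = np.array([[0.95, 0.0], [0.0, 0.9]])   # latent dynamics (e.g., slow forgetting)
B = np.array([[0.3], [0.1]])              # effect of an instructional decision
C = rng.normal(size=(d_y, k))             # map from latent state to transaction properties

x = np.zeros(k)
states, observations = [], []
for t in range(T):
    u = np.array([1.0 if t % 2 == 0 else 0.0])      # instructional decision at step t
    x = A @ x + B @ u + 0.05 * rng.normal(size=k)   # latent state evolves
    y = C @ x + 0.1 * rng.normal(size=d_y)          # observed properties of transaction t
    states.append(x.copy())
    observations.append(y)
```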
![Page 57: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/57.jpg)
Example: Bayesian Evaluation & Assessment
[BECK ET AL., 2008]
(same structure: latent state, properties of transactions, instructional decisions)
![Page 58: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/58.jpg)
The hope
Fit a temporal model
Examine learned parameters and latent states
Discover important evolving factors which affect performance
learning curve, affective state, gaming state, …
Discover how they evolve
![Page 59: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/59.jpg)
The hope
Reduce assumptions about what the factors are
Explore a wider variety of models
Model search guided by data
⇒ discover factors we might otherwise have missed
![Page 60: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/60.jpg)
Walking: original data
(embedded video)
THANKS: BYRON BOOTS, SAJID SIDDIQI
![Page 61: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/61.jpg)
Walking: original data
THANKS: BYRON BOOTS, SAJID SIDDIQI
(dynamic model over transactions 1–3: latent states X1, X2, X3; observed joint angles Y1, Y2, Y3; desired direction U1, U2, U3 as inputs)
![Page 62: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/62.jpg)
Walking: learned model
(embedded video)
![Page 63: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/63.jpg)
Steam: original data
(embedded video)
![Page 64: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/64.jpg)
Steam: original data
(dynamic model over transactions 1–3: latent states X1, X2, X3; observed pixels Y1, Y2, Y3; no inputs)
![Page 65: Latent Factor Models](https://reader035.fdocuments.net/reader035/viewer/2022062803/5681463a550346895db347df/html5/thumbnails/65.jpg)
Steam: learned model
(embedded video)