Last time: Beta-Binomial
• Binary random variable: bent coin

Data Likelihood:
  P(x_1, x_2, \ldots, x_n \mid \theta_H) = \theta_H^{\#H} (1 - \theta_H)^{\#T}

Prior (Beta distribution):
  P(\theta_H \mid \alpha, \beta) = \frac{1}{B(\alpha, \beta)} \theta_H^{\alpha - 1} (1 - \theta_H)^{\beta - 1}

Posterior:
  P(\theta_H \mid \alpha, \beta, x_1, \ldots, x_n) = \frac{1}{B(\alpha + \#H, \beta + \#T)} \theta_H^{\#H + \alpha - 1} (1 - \theta_H)^{\#T + \beta - 1}
Last time: Beta-Binomial
• Binary random variable: bent coin

Maximum Likelihood:
  \theta_{ML} = \frac{\#H}{\#T + \#H}

Maximum a Posteriori:
  \theta_{MAP} = \frac{\#H + \alpha - 1}{\#T + \#H + \alpha + \beta - 2}
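The two estimators above differ only in the pseudo-counts contributed by the prior. A minimal sketch (the function names and the example counts are illustrative, not from the slides):

```python
def theta_ml(n_heads, n_tails):
    """Maximum-likelihood estimate: fraction of heads observed."""
    return n_heads / (n_heads + n_tails)

def theta_map(n_heads, n_tails, alpha, beta):
    """MAP estimate under a Beta(alpha, beta) prior:
    (#H + alpha - 1) / (#T + #H + alpha + beta - 2)."""
    return (n_heads + alpha - 1) / (n_heads + n_tails + alpha + beta - 2)

# e.g. 7 heads and 3 tails, with a Beta(2, 2) prior:
print(theta_ml(7, 3))         # 0.7
print(theta_map(7, 3, 2, 2))  # (7 + 1) / (10 + 2) = 2/3
```

With a uniform Beta(1, 1) prior the MAP estimate reduces to the ML estimate, which is one way to see the prior as smoothing.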
K-Sided Dice
• Weighted die (a generalization of the bent coin)
• Assume an observed sequence of rolls: 1 1 2 3 2 1 3 2 1 3
• Parameters: \theta_1, \theta_2, \theta_3
  P(x; \theta) = \theta_x
Likelihood In General
• N dice rolls, K possible outcomes:
  P(D \mid \theta) = \prod_{k=1}^{K} \theta_k^{N_k}
• The likelihood is a multivariable function:
  f(\theta_1, \theta_2, \ldots, \theta_K)
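The product over outcome counts can be computed directly from the roll sequence. A small sketch using the sequence from the slide (the parameter values here are illustrative assumptions):

```python
from collections import Counter

def likelihood(rolls, theta):
    """P(D | theta) = prod_k theta_k^{N_k}, where N_k counts outcome k."""
    counts = Counter(rolls)          # N_k for each observed outcome
    p = 1.0
    for k, n_k in counts.items():
        p *= theta[k] ** n_k
    return p

rolls = [1, 1, 2, 3, 2, 1, 3, 2, 1, 3]   # the sequence from the slide: N = (4, 3, 3)
theta = {1: 0.5, 2: 0.3, 3: 0.2}         # illustrative parameters, summing to 1
print(likelihood(rolls, theta))           # 0.5**4 * 0.3**3 * 0.2**3
```

Note that only the counts N_1, ..., N_K matter, not the order of the rolls.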
3D Probability Simplex
• 3 parameters
• Constraint that parameters sum to 1
  S_K = \{\theta : 0 \le \theta_k \le 1, \sum_{k=1}^{K} \theta_k = 1\}
We want a probability distribution over this
Dirichlet distribution
• Multivariate generalization of the Beta distribution
• Conjugate prior to the multinomial
  \mathrm{Dir}(\theta \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} \, \mathbb{1}(\theta \in S_K)
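The density above can be evaluated with only standard gamma-function identities, since B(\alpha) = \prod_k \Gamma(\alpha_k) / \Gamma(\sum_k \alpha_k). A sketch (the function name is mine, not from the slides):

```python
import math

def dirichlet_pdf(theta, alpha):
    """Dir(theta | alpha) density; the indicator 1(theta in S_K) returns 0 off the simplex."""
    if any(t <= 0 for t in theta) or abs(sum(theta) - 1.0) > 1e-9:
        return 0.0
    # log B(alpha) = sum_k log Gamma(alpha_k) - log Gamma(sum_k alpha_k)
    log_B = sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))
    log_p = sum((a - 1) * math.log(t) for a, t in zip(alpha, theta))
    return math.exp(log_p - log_B)

# With all alpha_k = 1 the Dirichlet is uniform on the simplex;
# for K = 3 the constant density is 1/B(1,1,1) = Gamma(3)/Gamma(1)^3 = 2.
print(dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # 2.0
```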
(log) Dirichlet distribution
[Figure: log-density of the Dirichlet on the simplex, for α ranging from (0.3, 0.3, 0.3) to (2.0, 2.0, 2.0)]
Posterior
  P(\theta \mid D) \propto P(D \mid \theta) P(\theta)
    \propto \prod_{k=1}^{K} \theta_k^{N_k} \theta_k^{\alpha_k - 1}
    = \prod_{k=1}^{K} \theta_k^{N_k + \alpha_k - 1}
    = \mathrm{Dir}(\theta \mid \alpha_1 + N_1, \ldots, \alpha_K + N_K)
Dirichlet is Conjugate to Multinomial
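Conjugacy makes the posterior update trivial: add the observed counts to the prior parameters. A minimal sketch, using the roll counts N = (4, 3, 3) from the K-sided-dice example (the uniform Dir(1, 1, 1) prior is an illustrative choice):

```python
def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dir(alpha) prior + multinomial counts N -> Dir(alpha_k + N_k)."""
    return [a + n for a, n in zip(alpha, counts)]

posterior = dirichlet_posterior([1, 1, 1], [4, 3, 3])
print(posterior)  # [5, 4, 4]
```

No integration is needed; the posterior stays in the Dirichlet family, which is exactly what "conjugate" means.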
Naïve Bayes with Log Probabilities
• Q: Why don’t we have to worry about floating point underflow anymore?
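The answer in one line: a product of many small probabilities underflows to 0.0 in floating point, but the equivalent sum of log-probabilities stays in a representable range. A small demonstration (the specific probability values are illustrative):

```python
import math

probs = [1e-5] * 100  # 100 tiny per-feature probabilities, as in a Naive Bayes product

# Direct product: 1e-500 is far below the smallest float64, so it underflows to 0.0.
direct = 1.0
for p in probs:
    direct *= p
print(direct)          # 0.0

# In log space the product becomes a sum, which does not underflow:
log_total = sum(math.log(p) for p in probs)
print(log_total)       # 100 * log(1e-5), about -1151.3
```

Since log is monotonic, comparing log-probabilities across classes gives the same argmax as comparing the probabilities themselves.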