Comp. Genomics
Recitation 6, 14/11/06: ML and EM
Outline
• Maximum likelihood estimation
• HMM example
• EM
• Baum-Welch algorithm
Maximum likelihood
• One of the methods for parameter estimation
• Likelihood: L = P(Data | Parameters)
• Simple example:
  • Simple coin with P(head) = p
  • 10 coin tosses
  • 6 heads, 4 tails
• L = P(Data | Params) = C(10,6) p^6 (1-p)^4, where C(10,6) is the binomial coefficient
Maximum likelihood
• We want to find the p that maximizes L = C(10,6) p^6 (1-p)^4
• Infi 1, remember? Log is a monotonically increasing function, so we can equivalently maximize
  log L = log[C(10,6) p^6 (1-p)^4] = log C(10,6) + 6 log p + 4 log(1-p)
• Differentiating with respect to p and equating to zero: 6/p - 4/(1-p) = 0
• Estimate for p: 0.6 (Makes sense?)
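The calculation above is easy to check numerically. The sketch below (our own code, not from the recitation) evaluates the log-likelihood on a grid of p values and confirms that the maximizer is 6/10 = 0.6:

```python
import math

def coin_log_likelihood(p, heads=6, tails=4):
    # log L = log C(n, heads) + heads*log(p) + tails*log(1-p); the binomial
    # coefficient is constant in p, so it does not move the maximizer
    n = heads + tails
    return math.log(math.comb(n, heads)) + heads * math.log(p) + tails * math.log(1 - p)

# Scan a grid of p values; the analytic maximizer is heads/n = 0.6
grid = [i / 100 for i in range(1, 100)]
best_p = max(grid, key=coin_log_likelihood)
print(best_p)  # 0.6
```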
ML in Profile HMMs
• Emission probabilities
  • M_i → a
  • I_i → a
• Transition probabilities
  • M_i → M_{i+1}, M_i → D_{i+1}, M_i → I_i
  • I_i → M_{i+1}, I_i → I_i, I_i → D_{i+1}
  • D_i → D_{i+1}, D_i → M_{i+1}, D_i → I_i

http://www.cs.huji.ac.il/~cbio/handouts/Class6.ppt
Input: X_1,…,X_n independent training sequences
Goal: estimation of θ = (A, E) (model parameters)
Note: P(X_1,…,X_n | θ) = Π_{i=1..n} P(X_i | θ) (independence)
l(X_1,…,X_n | θ) = log P(X_1,…,X_n | θ) = Σ_{i=1..n} log P(X_i | θ)
Parameter Estimation for HMMs
Case 1 - Estimation when the state sequence is known:
A_kl = #(observed k→l transitions)
E_k(b) = #(emissions of symbol b that occurred in state k)

Max. likelihood estimators:
• a_kl = A_kl / Σ_l' A_kl'
• e_k(b) = E_k(b) / Σ_b' E_k(b')

Small-sample or prior-knowledge correction (pseudocounts):
A'_kl = A_kl + r_kl
E'_k(b) = E_k(b) + r_k(b)
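A minimal sketch of these estimators in Python (the `ml_estimate` name is ours, not from the slides): raw counts are normalized into probabilities, and optional pseudocounts implement the small-sample correction.

```python
def ml_estimate(counts, pseudo=None):
    """a_kl = A_kl / sum_l' A_kl'; with pseudocounts r_kl the corrected
    counts A'_kl = A_kl + r_kl are normalized instead."""
    if pseudo is not None:
        counts = {k: v + pseudo.get(k, 0) for k, v in counts.items()}
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

# Transitions out of a match state: 4 M-M, 1 M-D, 0 M-I observed
raw = {"M-M": 4, "M-D": 1, "M-I": 0}
print(ml_estimate(raw))                       # M-M: 0.8, M-D: 0.2, M-I: 0.0
print(ml_estimate(raw, {t: 1 for t in raw}))  # Laplace correction: 5/8, 2/8, 1/8
```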
Example
• Suppose we are given the aligned sequences
• Suppose also that the "match" positions are marked (*):

AG---C
A-AT-C
AG-AA-
--AAAC
AG---C
**---*
Calculating A, E
count transitions and emissions:
transitions:
      0  1  2  3
M-M   4  3  2  4
M-D   1  1  0  0
M-I   0  0  1  0
I-M   0  0  2  0
I-D   0  0  1  0
I-I   0  0  4  0
D-M   -  0  0  1
D-D   -  1  0  0
D-I   -  0  2  0

emissions (match states M):
      0  1  2  3
A     -  4  0  0
C     -  0  0  4
G     -  0  3  0
T     -  0  0  0

emissions (insert states I):
      0  1  2  3
A     0  0  6  0
C     0  0  0  0
G     0  0  0  0
T     0  0  1  0
Estimating Maximum Likelihood probabilities using Fractions
match-state (M) emission probabilities, as fractions of the counts above (e.g. state 1: A appears 4/4 times = 1):

      0    1    2    3
A     -    1    0    0
C     -    0    0    1
G     -    0    1    0
T     -    0    0    0

insert-state (I) emission probabilities (state 2: A 6/7 = .86, T 1/7 = .14; states with no observed insertions default to uniform .25):

      0    1    2    3
A     .25  .25  .86  .25
C     .25  .25  0    .25
G     .25  .25  0    .25
T     .25  .25  .14  .25
Estimating ML probabilities (contd)

transition probabilities, as fractions of the transition counts above (e.g. state 0: M-M 4/5 = .8; insert states with no observed transitions default to uniform 1/3):

      0    1    2    3
M-M   .8   .75  .66  1.0
M-D   .2   .25  0    0
M-I   0    0    .33  0
I-M   .33  .33  .28  .33
I-D   .33  .33  .14  .33
I-I   .33  .33  .57  .33
D-M   -    0    0    1
D-D   -    1    0    0
D-I   -    0    1    0
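The match-state counts and probabilities can be reproduced programmatically. The sketch below is our own code: the match columns (0-based columns 0, 1 and 5, i.e. the starred positions) are hard-coded, and a '-' in a match column means the delete state was used, so nothing is emitted there.

```python
alignment = ["AG---C", "A-AT-C", "AG-AA-", "--AAAC", "AG---C"]
match_cols = [0, 1, 5]  # the columns marked '*' map to match states 1-3

# Count emissions per match state, then normalize into ML probabilities
emissions = {s: {c: 0 for c in "ACGT"} for s in (1, 2, 3)}
for seq in alignment:
    for state, col in enumerate(match_cols, start=1):
        if seq[col] != "-":          # '-' in a match column = delete state
            emissions[state][seq[col]] += 1

for state in (1, 2, 3):
    counts = emissions[state]
    total = sum(counts.values())
    probs = {c: n / total for c, n in counts.items()}
    print(state, counts, probs)
```

Running this reproduces the tables above: state 1 emits A 4 times, state 2 emits G 3 times, and state 3 emits C 4 times, each normalizing to probability 1 for that symbol.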
EM - Mixture example
• Assume we are given the heights of 100 individuals (men/women): y_1,…,y_100
• We know that:
  • the men's heights are normally distributed with (μ_m, σ_m)
  • the women's heights are normally distributed with (μ_w, σ_w)
• If we knew the genders, estimation would be "easy" (How?)
• But we don't know the genders in our data!
  • X_1,…,X_100 (the genders) are unknown
  • P(w), P(m) are unknown
Mixture example
• Our goal: estimate the parameters (μ_m, σ_m), (μ_w, σ_w), p(m)
• A classic case of "estimation with missing data"
• (In an HMM: we know the emissions, but not the states!)
• Expectation-Maximization (EM):
  • compute the "expected" gender for every sample height
  • estimate the parameters using ML
  • iterate
EM
• Widely used in machine learning
• Using ML for parameter estimation at every iteration guarantees that the likelihood never decreases
• Eventually we'll reach a local maximum
• A good starting point is important
Mixture example
• Assume we have a mixture of M Gaussians, each with a mixing probability α_i and density parameters θ_i = (μ_i, σ_i)
• Likelihood of the observations X: the "incomplete-data" log-likelihood of the sample x_1,…,x_N is
  log L(θ | X) = Σ_{n=1..N} log[Σ_{i=1..M} α_i p_i(x_n | θ_i)]
• Difficult to estimate directly… (the log of a sum does not decompose)
Mixture example
• Now we introduce y_1,…,y_100: hidden variables telling us which Gaussian each sample came from
• If we knew y, the complete-data log-likelihood would be:
  log L(θ | X, y) = Σ_n log[α_{y_n} p_{y_n}(x_n | θ_{y_n})]
• Of course, we do not know the ys…
• We'll do EM, starting from an initial guess
  θ^g = (α_1^g, …, α_M^g, μ_1^g, …, μ_M^g, σ_1^g, …, σ_M^g)
Estimation
• Given θ^g, we can estimate the ys!
• We want to find: Q(θ, θ^g) = E[log L(θ | X, y) | X, θ^g]
• The expectation is over the states of y
• Bayes rule: P(X|Y) = P(Y|X)P(X)/P(Y); applied here, it gives the posterior of each hidden label:
  P(y_n = i | x_n, θ^g) = α_i^g p_i(x_n | θ_i^g) / Σ_{j=1..M} α_j^g p_j(x_n | θ_j^g)
Estimation
• We write down the Q:
  Q(θ, θ^g) = Σ_y log L(θ | X, y) P(y | X, θ^g)
• Daunting?
Estimation
• Simplifying: the sum over complete assignments y collapses to a sum over the M components for each sample
• Now the Q becomes:
  Q(θ, θ^g) = Σ_{i=1..M} Σ_{n=1..N} log[α_i p_i(x_n | θ_i)] P(i | x_n, θ^g)
Maximization
• Now we want to find parameter estimates such that θ^{g+1} = argmax_θ Q(θ, θ^g)
• Infi 2, remember? To impose the constraint Σ_i α_i = 1, we introduce a Lagrange multiplier λ:
  ∂/∂α_i [Q(θ, θ^g) + λ(Σ_i α_i - 1)] = Σ_n (1/α_i) P(i | x_n, θ^g) + λ = 0
• Multiplying both sides by α_i and summing over i gives λ = -N, so:
  α_i^{g+1} = (1/N) Σ_{n=1..N} P(i | x_n, θ^g)
Maximization
• Estimating μ_i^{g+1}, σ_i^{g+1} is more difficult
• The derivation is out of scope here
• What turns out is actually quite straightforward:
  μ_i^{g+1} = Σ_n x_n P(i | x_n, θ^g) / Σ_n P(i | x_n, θ^g)
  (σ_i^{g+1})^2 = Σ_n (x_n - μ_i^{g+1})^2 P(i | x_n, θ^g) / Σ_n P(i | x_n, θ^g)
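Putting the E-step (posterior genders via Bayes rule) and M-step (the weighted ML updates) together for the two-Gaussian height example: the sketch below is our own minimal implementation, with simulated heights and invented means/std-devs, since the recitation's actual 100 samples are not given.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def em_two_gaussians(xs, params, n_iter=50):
    """EM for a two-component Gaussian mixture.
    params = (alpha, mu1, s1, mu2, s2); alpha = P(component 1)."""
    alpha, mu1, s1, mu2, s2 = params
    for _ in range(n_iter):
        # E-step: posterior P(component 1 | x) via Bayes rule
        w = [alpha * normal_pdf(x, mu1, s1) /
             (alpha * normal_pdf(x, mu1, s1) + (1 - alpha) * normal_pdf(x, mu2, s2))
             for x in xs]
        # M-step: the closed-form weighted ML updates
        n1 = sum(w)
        n2 = len(xs) - n1
        alpha = n1 / len(xs)
        mu1 = sum(wi * x for wi, x in zip(w, xs)) / n1
        mu2 = sum((1 - wi) * x for wi, x in zip(w, xs)) / n2
        s1 = math.sqrt(sum(wi * (x - mu1) ** 2 for wi, x in zip(w, xs)) / n1)
        s2 = math.sqrt(sum((1 - wi) * (x - mu2) ** 2 for wi, x in zip(w, xs)) / n2)
    return alpha, mu1, s1, mu2, s2

# Simulated heights: 50 "women" ~ N(165, 6) and 50 "men" ~ N(178, 7)
random.seed(0)
xs = [random.gauss(165, 6) for _ in range(50)] + [random.gauss(178, 7) for _ in range(50)]
params = em_two_gaussians(xs, (0.5, 160.0, 5.0, 185.0, 5.0))
print(params)
```

The estimates recover roughly the two generating means even though the gender labels were never used, which is exactly the point of EM.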
What you need to know about EM:
• When: we want to estimate model parameters, and some of the data is "missing"
• Why: maximizing the likelihood directly is very difficult
• How:
  • make an initial guess of the parameters
  • find a proper term for Q(θ, θ^g)
  • derive and find ML estimators
  • iterate
EM estimation in HMMs
Input: X_1,…,X_n independent training sequences
Baum-Welch alg. (1972):

Expectation:
• compute the expected # of k→l state transitions, using the forward (f) and backward (b) values:
  P(π_i = k, π_{i+1} = l | X, θ) = [1/P(X)] · f_k(i) · a_kl · e_l(x_{i+1}) · b_l(i+1)
  A_kl = Σ_j [1/P(X^j)] · Σ_i f_k^j(i) · a_kl · e_l(x^j_{i+1}) · b_l^j(i+1)
• compute the expected # of appearances of symbol b in state k:
  E_k(b) = Σ_j [1/P(X^j)] · Σ_{i : x^j_i = b} f_k^j(i) · b_k^j(i)    (exercise)

Maximization:
• re-compute the new parameters from A, E using maximum likelihood, as in the known-state case
Repeat the Expectation and Maximization steps until the improvement becomes negligible.
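One Baum-Welch round can be sketched directly from these formulas. The code below is our own illustration on an invented two-state coin HMM; it assumes a uniform initial state distribution and does no scaling (real implementations rescale or work in log space to avoid underflow on long sequences).

```python
def forward(x, a, e, states):
    # f[i][k] = P(x_1..x_{i+1}, state k at position i); uniform start assumed
    f = [{k: e[k][x[0]] / len(states) for k in states}]
    for c in x[1:]:
        prev = f[-1]
        f.append({l: e[l][c] * sum(prev[k] * a[k][l] for k in states) for l in states})
    return f

def backward(x, a, e, states):
    # b[i][k] = P(x_{i+2}..x_n | state k at position i)
    b = [{k: 1.0 for k in states}]
    for c in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(a[k][l] * e[l][c] * nxt[l] for l in states) for k in states})
    return b

def baum_welch_step(seqs, a, e, states, alphabet):
    # Expectation: accumulate expected counts A_kl and E_k(b)
    A = {k: {l: 0.0 for l in states} for k in states}
    E = {k: {c: 0.0 for c in alphabet} for k in states}
    for x in seqs:
        f, b = forward(x, a, e, states), backward(x, a, e, states)
        px = sum(f[-1][k] for k in states)          # P(x)
        for i in range(len(x)):
            for k in states:
                E[k][x[i]] += f[i][k] * b[i][k] / px
                if i + 1 < len(x):
                    for l in states:
                        A[k][l] += f[i][k] * a[k][l] * e[l][x[i + 1]] * b[i + 1][l] / px
    # Maximization: normalize expected counts into new parameters
    new_a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    new_e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return new_a, new_e

# Toy model: fair ("F") vs biased ("B") coin emitting H/T
states, alphabet = ("F", "B"), "HT"
a = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
e = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.8, "T": 0.2}}
a, e = baum_welch_step(["HTTHHHHHTHHT"], a, e, states, alphabet)
print(a["F"], e["B"])
```

Iterating `baum_welch_step` until the training-sequence likelihood stops improving gives the full algorithm.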