Recitation on EM slides taken from: ambuj/Courses/bioinformatics/EM.pdf...

Recitation on EM

Slides taken from: http://www.cs.ucsb.edu/~ambuj/Courses/bioinformatics/EM.pdf

Computational Genomics, Recitation #6

All EM questions follow the same format:

1. Write the likelihood function.
2. Write the Q function.
3. Derive the update rule.

Estimation problems

What is the unobserved data in this case?

EM question

• Let G = (G1, …, Gn) be n contiguous DNA regions representing genes. For each Gi, we define Pi as the mRNA concentration of the gene, such that P1 + … + Pn = 1. P = (P1, …, Pn) can be interpreted as the normalized expression levels of the regions in G.

• Our model assumes that each read is generated by randomly picking a region from G according to the distribution P and then copying this region. The copying process is error-prone. This process is repeated until we have a set of m reads R = (r1, …, rm).
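The generative process described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_reads`, the uniform substitution error model, and the `error_rate` parameter are hypothetical and do not come from the slides, which only say that copying is error-prone.

```python
import random

def sample_reads(regions, P, m, error_rate=0.01, seed=0):
    """Draw m reads: pick a region according to P, then copy it with errors.

    regions: list of DNA strings (G1..Gn); P: expression levels summing to 1.
    The substitution error model below is an assumption made for illustration.
    """
    rng = random.Random(seed)
    alphabet = "ACGT"
    reads, origins = [], []
    for _ in range(m):
        # Pick the index of the source region according to the distribution P.
        i = rng.choices(range(len(regions)), weights=P, k=1)[0]
        # Copy the region; each base is replaced by a different base
        # with probability error_rate.
        copy = "".join(
            rng.choice([b for b in alphabet if b != base])
            if rng.random() < error_rate else base
            for base in regions[i]
        )
        reads.append(copy)
        origins.append(i)  # hidden in practice: this is the unobserved data
    return reads, origins
```

In a real data set only `reads` would be observed; `origins` is returned here just to make the hidden variable explicit.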

• For each region Gi and read rj, we have a probability pij = P(rj | Gi): the probability of observing rj given that the locus of the read was gene Gi. In practice, for each read rj, this probability will be close to zero for all but a few regions.
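To make the matrix of pij values concrete, here is a minimal sketch of how it could be filled in. The slides do not specify the error model, so the per-base substitution model, the helper name `read_likelihood`, and the `error_rate` value are all assumptions chosen for illustration.

```python
def read_likelihood(read, region, error_rate=0.01):
    """Return P(read | region) under an assumed per-base substitution model."""
    if len(read) != len(region):
        return 0.0  # this toy model has no insertions or deletions
    prob = 1.0
    for observed, true in zip(read, region):
        # A matching base has probability 1 - error_rate; a mismatch is one
        # of the 3 other bases, each with probability error_rate / 3.
        prob *= (1.0 - error_rate) if observed == true else error_rate / 3.0
    return prob

# Hypothetical toy data: p[i][j] = P(r_j | G_i).
regions = ["ACGT", "ACGA", "TTTT"]
reads = ["ACGT", "TTTT"]
p = [[read_likelihood(r, g) for r in reads] for g in regions]
```

With this toy data, the first read is far more likely under the first two regions than under the third, which matches the remark that pij is close to zero for all but a few regions.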

Likelihood function

• Write the likelihood of observing the m reads.
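One way to write this likelihood, assuming the reads are generated independently (as the repeated sampling in the model suggests) and marginalizing over the unknown source region of each read:

```latex
L(P) = \Pr(r_1, \dots, r_m \mid P)
     = \prod_{j=1}^{m} \sum_{i=1}^{n} P_i \, p_{ij}
```

The inner sum is the probability of read r_j: pick region G_i with probability P_i, then observe r_j with probability p_ij.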

Q function

• Write the Q(P | P^(t)) term.
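A sketch of the Q function under the usual EM construction. The hidden variable z_j (the index of the region that produced read r_j) and the weights w_ij^(t) are notation introduced here, not taken from the slides:

```latex
w_{ij}^{(t)} = \Pr(z_j = i \mid r_j, P^{(t)})
             = \frac{P_i^{(t)} \, p_{ij}}{\sum_{k=1}^{n} P_k^{(t)} \, p_{kj}}

Q(P \mid P^{(t)}) = \sum_{j=1}^{m} \sum_{i=1}^{n}
    w_{ij}^{(t)} \left( \log P_i + \log p_{ij} \right)
```

The first line is the E-step (Bayes' rule applied to each read); the second is the expected complete-data log-likelihood under those posterior weights.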

M-step

• Write the M-step using the argmax function.
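A possible form of the M-step, where w_ij^(t) denotes (in notation assumed here) the posterior probability under P^(t) that read r_j came from region G_i. Terms of Q that do not depend on P, such as log p_ij, are dropped because they do not affect the argmax:

```latex
P^{(t+1)} = \arg\max_{P} \; Q(P \mid P^{(t)})
          = \arg\max_{P \,:\, \sum_i P_i = 1,\; P_i \ge 0}
            \; \sum_{i=1}^{n} a_i \log P_i ,
\qquad a_i = \sum_{j=1}^{m} w_{ij}^{(t)}
```

Here a_i can be read as the expected number of reads that originated from region G_i.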

Update rule

• Infer from the M-step (part c) the update step for P.

When we want to maximize ∑_i a_i log(P_i) over P, subject to ∑_i P_i = 1, the maximum is achieved at P_i = a_i / ∑_k a_k.
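Applying this fact with a_i equal to the expected number of reads from region Gi (the sum over reads of the posterior probability that the read came from Gi) gives an update in which the new Pi is a_i divided by m, since the a_i sum to m. The following is a minimal sketch of the resulting EM loop; the function name `em_expression`, the fixed iteration count, and the list-of-lists layout `p[i][j]` are assumptions made for illustration, and the code assumes every read has nonzero probability under at least one region.

```python
def em_expression(p, P0, n_iter=100):
    """Estimate P by EM. p[i][j] = P(r_j | G_i); P0 is the initial guess."""
    n, m = len(p), len(p[0])
    P = list(P0)
    for _ in range(n_iter):
        counts = [0.0] * n  # a_i: expected number of reads from region i
        for j in range(m):
            # E-step: posterior probability that read j came from each region.
            joint = [P[i] * p[i][j] for i in range(n)]
            total = sum(joint)  # assumed > 0 for every read
            for i in range(n):
                counts[i] += joint[i] / total
        # M-step: P_i = a_i / sum_k a_k, and the a_k sum to m.
        P = [a / m for a in counts]
    return P
```

For example, with two regions and four reads where three reads can only come from the first region and one only from the second, `em_expression([[1, 1, 1, 0], [0, 0, 0, 1]], [0.5, 0.5])` settles at expression levels of three quarters and one quarter.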
