CS 312: Algorithm Design & Analysis
Lecture #24: Optimality,
Gene Sequence Alignment
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.
Slides by: Eric Ringger, with contributions from Mike Jones, Eric Mercer, Sean Warnick
Announcements
Homework #15 due now
Project #5: Gene Sequence Alignment
  Kick-off: today. Read directions now.
  Whiteboard experience: due Monday
  Early: Monday after mid-term exam
  Due: Wednesday after mid-term exam
Mid-term Exam
  Start preparing your one page of notes.
  Must be prepared by you. No cutting and pasting.
Objectives
Revisit the main ideas behind Dynamic Programming
Define the optimality property for DP
Develop the algorithm for gene sequence alignment (or at least begin)
Prepare for Project #5
Dynamic Programming
The six steps:
1. Ask: am I solving an optimization problem?
2. Devise a minimal description (address) for any problem instance and sub-problem.
3. Divide problems into sub-problems: define the recurrence to specify the relationship of problems to sub-problems.
4. Check that the optimality property holds: an optimal solution to a problem is built from optimal solutions to sub-problems.
5. Store results – typically in a table – and re-use the solutions to sub-problems in the table as you build up to the overall solution.
6. Back-trace / analyze the table to extract the composition of the final solution.
Optimality Property
An optimal solution to a problem is built from optimal solutions to sub-problems.
The optimality property is a necessary condition for solving an optimization problem by DP!
It allows us to store and re-use optimal results to sub-problems.
Optimality
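[Figure: tree of sub-problems with nodes A through K]
$$\text{optimalsolution}(\text{parent}) = \min\ (\text{or } \max) \left\{ f_1(\text{optimalsolution}(\text{child}_1)),\ f_2(\text{optimalsolution}(\text{child}_2)),\ \ldots,\ f_n(\text{optimalsolution}(\text{child}_n)) \right\}$$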
Shortest Path
[Figure: road graph over American Fork, Orem, Provo, Sundance, and Geneva; edge weights 20, 10, 12, 3, 15, 18, 10, 12]
Goal: the shortest path from AF to Provo.
Does this problem exhibit the optimality property? Pair up. Discuss
Questions
Q. In general, do you know which sub-problem solutions to use in advance?
A. No. So a very greedy algorithm is not an option. (But Dijkstra’s is.)
Q. How does having a table of intermediate shortest path results help find the shortest path from AF to Provo?
A. Reuse those results for intermediate destinations as you try different routes.
Q. Do you have to reconsider alternative sub-optimal solutions for the intermediate destinations?
A. No.
Thus, the Optimality Property holds. Therefore, the shortest path problem can be solved by DP.
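The table re-use described in the answers above can be sketched in a few lines of Python. The road graph here is hypothetical: the slide's figure did not survive, so the one-way edges and the pairing of edges with weights are assumptions chosen only for illustration. `lru_cache` plays the role of the table of intermediate results.

```python
from functools import lru_cache

# Hypothetical one-way road graph. The edges and their weights are
# illustrative assumptions, not the graph from the lecture figure.
ROADS = {
    "AF": {"Geneva": 10, "Orem": 15, "Sundance": 12},
    "Geneva": {"Orem": 3, "Provo": 20},
    "Sundance": {"Orem": 10, "Provo": 18},
    "Orem": {"Provo": 12},
    "Provo": {},
}

@lru_cache(maxsize=None)  # the cache is the table of solved sub-problems
def shortest(src, dst="Provo"):
    """Return (distance, path) for the shortest route from src to dst."""
    if src == dst:
        return 0, (dst,)
    best = (float("inf"), ())
    for nxt, weight in ROADS[src].items():
        dist, path = shortest(nxt, dst)  # optimal answer for the sub-problem
        if weight + dist < best[0]:
            best = (weight + dist, (src,) + path)
    return best

print(shortest("AF"))
print(shortest("Orem"))
```

In this made-up graph the best route from AF passes through Orem, and the tail of that route is exactly the best route from Orem. No sub-optimal route out of Orem ever needs to be reconsidered.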
[Figure: road graph over American Fork, Orem, Provo, Sundance, and Geneva; edge weights 20, 10, 12, 3, 15, 18, 10, 12]
Optimality in Driving
The shortest route from American Fork to Provo passes through Orem.
Assume we have found this route.
Then what can we say about the shortest route from AF to Orem?
It follows the optimal route from AF to Provo (it is the AF-to-Orem portion of that route).
Could it be otherwise?
A related problem
Now suppose you drive from AF to Orem as fast as you can on your way to Provo,
but you are limited by the gas in your tank.
Does the Optimality Property Hold?
[Figure: AF to Orem to Provo. Each leg offers three options labeled minutes/gallons: 5/9, 10/5, 20/1. For example, 20/1 “takes 20 minutes using 1 gallon of gas”.]
Start with 10 gallons.
Goal: get to Provo in as little time as possible. No refueling.
Does this problem (formulation) satisfy the optimality property or not? Why?
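One way to explore the question is to brute-force the small instance. The sketch below assumes the figure means that each of the two legs offers the same three (minutes, gallons) options; the names `OPTIONS`, `TANK`, and `best_trip` are made up for this illustration.

```python
from itertools import product

# Assumed reading of the slide labels 5/9, 10/5, 20/1:
# (minutes, gallons) options available on each leg.
OPTIONS = [(5, 9), (10, 5), (20, 1)]
TANK = 10  # gallons at the start, no refueling

def best_trip(legs=2, tank=TANK):
    """Brute-force the fastest combination of per-leg options that fits the tank."""
    feasible = [
        combo for combo in product(OPTIONS, repeat=legs)
        if sum(gallons for _, gallons in combo) <= tank
    ]
    return min(feasible, key=lambda combo: sum(minutes for minutes, _ in combo))

trip = best_trip()
total_minutes = sum(minutes for minutes, _ in trip)
fastest_first_leg = min(OPTIONS)  # fastest AF-to-Orem option taken on its own

print(trip, total_minutes)
print(fastest_first_leg)
```

Under this reading, the fastest complete trip does not start with the fastest AF-to-Orem leg: burning 9 gallons to reach Orem in 5 minutes leaves only enough gas for the 20-minute option afterwards. When a sub-problem is described by location alone, the optimal overall solution is therefore not built from the optimal sub-solution.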
Problem Solving Advice
Start by asking: which sub-problems should be solved?
If you know how to choose in advance using local information only, then greedy might work.
Else if sub-problems don’t overlap, then divide and conquer would be a good choice.
Else if the optimality property holds, then DP is a good choice.
Else the optimality property does NOT hold, so apply another strategy.
(Stay tuned for more guidance)
Important!
x=ACGCTGA y=ACTGT
Gene Sequence Alignment
Virtually Identical Problems
Edit Distance aka Levenshtein Distance
Sequence Alignment
  E.g., Gene Sequence Alignment
Fundamentally the same thing!
We’re focusing on gene sequence alignment.
Edit Distance / Sequence Alignment Problem
Given: 2 strings: x and y
Return: Smallest cost to transform string x into string y (or vice versa)
Another perspective: smallest cost of aligning x to y (or vice versa)
Contrast the 2 perspectives.
Alignment Example:
x: ACGCT-C
y: A--CTGT
The ‘-’ is a “gap”.
Divide into pairs.
Each pair has a type and a cost.
Type: Match; Cost = c_match
Type: Insertion into x (= deletion from y) aka “indel”; Cost = c_indel
Type: Insertion into y (= deletion from x); Cost = c_indel
Type: Substitution of x into y (or from y into x); Cost = c_sub
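To make the pair-by-pair costing concrete, here is a minimal sketch that scores a given alignment column by column. The slides name the costs but give no values, so the unit costs used as defaults (match 0, indel 1, substitution 1) are placeholder assumptions, and `score_alignment` is a hypothetical helper name.

```python
def score_alignment(ax, ay, c_match=0, c_indel=1, c_sub=1):
    """Sum the cost of each pair (column) of an alignment.

    The default costs are placeholder assumptions (unit edit costs);
    the lecture does not state numeric values.
    """
    if len(ax) != len(ay):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for a, b in zip(ax, ay):
        if a == "-" and b == "-":
            raise ValueError("a pair cannot consist of two gaps")
        if a == "-" or b == "-":
            total += c_indel  # insertion into one string = deletion from the other
        elif a == b:
            total += c_match  # match
        else:
            total += c_sub    # substitution
    return total

print(score_alignment("ACGCT-C", "A--CTGT"))
```

For the example alignment this counts three matches, three indels, and one substitution, so the unit costs give a total of 4. Other cost values can be passed as keyword arguments.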
How would you solve this problem?
Solution Ideas
Enumerate all and score
  Pro: Easy to code
  Pro: Optimal
  Con: Exponential
Greedy: work from left to right, gobbling up matches and inserting gaps or allowing substitutions as necessary
  Pro: Easy
  Pro: Linear = fast / efficient
  Con: Not optimal
DP
  Pre-req: optimality property
  Pre-req: define addressable sub-problems
  Pre-req: determine relationship between problem and sub-problems
  Pro: Optimal
  Con: ?
Divide and Conquer?
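To get a feel for why enumerating every alignment is expensive, the sketch below counts the candidates. It assumes an alignment is a sequence of pairs, where each pair either matches up two symbols, puts a symbol of x against a gap, or puts a symbol of y against a gap; `count_alignments` is a hypothetical helper name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Count pair-by-pair alignments of strings with lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only gaps remain, so there is exactly one way to finish
    return (
        count_alignments(n - 1, m - 1)  # last symbols paired (match or substitution)
        + count_alignments(n - 1, m)    # last symbol of x against a gap
        + count_alignments(n, m - 1)    # last symbol of y against a gap
    )

print(count_alignments(7, 5))    # lengths of x=ACGCTGA and y=ACTGT
print(count_alignments(20, 20))
```

Even the short example strings of lengths 7 and 5 admit 7183 alignments under this counting, and the count grows rapidly with length.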
Designing the DP Algorithm for Gene Sequence Alignment
DP?
Define each sub-problem to be the best score for aligning the first i bases of sequence x with the first j bases of sequence y.
Does that suffice as a minimal description?
In those terms, what is our objective function? Minimize the score for aligning all of x with all of y.
Can we divide this problem into sub-problems? How many? Hint: how many sub-problems are one step away from (i, j)?
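The lecture defers the algorithm itself to Lecture #25. As a preview, here is a minimal sketch of the standard edit-distance table, which may differ in detail from the version developed in class. It assumes the three neighboring sub-problems are (i-1, j-1), (i-1, j), and (i, j-1), and it uses placeholder unit costs; `align_cost` is a hypothetical name.

```python
def align_cost(x, y, c_match=0, c_indel=1, c_sub=1):
    """Best score for aligning all of x with all of y.

    cost[i][j] holds the best score for aligning the first i symbols of x
    with the first j symbols of y. The default costs are assumptions.
    """
    n, m = len(x), len(y)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * c_indel  # align a prefix of x against gaps only
    for j in range(1, m + 1):
        cost[0][j] = j * c_indel  # align a prefix of y against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = c_match if x[i - 1] == y[j - 1] else c_sub
            cost[i][j] = min(
                cost[i - 1][j - 1] + pair,  # pair x[i-1] with y[j-1]
                cost[i - 1][j] + c_indel,   # x[i-1] against a gap
                cost[i][j - 1] + c_indel,   # y[j-1] against a gap
            )
    return cost[n][m]

print(align_cost("ACGCTGA", "ACTGT"))
```

With unit costs, the example strings score 3: two gaps and one substitution. The table has (n+1) by (m+1) entries and each entry looks at three neighbors, so the work is proportional to n times m.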
Example: Sub-problems
x=ACGCTGA y=ACTGT
To be continued in Lecture #25
Assignment
HW #16
Read Section 6.3, if you haven’t done so already.
Thursday: Screencast & Quiz