211
Dynamic Programming
Klaus Sutner
Carnegie Mellon University
www.cs.cmu.edu/~sutner
Feb. 10, 2005
CDM Dyn. Prg. 1
Battleplan
• D. & C. Failure
• Overlapping Subinstances
• Memoizing
• Explicit Tables
• Scheduling
• Longest Common Subsequences
Reading: http://mat.gsia.cmu.edu/classes/dynamic/dynamic.html
Google for dynamic programming
(Bad) Example: Fibonacci Numbers
The usual definition of the Fibonacci numbers uses recursion:
F(0) = 0
F(1) = 1
F(n) = F(n−1) + F(n−2)    for n ≥ 2
This definition translates directly into code:
nat fib( n : nat ) {
    if( n <= 1 ) return n;
    return fib(n-1) + fib(n-2);
}
Correctness is obvious (disregarding problems with machine-sized integers).
Failed Divide and Conquer
The program is reminiscent of divide-and-conquer: the instance n is broken up into two
smaller sub-instances n− 1 and n− 2.
We solve the two sub-instances, then combine the partial solutions (by adding them).
But there is a huge problem: the sub-instances are not independent.
E.g., both n− 1 and n− 2 lead to a sub-instance n− 3.
Since the computations don’t know about each other, work is duplicated.
A Call Tree
[Call tree for fib(6): the root 6 branches into 5 and 4, and sub-instances such as 4, 3 and 2 appear repeatedly.]
In fact, since only a call to fib(1) returns the positive result 1, there must be F(n) calls
to fib(1) in the call to fib(n). Unfortunately, this number is exponential in n. Also
note that the call stack has depth O(n).
A Challenge
So the number of calls to fib(1) is exactly F(n).
But how many calls are there altogether?
Cn = number of calls in fib(n)
Clearly, C0 = C1 = 1 and
Cn = Cn−1 + Cn−2 + 1
for n ≥ 2. This looks very similar to the definition of the Fibonacci numbers themselves.
So one might suspect that the solution has something to do with Fibonacci numbers.
Is there a simple closed form for Cn?
Compute Away
In the absence of good theory, the best way to tackle this type of problem is to compute a
few examples and hope there is a pattern.
In the next table, Sn = ∑i≤n Fi.
 n    Cn    Fn    Sn
 0     1     0     0
 1     1     1     1
 2     3     1     2
 3     5     2     4
 4     9     3     7
 5    15     5    12
 6    25     8    20
 7    41    13    33
 8    67    21    54
 9   109    34    88
10   177    55   143
A Conjecture
From the table, it looks very much like
Cn = Fn−1 + Sn
Exercise 1. Prove the conjecture. Try to simplify the expression as much as possible.
Exercise 2. Find a way to count the number of calls to fib(k) within a call to fib(n).
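As a quick sanity check (not the proof that Exercise 1 asks for), one can count the calls directly. A sketch in Python, where fib_calls returns both F(n) and the call count Cn:

```python
def fib_calls(n):
    # Return (F(n), C_n): the value and the total number of calls made
    # by the naive recursive fib, following the recurrence for C_n above.
    if n <= 1:
        return n, 1
    f1, c1 = fib_calls(n - 1)
    f2, c2 = fib_calls(n - 2)
    return f1 + f2, c1 + c2 + 1

def check(n):
    # Does C_n = F_{n-1} + S_n hold, where S_n = F_0 + ... + F_n?
    fibs = [0, 1]
    while len(fibs) <= n:
        fibs.append(fibs[-1] + fibs[-2])
    _, calls = fib_calls(n)
    return calls == fibs[n - 1] + sum(fibs[: n + 1])

print(all(check(n) for n in range(1, 15)))  # → True
```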
Better Algorithm
Since F(n) depends only on two previous values, which depend on previous values, which
depend on previous values, . . . it makes perfect sense to bite the bullet and compute the
whole sequence:
F(0), F(1), F(2), . . . , F(n−2), F(n−1), F(n)
If we fill the table left-to-right this requires
• time Θ(n) and
• space Θ(n).
Since F(i) depends only on F(i−1) and F(i−2) we can easily reduce the space
requirement to O(1) in this case.
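The left-to-right table fill, truncated to its last two entries, might look like this in Python (a sketch, not the lecture's code):

```python
def fib(n):
    # Fill the table F(0), F(1), ..., F(n) left to right, but keep only
    # the last two entries, so space drops from Theta(n) to O(1).
    a, b = 0, 1               # a = F(i), b = F(i+1), starting at i = 0
    for _ in range(n):
        a, b = b, a + b       # slide the two-entry window one step
    return a

print(fib(10))  # → 55
```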
Slightly More Wacko Solution
Suppose we wish to adhere to the First Axiom of Computer Science: Don’t think unless
you absolutely must.
In this case, we would like to keep the original program text as nearly intact as possible.
No problem:
memoize nat fib( n : nat ) {
    if( n <= 1 ) return n;
    return fib(n-1) + fib(n-2);
}
Here memoize is a keyword that fails to exist in most current programming languages. But
it’s easy to explain:
Memoize
First, build a hash table for fib.
This table is a global data structure maintained by the runtime environment. Initially the
table is empty.
Now suppose somewhere in the program there is a call fib(n). This is handled like so:
• First check the table to see if the result is already there. If so,
return the pre-computed result.
• Otherwise enter the function body and proceed with the
computation as usual. Store the result in the table when
done and return it.
Exercise 3. Figure out how memoizing could be implemented in a language like Java.
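Python happens to make the hypothetical memoize keyword almost real. A sketch (in Python rather than the lecture's pseudocode): first an explicit global hash table, then the library decorator functools.lru_cache, which performs the same bookkeeping automatically.

```python
from functools import lru_cache

table = {}  # global table maintained "by the runtime", initially empty

def fib_memo(n):
    if n in table:                  # result already there? return it
        return table[n]
    result = n if n <= 1 else fib_memo(n - 1) + fib_memo(n - 2)
    table[n] = result               # store the result when done
    return result

@lru_cache(maxsize=None)            # same idea, done by the library
def fib_cached(n):
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)

print(fib_memo(50) == fib_cached(50) == 12586269025)  # → True
```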
Complexity?
Assuming that the hash table works well, a call will be O(n) in the worst case: we have
to create n entries in the table. Likewise, space will be O(n).
But if we ask for an already computed value the cost drops to O(1).
This can be a huge advantage in complicated computations.
Of course, for a one-shot computation the truncated table method is far better: there is
no overhead in creating, maintaining and destroying the hash table.
Dynamic Programming
Definition. Dynamic programming refers to a type of algorithm that solves large problem
instances by systematically breaking them up into smaller sub-instances, solving these
separately and then combining the partial solutions.
The key difference to divide-and-conquer is that the sub-instances are not required to be
independent: two sub-instances may well contain the same sub-sub-instances.
Bookkeeping becomes an important task here: we have to keep track of all the sub-
instances and their associated solutions. There are two standard ways of doing this:
• Construct a table (usually one- or two-dimensional) for all
the sub-instances and their solutions, or
• use memoizing.
Tables versus Memoizing
In the olden days, only the table method was referred to as dynamic programming.
Constructing an explicit table is usually more efficient, but slightly more difficult for the
programmer. There are two key problems:
• Figuring out the correct boundary values, and
• filling in the table in the correct order.
Of course, if something like memoize is available, the memoizing approach is much easier
to implement.
Unless there are significant gaps in the table, though, memoizing will be slower than the
plain table method.
Example: Scheduling
Here is a more realistic and interesting example.
Suppose you have a resource (such as a conference room) and requests for its use. The
requests have the form (t1, t2) where t1 < t2 ∈ N (the begin and end time of a meeting).
There are two opposing requirements:
• The resource is supposed to be as busy as possible, but
• no overlapping requests can be scheduled.
By overlap we mean that the two intervals have an interior point in common, so
(1, 5), (5, 7) would be admissible whereas (1, 5), (4, 6) would not.
So we want to find a non-overlapping schedule that maximizes the use of the resource.
Example

Intervals are sorted by right endpoint. In this case, it’s easy to find an optimal schedule by
visual inspection.

[Figure omitted.]

But How About This?

[Figure omitted.]

A Solution

[Figure omitted.]
Less Informally
The input is given as intervals I1, I2, . . . , In where
Ii = (ai, bi) ai < bi ∈ N.
Write ℓi = bi − ai for the length of interval Ii.
A schedule can now be defined as a set S ⊆ [n] of intervals subject to the constraint that
for i ≠ j ∈ S intervals Ii and Ij do not overlap.
We need to maximize
val(S) = ∑i∈S ℓi
over all schedules S.
This is an optimization problem and val is the objective function.
Computing the Optimal Value
For the time being, we will focus on computing the value of the solution, rather than the
solution itself.
This approach is typical for optimization problems; once the optimal value has been found
it is often easy to construct a corresponding optimal solution in a second pass.
How about brute force?
This comes down to computing all subsets of [n], eliminating those that contain overlapping
intervals and finding one such set S that maximizes val(S).
Clearly a losing proposition: there are 2n subsets to consider and it is not clear how to
eliminate a significant portion thereof.
The Dynamic Programming Approach
Recall that we are trying to break big instances up into smaller ones.
In this case, instead of looking at all n intervals, we could consider just the first n− 1, or
n− 2, n− 3 and so on.
For simplicity we write val(k) for the value of an optimal solution (not the solution itself,
just the sum) using only the intervals I1, I2, . . . , Ik.
To keep notation simple, let us assume that the Ii = (ai, bi) are already sorted by right
endpoint:
b1 ≤ b2 ≤ . . . ≤ bn−1 ≤ bn.
If not, this can be ensured by a simple pre-processing step.
The Trick
We need to express val(k) from val(i) for i < k (without much computation, needless to
say, we can’t just run some brute force subroutine).
How can an optimal solution for k differ from optimal solutions for i < k?
Fundamental Insight: Either we use Ik or we don’t.
If we don’t, then val(k) = val(k − 1).
If we do use Ik then we have to find i < k such that the optimal solution for i does not
overlap with Ik and add Ik to this optimal partial solution.
Since we don’t know which is the right answer, we will compute the maximum over all
possible candidates.
More Formally
To this end define the constraint c(k) to be
c(k) = max { i < k | Ii and Ik do not overlap }
Now we can compute val(k) as follows:
val(k) = max(val(c(k)) + `k, val(k − 1))
Exercise 4. Explain how this really implements our informal strategy.
Implementation Details
How do we compute the c-array?
Recall that we have the intervals sorted by right endpoints.
Hence given ai we can perform binary search to find
max { j < i | bj ≤ ai }.
The total cost of this is O(n log n).
Exercise 5. Find an alternative algorithm to compute the c-array that starts by sorting
the intervals according to their left endpoints.
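The c-array and the val-table can be sketched together in Python (an illustration, not the lecture's code; bisect_right plays the role of the binary search, and val(0) = 0 covers the case where no earlier interval fits):

```python
from bisect import bisect_right

def best_value(intervals):
    """Maximum total busy time over non-overlapping schedules.
    Intervals are pairs (a, b) with a < b; shared endpoints are allowed."""
    iv = sorted(intervals, key=lambda p: p[1])   # sort by right endpoint
    ends = [b for _, b in iv]
    n = len(iv)
    # val[k] = optimal value using only the first k intervals; val[0] = 0.
    val = [0] * (n + 1)
    for k in range(1, n + 1):
        a, b = iv[k - 1]
        c = bisect_right(ends, a)    # c(k): number of intervals with b_j <= a_k
        # Either skip I_k, or add it on top of the best solution for c(k).
        val[k] = max(val[k - 1], val[c] + (b - a))
    return val[n]

print(best_value([(1, 5), (4, 6), (5, 7)]))  # → 6
```

On the three intervals from the overlap example, the optimal schedule takes (1, 5) and (5, 7), for value 6.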
Efficiency
If the data are given in sorted order (in both ways), the pre-computation of c is linear,
otherwise we have to add the cost of sorting.
Once we have the c-array we can easily construct a table
val(1), val(2), . . . , val(n− 1), val(n)
using a single pass from left to right. Moreover, every new entry can be computed in
constant time, so this part of the algorithm is linear in n.
Once the table has been constructed, we can trace back the final answer val(n) to
obtain the corresponding actual solution S. This second pass is also linear time.
Exercise 6. Figure out how to construct an optimal solution rather than just the optimal
value.
The Key Steps
• Find some suitable notion of sub-instance.
• Find a recursive way to express the value of a sub-instance in terms
of “smaller” sub-instances.
• Organize this recursive computation into a neat table, possibly with
some pre-computation. This is the bottom-up approach.
• Alternatively, use memoizing in a direct recursive top-down
approach.
Of course, one should try to keep the number of sub-instances that need to be inspected
small: O(1) is great but in general one often has to deal with O(n) sub-instances.
Another Example: Longest Common Subsequences
Definition 1. Consider a sequence A = (a1, a2, . . . , an). B = (b1, b2, . . . , bk) is a
subsequence of A if there is an index sequence 1 ≤ p1 < p2 < . . . < pk ≤ n such that
api = bi.
Given two sequences A and B, S is a common subsequence if S is a subsequence
of both A and B.
S is a longest common subsequence (LCS) if its length is maximal among all common
subsequences.
So this is another optimization problem: maximize the length of a common subsequence.
Note: we are dealing with scattered subsequences here; the elements need not be
contiguous.
Example
A = 3, 5, 4, 3, 1, 5, 4, 2, 4, 5, 3, 1, 3, 5, 2, 1, 5, 2, 1, 3
B = 4, 3, 5, 1, 5, 3, 1, 3, 3, 2, 2, 2, 5, 4, 4, 4, 4, 5, 4, 4
Claim: An LCS for A and B is
S = 3, 5, 1, 5, 3, 1, 3, 5, 5
How would we verify this?
The Easy Part
It’s not hard to check that S really is a subsequence:
A = 3, 5, 4, 3, 1, 5, 4, 2, 4, 5, 3, 1, 3, 5, 2, 1, 5, 2, 1, 3
B = 4, 3, 5, 1, 5, 3, 1, 3, 3, 2, 2, 2, 5, 4, 4, 4, 4, 5, 4, 4
But it is far from clear that we have the longest possible subsequence.
Uniqueness is another interesting question.
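The subsequence check itself is mechanical; a sketch in Python, where the greedy scan matches each element of S at the earliest possible position (always safe for scattered subsequences):

```python
def is_subsequence(s, a):
    # Greedy scan: `x in it` advances the iterator past the first match,
    # so each element of s is matched at the earliest possible position in a.
    it = iter(a)
    return all(x in it for x in s)

A = [3, 5, 4, 3, 1, 5, 4, 2, 4, 5, 3, 1, 3, 5, 2, 1, 5, 2, 1, 3]
B = [4, 3, 5, 1, 5, 3, 1, 3, 3, 2, 2, 2, 5, 4, 4, 4, 4, 5, 4, 4]
S = [3, 5, 1, 5, 3, 1, 3, 5, 5]

print(is_subsequence(S, A) and is_subsequence(S, B))  # → True
```

Maximality, of course, is exactly what this check does not establish.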
The Sub-Instances
So suppose we have sequences A = (a1, a2, . . . , am) and B = (b1, b2, . . . , bn).
What are the sub-instances that we need to consider?
Well, we could chop off the last item in a sequence. Presumably that would make it easier
to find a (partial) LCS.
Let’s write Ai for the prefix of A of length i:
Ai = (a1, a2, . . . , ai−1, ai)
and likewise Bj.
Case Analysis
Suppose S = (s1, s2, . . . , sk) is an LCS of A and B.
We need to get a recursive description of LCSs, one that hopefully will translate into a
simple algorithm. Here is step one.
• am = bn implies
sk = am, Sk−1 is LCS of Am−1 and Bn−1,
• am ≠ bn and sk ≠ am implies
S is LCS of Am−1 and B,
• am ≠ bn and sk ≠ bn implies
S is LCS of A and Bn−1.
Not quite what we need since this formulation assumes knowledge of S, but we can turn
things around.
The Recursion
Let a ≠ b. Then in terminally sloppy notation
LCS(Aa,Ba) = LCS(A,B) a
LCS(Aa,Bb) = LCS(A,Bb) or LCS(Aa,B)
Read this with ample amounts of salt, LCS is not really a function and the “or” also needs
some positive thinking.
Exercise 7. Explain how to make sense out of these equations.
The Optimal Value
As usual, it’s a bit easier to deal with the optimal value rather than the optimal solution:
in this case, let us write lcs(A,B) for the length of any LCS for A and B.
Then
lcs(Aa,Ba) = lcs(A,B) + 1
lcs(Aa,Bb) = max(lcs(A,Bb), lcs(Aa,B))
There is nothing fishy about these equations; they are literally correct.
If we were to adopt memoizing this is essentially a program to compute the optimal value.
Note the two recursive calls in the second case: without memoizing, a recursive algorithm
is going to produce exponentially many calls.
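With memoizing, the recursion above translates almost verbatim; a Python sketch over prefix lengths (i, j), using functools.lru_cache as a stand-in for the hypothetical memoize keyword:

```python
from functools import lru_cache

def lcs_length(A, B):
    # lcs(i, j) = length of an LCS of the prefixes A_i and B_j.
    @lru_cache(maxsize=None)
    def lcs(i, j):
        if i == 0 or j == 0:
            return 0                        # empty prefix: no common part
        if A[i - 1] == B[j - 1]:
            return lcs(i - 1, j - 1) + 1    # last items match
        return max(lcs(i - 1, j), lcs(i, j - 1))
    return lcs(len(A), len(B))

A = [3, 5, 4, 3, 1, 5, 4, 2, 4, 5, 3, 1, 3, 5, 2, 1, 5, 2, 1, 3]
B = [4, 3, 5, 1, 5, 3, 1, 3, 3, 2, 2, 2, 5, 4, 4, 4, 4, 5, 4, 4]
print(lcs_length(A, B))
```

On the example pair from the earlier slide this agrees with the length 9 of the claimed LCS; the cache keeps the number of distinct calls down to O(mn).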
Summary
• Some problems cannot be handled by divide-and-conquer but seem to
require overlapping sub-instances.
• Dynamic programming produces less elegant but still quite respectable
algorithms.
• Can be implemented via explicit tables or via memoizing.
• Correctness is often far from obvious and may require a delicate argument.