maxsubseqsum

    1

    CSCI 3320/8325 Data Structures

    Module 2: First Algorithms for Analysis

    2

    The Maximum Subsequence Sum Problem

    To emphasize algorithm analysis, we will consider the problem of finding the maximum sum of a contiguous subsequence of integers from a given input data sequence. For example, if the input data sequence contains

    -2 11 -4 13 -5 -2

    the answer is 20 (11 - 4 + 13 = 20). For convenience, we define the maximum subsequence sum as 0 if all the integers are negative.

    3

    The Solutions

    We will consider four different algorithms to solve this problem. Each algorithm will be explained, and the C++ code will be examined. We will also compute the worst-case running time of each algorithm, and observe how relatively simple changes to an algorithm can yield enormous improvements in the running time.

    4

    Algorithm 1: Exhaustive Trials

    The most obvious way to solve this problem is to compute the sum of each possible subsequence and retain the largest sum as the result. Each possible subsequence is identified by the subscripts of its starting and ending elements (assuming the entire sequence has been stored in an array). Thus a pair of nested loops will be used to generate all possible pairs of starting and ending subscripts. For each subsequence, we will also need a loop to compute the sum of the elements in the candidate subsequence. Note that in this problem we are not concerned with identifying the subsequence itself, only with obtaining its sum.

    5

    Algorithm 1: C++ Code

    int maxSubSum1(const vector<int> &a) {
        int maxSum = 0;                              // 1
        for (int i = 0; i < a.size(); i++)           // 2
            for (int j = i; j < a.size(); j++) {     // 3
                int thisSum = 0;                     // 4
                for (int k = i; k <= j; k++)         // 5
                    thisSum += a[k];                 // 6
                if (thisSum > maxSum)                // 7
                    maxSum = thisSum;                // 8
            }
        return maxSum;                               // 9
    }

    6

    Algorithm 1: Analysis

    The analysis of this algorithm is quite simple. It is easy to see that the maximum number of iterations of each loop (the worst case) is governed by the size of the sequence, N. The innermost statement (6) is nested inside all three loops (2, 3 and 5), and it has a constant, or $O(1)$, running time. Thus, the worst-case running time for this algorithm is $O(N^3)$. A more precise analysis can be done, but it yields an expression that is still dominated by $N^3$, so the worst-case running time is not affected.
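The more precise analysis mentioned above can be sketched as a worked sum. This is a standard derivation, not taken from the slides. It counts how many times statement 6 executes for a sequence of N elements, where the loop variables i, j and k follow the code on slide 5:

$$
\sum_{i=0}^{N-1}\sum_{j=i}^{N-1}\sum_{k=i}^{j} 1
= \sum_{i=0}^{N-1}\sum_{j=i}^{N-1}(j-i+1)
= \sum_{i=0}^{N-1}\frac{(N-i)(N-i+1)}{2}
= \frac{N(N+1)(N+2)}{6}
$$

The result is $N^3/6 + O(N^2)$, which is still $O(N^3)$.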


    7

    Algorithm 2: Eliminating a Loop

    A simple observation allows us to eliminate one of the nested loops. The sum computed by the loop on lines 5 and 6 of Algorithm 1 includes just one more element than the previous sum computed for the same value of i. Thus we can eliminate the innermost loop and rewrite the code with only two nested loops. The running time of this improved algorithm is clearly $O(N^2)$.
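The observation rests on a one-line identity. The sum for subscripts i through j equals the sum for i through j - 1 plus one new element:

$$
\sum_{k=i}^{j} a_k = \left(\sum_{k=i}^{j-1} a_k\right) + a_j
$$

Each new sum therefore costs one addition instead of $j - i + 1$ additions. In the two-loop version, the inner loop body runs $\sum_{i=0}^{N-1}(N-i) = N(N+1)/2$ times, which is where the $O(N^2)$ bound comes from.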

    8

    Algorithm 2: C++ Code

    int maxSubSum2(const vector<int> &a) {
        int maxSum = 0;
        for (int i = 0; i < a.size(); i++) {
            int thisSum = 0;                  // running sum of a[i..j]
            for (int j = i; j < a.size(); j++) {
                thisSum += a[j];              // extend the previous sum by one element
                if (thisSum > maxSum)
                    maxSum = thisSum;
            }
        }
        return maxSum;
    }

    9

    Algorithm 3: A Recursive Solution

    The idea in this algorithm is to split the problem into two pieces of approximately the same size and solve each piece independently. Recall that repeatedly dividing the size of a problem (usually by 2) typically introduces a logarithmic factor into the worst-case running time, and that is the case with this solution. If we divide the sequence and compute the maximum subsequence sum of each part, the result is, with one exception, just the larger of the two resulting sums. The exception occurs when the maximum subsequence crosses the middle and has elements in each of the two parts.

    10

    Algorithm 3: The Three Cases

    Consider this input:

    4  -3   5  -2  |  -1   2   6  -2
       Left Half   |    Right Half

    The maximum of the left half is 6 (4 - 3 + 5), and the maximum of the right half is 8 (2 + 6). The maximum sum in the left half that includes its rightmost element is 4 (4 - 3 + 5 - 2). The maximum sum in the right half that includes its leftmost element is 7 (-1 + 2 + 6). The maximum sum that crosses the middle is thus 11 (4 + 7). Because 11 exceeds both 6 and 8, it is the answer.
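The crossing case can be computed in isolation with the sketch below. The function `maxCrossingSum` is a hypothetical helper, not from the slides. It takes the split point `c` as a parameter and returns the best left-border sum plus the best right-border sum. Empty runs count as 0, matching the convention that an all-negative sequence has sum 0.

```cpp
#include <vector>

// Hypothetical helper: best sum of a run ending exactly at a[c] (scanning
// leftward to l) plus best sum of a run starting exactly at a[c+1]
// (scanning rightward to r). An empty run contributes 0.
int maxCrossingSum(const std::vector<int> &a, int l, int c, int r) {
    int sum = 0, bestLeft = 0;
    for (int i = c; i >= l; i--) {        // grow the run leftward from the center
        sum += a[i];
        if (sum > bestLeft) bestLeft = sum;
    }
    sum = 0;
    int bestRight = 0;
    for (int i = c + 1; i <= r; i++) {    // grow the run rightward from c + 1
        sum += a[i];
        if (sum > bestRight) bestRight = sum;
    }
    return bestLeft + bestRight;
}
```

Take the slide's input {4, -3, 5, -2, -1, 2, 6, -2} with l = 0, c = 3, r = 7. The left scan peaks at 4 and the right scan peaks at 7, so the call returns 11.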

    11

    Algorithm 3: Putting It All Together

    The solution has a recursive function that receives the entire array along with the subscripts of the left and right border elements. It first checks for the base case (just one element).

    If not the base case, the two parts of the sequence are checked recursively to obtain their maximum sums.

    The maximum sums of the left and right sequences that include the border elements are then added to obtain the maximum sum that crosses the middle.

    Finally, the maximum of these three sums is returned as the result.

    12

    Algorithm 3: C++ Code (Part 1)

    // Helper assumed by the slides: the largest of three integers.
    int max3(int x, int y, int z) {
        int m = (x > y) ? x : y;
        return (m > z) ? m : z;
    }

    int maxSumRec(const vector<int> &a, int l, int r) {
        if (l == r) {                                // base case: only one element
            if (a[l] > 0) return a[l];
            else return 0;
        }

        int c = (l + r) / 2;                         // approximate center
        int maxlsum = maxSumRec(a, l, c);            // solve left part a[l..c]
        int maxrsum = maxSumRec(a, c + 1, r);        // solve right part a[c+1..r]

        int lbsum = 0, maxlbsum = 0;                 // left border sum
        for (int i = c; i >= l; i--) {
            lbsum += a[i];
            if (lbsum > maxlbsum) maxlbsum = lbsum;
        }

    13

    Algorithm 3: C++ Code (Part 2)

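        int rbsum = 0, maxrbsum = 0;                 // right border sum
        for (int i = c + 1; i <= r; i++) {
            rbsum += a[i];
            if (rbsum > maxrbsum) maxrbsum = rbsum;
        }

        return max3(maxlsum, maxrsum, maxlbsum + maxrbsum);
    }

    int maxSubSum3(const vector<int> &a) {
        if (a.empty()) return 0;                     // no elements: the sum is 0
        return maxSumRec(a, 0, a.size() - 1);
    }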

    14

    Algorithm 3: Analysis

    If we consider only the base case for the recursion (a single element), it is clear that $T(1) = O(1)$, since only one of two return statements is executed. If more than one element is in the subsequence, we recursively invoke the function on the left and right halves of the current range, each taking time $T(N/2)$. We then compute the two border sums, which together examine (potentially all) the N elements in the range, taking time $O(N)$. Thus the total worst-case running time for the recursive cases is

    $T(N) = 2T(N/2) + O(N)$.

    15

    Algorithm 3: Analysis Conclusion

    We won't bother with formally solving these equations now, but will revisit them later in the course.

    At this point, however, we simply note that the final result is

    $T(N) = O(N \log N)$

    Again, recall that when an algorithm works by dividing the work to be done by a constant factor in each iteration (or invocation), the running time will likely include a factor that is logarithmic in the problem size N.
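As a preview of the later, formal treatment, the recurrence can be unrolled. This sketch assumes N is a power of 2 and writes the $O(N)$ term as $cN$ for some constant c:

$$
\begin{aligned}
T(N) &= 2T(N/2) + cN \\
     &= 4T(N/4) + 2cN \\
     &= 2^k\,T(N/2^k) + k\,cN \\
     &= N\,T(1) + cN\log_2 N \qquad (k = \log_2 N)
\end{aligned}
$$

The result is $O(N \log N)$. Each level of the recursion does $cN$ work in total, and there are $\log_2 N$ levels.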

    16

    Algorithm 4: Further Improvements

    Our final algorithm for the maximum subsequence sum problem is not only the simplest, but also the most efficient (in terms of running time growth).Eliminating the need for the i loop can be understood if we make the following observations.

    If a[i] is negative, then any sequence that begins with it can be improved by starting with the next element.

    Any subsequence that begins with a negative subsequence can be improved by eliminating that negative subsequence.
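The second observation can be stated as an inequality. Suppose a candidate subsequence $a_i, \ldots, a_j$ has a prefix $a_i, \ldots, a_p$ (with $i \le p < j$) whose sum is negative. Then

$$
\sum_{k=i}^{j} a_k = \underbrace{\sum_{k=i}^{p} a_k}_{<\,0} + \sum_{k=p+1}^{j} a_k < \sum_{k=p+1}^{j} a_k
$$

Dropping the negative prefix therefore always yields a larger sum. The first observation is the special case $p = i$.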

    17

    Algorithm 4: A Single Loop

    To eliminate the i loop, we just keep track of the subscript of the last element of the subsequence being examined. As soon as the sum of a subsequence becomes negative, we set the sum back to zero, essentially eliminating that negative prefix from the current subsequence's sum.

    18

    Algorithm 4: C++ Code

    int maxSubSum4(const vector<int> &a) {
        int maxSum = 0, thisSum = 0;

        for (int j = 0; j < a.size(); j++) {
            thisSum += a[j];                         // extend the current subsequence
            if (thisSum > maxSum)
                maxSum = thisSum;
            else if (thisSum < 0)                    // eliminate negative prefix
                thisSum = 0;
        }
        return maxSum;
    }
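Algorithm 4's correctness rests on the two observations rather than on exhaustive search, so it can be reassuring to test it empirically. The sketch below assumes a C++11 compiler. It repeats Algorithms 2 and 4 with `std::` qualification and `size_t` subscripts, then compares them on random sequences. The function `crossCheck` and its parameters are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Quadratic reference version (Algorithm 2).
int maxSubSum2(const std::vector<int> &a) {
    int maxSum = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        int thisSum = 0;
        for (std::size_t j = i; j < a.size(); j++) {
            thisSum += a[j];
            if (thisSum > maxSum) maxSum = thisSum;
        }
    }
    return maxSum;
}

// Linear version (Algorithm 4).
int maxSubSum4(const std::vector<int> &a) {
    int maxSum = 0, thisSum = 0;
    for (std::size_t j = 0; j < a.size(); j++) {
        thisSum += a[j];
        if (thisSum > maxSum) maxSum = thisSum;
        else if (thisSum < 0) thisSum = 0;   // drop the negative prefix
    }
    return maxSum;
}

// Hypothetical harness: runs `trials` random sequences (length 0..20,
// values -50..50) through both versions; true if they always agree.
bool crossCheck(int trials, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> len(0, 20), val(-50, 50);
    for (int t = 0; t < trials; t++) {
        std::vector<int> a(len(gen));
        for (int &x : a) x = val(gen);
        if (maxSubSum2(a) != maxSubSum4(a)) return false;
    }
    return true;
}
```

Random testing like this only builds confidence; the observations on slide 16 are what actually establish correctness.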

    19

    Algorithm 4: Analysis

    The running time for Algorithm 4 should now be easy for you to determine.

    The body of the for loop has running time $O(1)$, since it contains

    one assignment statement, with an addition, clearly requiring time $O(1)$, and

    one if statement, whose then and else parts each contain one assignment statement, with running time $O(1)$.

    Since the number of times the body of the for loop is executed is equal to the problem size, we clearly have

    $T(N) = O(N)$

    20

    Algorithm 4: An On-line Algorithm

    This last algorithm has the property of being an on-line algorithm. This means that the algorithm

    requires only constant space (since only three integers maxSum, thisSum, and the current value from the a vector are needed at any time), and

    can instantly provide an answer to the problem for the data it has already processed.
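The on-line property can be made concrete by packaging Algorithm 4 as a small class. Values arrive one at a time, and an answer is available after every value. The class name `OnlineMaxSubSum` and its methods `add` and `best` are hypothetical. This is a minimal sketch, not code from the slides.

```cpp
// Hypothetical class: Algorithm 4 restructured so that values are fed in
// one at a time. Only two integers of state are kept, so space is O(1).
class OnlineMaxSubSum {
public:
    void add(int x) {
        thisSum += x;                  // extend the current candidate run
        if (thisSum > maxSum)
            maxSum = thisSum;          // best run so far ends at x
        else if (thisSum < 0)
            thisSum = 0;               // negative prefix: discard it
    }
    int best() const { return maxSum; }  // answer for all data seen so far
private:
    int maxSum = 0;   // best sum seen so far (0 if all values are negative)
    int thisSum = 0;  // sum of the current candidate run
};
```

Feeding in the sequence -2 11 -4 13 -5 -2 from slide 2, `best()` reports 0, 11, 11, 20, 20, 20 after each successive value.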

    21

    Binary Search

    Given an integer X and integers $A_0, A_1, \ldots, A_{N-1}$, which are presorted and already in memory, find i such that $A_i = X$, or return $i = -1$ if X is not in the input. The most obvious solution is a linear search, examining $A_0$, then $A_1$, and so forth. It should be easy to see that this solution has $T(N) = O(N)$. The linear search does not use the fact that the data is presorted. The binary search does better by examining the middle element, which either is the desired value X or identifies which of the remaining parts (left or right of the middle element) should be examined further.

    22

    Binary Search: C++ Code

    int binarySearch(const vector<int> &a, const int x) {
        int low = 0, high = a.size()-1;
        while (low
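Below is a minimal sketch of the binary search described on slide 21. It follows the same contract: return an index of X in a sorted vector, or -1 if X is absent. The name `binarySearchSketch` is hypothetical, chosen so the sketch is not mistaken for the slide's exact code.

```cpp
#include <vector>

// Sketch of iterative binary search. Precondition: a is sorted in
// nondecreasing order. Returns an index i with a[i] == x, or -1.
int binarySearchSketch(const std::vector<int> &a, int x) {
    int low = 0, high = static_cast<int>(a.size()) - 1;
    while (low <= high) {                  // a[low..high] is still unexamined
        int mid = low + (high - low) / 2;  // middle element; avoids overflow of low + high
        if (a[mid] < x)
            low = mid + 1;                 // x can only lie to the right of mid
        else if (a[mid] > x)
            high = mid - 1;                // x can only lie to the left of mid
        else
            return mid;                    // found x
    }
    return -1;                             // range is empty: x is not present
}
```

Each iteration at least halves the range `[low, high]`. The loop therefore runs at most about $\log_2 N + 1$ times, giving $O(\log N)$ time compared with $O(N)$ for the linear search.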

    25

    Euclid's Algorithm: C++ Code

    long gcd(long a, long b) {
        while (b != 0) {            // #1
            long rem = a % b;       // #2
            a = b;                  // #3
            b = rem;                // #4
        }
        return a;                   // #5
    }

    26

    Euclid's Algorithm: Analysis

    Each execution of the body of the loop started with statement 1 (that is, statements 2, 3, and 4) takes constant time. The running time of the algorithm therefore depends only on the number of iterations of the loop, which depends on the length of the sequence of nonzero remainders. If we could show that each iteration of the loop decreased the remainder by at least a constant factor, then we could predict a logarithmic running time. But this is not the case (refer back to the computation of the gcd of 137,912 and 151,360). We can, however, show that after two iterations the remainder is at most half its original value (the proof appears on the next slide). Thus, since $2 \log N = O(\log N)$, we have established the logarithmic running time of the algorithm.
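The gap between one iteration and two can be seen by recording the remainders. The sketch below is a hypothetical instrumented copy of the loop on slide 25, named `gcdTrace`. It runs the same statements and also appends each remainder to a vector. The inputs 1989 and 1590 are an illustrative choice, not taken from the slides.

```cpp
#include <vector>

// Hypothetical instrumented gcd: same loop as slide 25, but each remainder
// is also appended to `rems` (one entry per iteration, including the final 0).
long gcdTrace(long a, long b, std::vector<long> &rems) {
    while (b != 0) {
        long rem = a % b;
        rems.push_back(rem);
        a = b;
        b = rem;
    }
    return a;
}
```

For `gcdTrace(1989, 1590, r)`, the result is 3 with remainders 399, 393, 6, 3, 0. The step from 399 to 393 shows that one iteration can barely shrink the remainder. Two iterations, from 399 to 6, cut it by well over half.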

    27

    Remainder Reduction Rate Analysis

    To show that the remainder decreases by at least half with every pair of iterations, it is sufficient to prove that if a > b, then $a \bmod b < a/2$. (Recall that the remainder of a / b, or $a \bmod b$, is always less than b.) There are two cases to consider:

    If $b \le a/2$, the remainder is less than b, and hence less than $a/2$.

    If $b > a/2$, then b goes into a exactly once (the integer quotient a / b is 1), leaving a remainder of $a - b$, which must be less than $a/2$.

    Thus every two iterations of the loop reduce the value of a to less than half its former value. If a < b initially, the first iteration merely swaps the two values, costing only one extra iteration.
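The lemma can also be checked exhaustively for small values as a sanity test, although this is not a proof. In the hypothetical function `remainderLemmaHolds` below, the inequality is multiplied through by 2. This avoids integer-division rounding: $a \bmod b < a/2$ over the reals is equivalent to `2 * (a % b) < a`.

```cpp
// Hypothetical checker: tests "if a > b > 0 then a % b < a / 2" for every
// pair with 2 <= a <= limit, using 2 * (a % b) < a to avoid rounding.
bool remainderLemmaHolds(long limit) {
    for (long a = 2; a <= limit; a++)
        for (long b = 1; b < a; b++)
            if (2 * (a % b) >= a) return false;   // counterexample found
    return true;
}
```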

    28

    Exponentiation

    Computation of $a^b$ (where a and b are both integers) can be done using the obvious technique involving $b - 1$ multiplications. We can do better than this if we observe that if $z = a^k$, then $z \cdot z = a^{2k}$. Thus, if b is even, $a^b = (a^2)^{b/2}$, eliminating almost half of the multiplications required by the obvious technique. If b is odd, $a^b = (a^2)^{\lfloor b/2 \rfloor} \cdot a$.

    29

    Exponentiation: C++ Code

    long pow(long a, long b) {
        if (b == 0) return 1;                          // #1
        if (b == 1) return a;                          // #2
        if (b % 2 == 0)                                // #3: b is even
            return pow(a * a, b / 2);                  // #4
        else
            return pow(a * a, b / 2) * a;              // #5: b is odd
    }
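The multiplication count claimed in the analysis can be observed directly. The sketch below is a hypothetical instrumented copy of the recursive `pow`, named `powCount`. It performs the same recursion and increments a counter for every multiplication: one for `a * a`, plus one more when b is odd.

```cpp
// Hypothetical instrumented copy of pow: same recursion as the slide,
// but every multiplication also increments `mults`.
long powCount(long a, long b, int &mults) {
    if (b == 0) return 1;
    if (b == 1) return a;
    mults++;                                   // the a * a multiplication
    if (b % 2 == 0)
        return powCount(a * a, b / 2, mults);
    mults++;                                   // the extra "* a" for odd b
    return powCount(a * a, b / 2, mults) * a;
}
```

For example, computing $2^{10}$ takes 4 multiplications and $3^{13}$ takes 5. There are $\lfloor \log_2 b \rfloor$ calls that multiply, each doing at most two multiplications, so the total never exceeds $2\lfloor \log_2 b \rfloor$.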

    30

    Exponentiation: Analysis

    The base cases for the recursive function (statements 1 and 2) clearly take $O(1)$ time. Each recursive invocation of pow reduces the exponent by a constant factor (2), so the running time is logarithmic. If we count the number of multiplications, it's easy to see that at most two multiplications are involved in each invocation of the function, so the maximum number of multiplications is $2 \log_2 b$.