16. Algo analysis & Design - Data Structures using C++ by Varsha Patil

Oxford University Press © 2012

Data Structures Using C++ by Dr Varsha Patil

16. Algorithm Analysis and Design



Objectives

After completing this chapter, the reader will be able to understand the following:

Basic tools needed to develop and analyze algorithms

Methods to compute the efficiency of algorithms

Ways to make a wise choice among many solutions for a given problem



Introduction

Algorithm Analysis

Asymptotic Notations (Ω, Θ, O)

Big Omega (Ω)



Divide-and-Conquer

The following steps elaborate the general structure of the divide-and-conquer strategy:

1. If the data size n of problem P is fundamental, calculate the result of P(n) and go to step 4

2. If the data size n of problem P is not fundamental, divide the problem P(n) into equivalent subproblems P(n1), P(n2), …, P(ni) such that i ≥ 1

3. Apply divide-and-conquer recursively to each individual subproblem P(n1), P(n2), …, P(ni)

4. Combine the results of all subproblems P(n1), P(n2), …, P(ni) to get the final solution of P(n)
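As an illustration of the steps above, here is a minimal sketch that uses merge sort as the example problem. Merge sort is chosen here for illustration and is not taken from the slides; the names mergeSort and sortedCopy are hypothetical.

```cpp
#include <algorithm>
#include <vector>

using Iter = std::vector<int>::iterator;

// Hypothetical helper: sorts the range [first, last) by divide-and-conquer.
void mergeSort(Iter first, Iter last) {
    // Fundamental size: a range of 0 or 1 elements is already sorted.
    if (last - first < 2) return;

    // Divide P(n) into two subproblems of roughly equal size.
    Iter mid = first + (last - first) / 2;

    // Apply divide-and-conquer recursively to each subproblem.
    mergeSort(first, mid);
    mergeSort(mid, last);

    // Combine the two sorted halves into one sorted range.
    std::inplace_merge(first, mid, last);
}

// Convenience wrapper that returns a sorted copy of its argument.
std::vector<int> sortedCopy(std::vector<int> v) {
    mergeSort(v.begin(), v.end());
    return v;
}
```

Calling sortedCopy on {5, 2, 9, 1, 5, 6} would return the same values in ascending order.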



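Analysis of Quick Sort

At level one, only one call to partition is made with n elements; at level two, at most two calls are made with a total of (n − 1) elements, and so on

In the worst case the recursion is n levels deep, so the comparisons sum to at most n + (n − 1) + … + 1, which gives

C(n) = O(n²)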
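The quadratic bound can be observed by counting comparisons. The sketch below assumes a Lomuto-style partition that always picks the last element as the pivot; that choice, and the names quickSortCount and countComparisons, are assumptions made for illustration. With this variant an already sorted input is a worst case and costs (n − 1) + (n − 2) + … + 1 = n(n − 1)/2 comparisons.

```cpp
#include <utility>
#include <vector>

// Sorts a[lo..hi] in place and returns the number of element comparisons made.
// Assumption: Lomuto partition with the last element as pivot.
long long quickSortCount(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return 0;  // zero or one element: nothing to compare

    int pivot = a[hi];
    int i = lo;              // next slot for an element smaller than the pivot
    long long comparisons = 0;
    for (int j = lo; j < hi; ++j) {
        ++comparisons;
        if (a[j] < pivot) {
            std::swap(a[i], a[j]);
            ++i;
        }
    }
    std::swap(a[i], a[hi]);  // move the pivot to its final position

    return comparisons + quickSortCount(a, lo, i - 1) + quickSortCount(a, i + 1, hi);
}

// Hypothetical wrapper: counts comparisons on a copy of the input.
long long countComparisons(std::vector<int> a) {
    if (a.empty()) return 0;
    return quickSortCount(a, 0, static_cast<int>(a.size()) - 1);
}
```

For the sorted input {1, 2, …, 8} this variant performs 7 + 6 + … + 1 = 28 comparisons.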



DIVIDE-AND-CONQUER

General Knapsack Problem

Elements of Greedy Strategy



Greedy Method

Elements of Greedy Strategy

To decide whether a problem can be solved using a greedy strategy, the following elements should be considered:

Greedy-choice property

Optimal substructure



Greedy-choice property

A problem exhibits the greedy-choice property if a globally optimal solution can be arrived at by making a locally optimal (greedy) choice

That is, we make the choice that seems best at that time without considering the results from the subproblems
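A standard problem with the greedy-choice property is the knapsack problem in which items may be taken fractionally. The sketch below assumes that this is what the chapter's "General Knapsack Problem" refers to; the Item struct and the function fractionalKnapsack are hypothetical names, and all weights are assumed to be positive.

```cpp
#include <algorithm>
#include <vector>

struct Item {
    double value;
    double weight;  // assumed to be strictly positive
};

// Greedy choice: always take as much as possible of the item with the
// highest value-to-weight ratio that is still available.
double fractionalKnapsack(std::vector<Item> items, double capacity) {
    std::sort(items.begin(), items.end(), [](const Item& a, const Item& b) {
        // Compares value/weight ratios without dividing.
        return a.value * b.weight > b.value * a.weight;
    });

    double total = 0.0;
    for (const Item& it : items) {
        if (capacity <= 0.0) break;
        double take = std::min(it.weight, capacity);  // whole item or a fraction
        total += it.value * take / it.weight;
        capacity -= take;
    }
    return total;
}
```

With items (value 60, weight 10), (100, 20), and (120, 30) and capacity 50, the greedy choice takes the first two items whole and two thirds of the third, for a total value of 240.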



Dynamic Programming

The General Method

Elements of Dynamic Programming

Principle of Optimality

Limitations of Dynamic Programming

Knapsack Problem



Elements of Dynamic Programming

A dynamic programming solution has the following three components:

Formulate the answer as a recurrence relation or a recursive algorithm

Show that the number of different instances of your recurrence is bounded by a polynomial

Specify an order of evaluation for the recurrence
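The three components can be seen in a small sketch of the 0/1 knapsack problem, chosen here as an illustration. The function name knapsack01 and the table name K are hypothetical, and weights and capacity are assumed to be non-negative integers. The recurrence is K(i, c) = max(K(i − 1, c), K(i − 1, c − w_i) + v_i); there are (n + 1)(C + 1) distinct instances; and rows are evaluated in increasing i so that every value a cell needs is already computed.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// K[i][c] = best value using only the first i items with capacity c.
int knapsack01(const std::vector<int>& weight,
               const std::vector<int>& value,
               int capacity) {
    const std::size_t n = weight.size();
    std::vector<std::vector<int>> K(n + 1, std::vector<int>(capacity + 1, 0));

    // Order of evaluation: row i depends only on row i - 1.
    for (std::size_t i = 1; i <= n; ++i) {
        for (int c = 0; c <= capacity; ++c) {
            K[i][c] = K[i - 1][c];  // skip item i
            if (weight[i - 1] <= c) {
                // take item i if that gives a better value
                K[i][c] = std::max(K[i][c],
                                   K[i - 1][c - weight[i - 1]] + value[i - 1]);
            }
        }
    }
    return K[n][capacity];
}
```

With weights {10, 20, 30}, values {60, 100, 120}, and capacity 50, the best choice is the second and third items, for a value of 220.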



Elements of Dynamic Programming

To decide whether a problem can be solved using the dynamic programming method, the following three elements of dynamic programming should be considered:

Optimal substructure

Overlapping subproblems

Memoization



Optimal Substructure

A problem exhibits optimal substructure if an optimal solution to the problem contains within it optimal solutions to subproblems

It also means that dynamic programming (and greedy method) might apply



Overlapping Subproblems

When a recursive algorithm revisits the same problem repeatedly, it is said that the optimization problem has overlapping subproblems

This is beneficial for dynamic programming

It solves each subproblem once and stores the answer in a table

This answer can be looked up in constant time when required. This contrasts with the divide-and-conquer strategy, where a new problem is generated at each step of the recursion






Memoization

Memoization stores computed results in a table, as dynamic programming does; however, it uses a control structure similar to that of the recursive algorithm

In a memoized recursive algorithm, an entry is maintained in a table for the solution to each subproblem

Initially, all entries contain a special value, which indicates that the entry is not yet used

For each subproblem that is encountered for the first time, its solution is computed and stored in the table

The next time that subproblem is encountered, its entry is looked up and the stored value is used

This can be implemented using hashing
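A minimal sketch of a memoized recursive algorithm, using Fibonacci numbers as the example problem and a hash table (std::unordered_map) as the table. The example problem and the names fibMemo and fib are assumptions made for illustration. Here the absence of a key plays the role of the special "not yet used" value.

```cpp
#include <cstdint>
#include <unordered_map>

// Recursive control structure with a table lookup before any computation.
// Results fit in 64 bits only for n up to 93.
std::uint64_t fibMemo(unsigned n,
                      std::unordered_map<unsigned, std::uint64_t>& table) {
    if (n < 2) return n;  // base cases need no table entry

    auto hit = table.find(n);
    if (hit != table.end()) return hit->second;  // subproblem seen before

    // First encounter: compute the solution and store it in the table.
    std::uint64_t result = fibMemo(n - 1, table) + fibMemo(n - 2, table);
    table[n] = result;
    return result;
}

// Hypothetical wrapper that owns the table for one top-level call.
std::uint64_t fib(unsigned n) {
    std::unordered_map<unsigned, std::uint64_t> table;
    return fibMemo(n, table);
}
```

Because each subproblem is solved once, fib(50) needs on the order of 50 additions, whereas the plain recursive version would make billions of calls.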



Principle of Optimality

The principle of optimality states that an optimal sequence of decisions has the property that whatever the initial state and decision are, the remaining decisions must constitute an optimal decision sequence with regard to the state resulting from the first decision



Pattern Matching

Pattern matching is the process of finding the presence of a particular string (pattern) in the given string (text)



A few such applications are as follows:

Database search

Search engines

Text editors

Intrusion detection

Natural language processing

Feature detection in digitized images



Popular techniques for string pattern search

The most popular are the following:

Brute-force approach

Boyer–Moore algorithm

Knuth–Morris–Pratt algorithm

Rabin–Karp algorithm

Text partitioning algorithm

Semi-numerical algorithm



String: A string is a finite sequence of symbols chosen from a set called an alphabet

Alphabet: An alphabet is a set of characters or symbols

Substring: A substring of a string is a contiguous run of its symbols in their original order; a subsequence also preserves the order of the symbols but need not be contiguous

Suffix: A suffix of S is a substring S[i, …, m − 1], where m is the length of S and i ranges between 0 and m − 1



Brute-force Approach

This is a simple, straightforward approach based on comparing the pattern, character by character, with the text



The steps involved in this approach are as follows:

1. Align the pattern P with the beginning of the text

2. Moving from left to right, compare each character of the pattern with the corresponding character in the text

3. Continue with step 2 until successful (all characters of the pattern are matched) or unsuccessful (a mismatch is detected)

4. On a mismatch, shift the pattern one position to the right and repeat from step 2 until the pattern is found or the text is exhausted
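The steps above can be sketched as follows. The function name bruteForceSearch and the convention of returning −1 when there is no match are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// Returns the index of the first occurrence of pattern in text, or -1.
long bruteForceSearch(const std::string& text, const std::string& pattern) {
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m > n) return -1;

    // Try every alignment of the pattern, starting at the beginning of the text.
    for (std::size_t shift = 0; shift + m <= n; ++shift) {
        std::size_t j = 0;
        // Compare left to right until a mismatch or the end of the pattern.
        while (j < m && text[shift + j] == pattern[j]) ++j;
        if (j == m) return static_cast<long>(shift);  // all characters matched
        // Mismatch: the loop slides the pattern one position to the right.
    }
    return -1;
}
```

In the worst case every alignment compares up to m characters, so the running time is O(nm) for a text of length n and a pattern of length m.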



Boyer–Moore Algorithm

Boyer and Moore developed an efficient pattern matching algorithm

Instead of sliding the pattern one character to the right at a time, the Boyer–Moore approach slides it to the right in longer steps

The algorithm scans the characters of the pattern from right to left, beginning with the rightmost character

If the text symbol compared with the rightmost pattern symbol does not occur in the pattern at all, then the pattern can be shifted by m positions (where m is the length of the pattern)
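The right-to-left scan and the long shift can be sketched with the bad-character rule alone. This is a simplified variant, not the full Boyer–Moore algorithm (which also uses a good-suffix rule); the function name boyerMooreBadChar, the table name last, and the assumption of single-byte characters are all illustrative choices.

```cpp
#include <algorithm>
#include <array>
#include <string>

// Returns the index of the first occurrence of pattern in text, or -1.
// Simplification: uses only the bad-character rule.
long boyerMooreBadChar(const std::string& text, const std::string& pattern) {
    const long n = static_cast<long>(text.size());
    const long m = static_cast<long>(pattern.size());
    if (m == 0) return 0;
    if (m > n) return -1;

    // last[c] = rightmost index of byte c in the pattern, or -1 if absent.
    std::array<long, 256> last;
    last.fill(-1);
    for (long k = 0; k < m; ++k)
        last[static_cast<unsigned char>(pattern[k])] = k;

    long shift = 0;
    while (shift <= n - m) {
        long j = m - 1;
        // Scan the pattern from right to left.
        while (j >= 0 && pattern[j] == text[shift + j]) --j;
        if (j < 0) return shift;  // every character matched

        // Align the mismatched text character with its rightmost occurrence
        // in the pattern; if it does not occur, jump past it entirely.
        long lastPos = last[static_cast<unsigned char>(text[shift + j])];
        shift += std::max(1L, j - lastPos);
    }
    return -1;
}
```

When the mismatch happens at the rightmost pattern position and the text character is absent from the pattern, j − lastPos equals m, which is the m-position shift described above.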



Knuth–Morris–Pratt Algorithm

The researchers Knuth, Morris, and Pratt proposed a linear-time algorithm for the string matching problem

In this approach, a matching time of O(n) is achieved by avoiding comparisons with characters of T that have previously been involved in a comparison with some element of the pattern P to be matched, so that backtracking is avoided



KMP Matcher

The KMP matcher takes the text T, the pattern P, and the prefix function π as inputs; it finds an occurrence of P in T and returns the number of shifts of P after which the occurrence is found
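A minimal sketch of the matcher together with the prefix function it relies on. The names prefixFunction and kmpMatch are hypothetical; unlike the description above, this sketch computes the prefix function inside the matcher rather than taking it as an input, and it returns −1 when P does not occur in T.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// pi[q] = length of the longest proper prefix of p[0..q] that is also
// a suffix of p[0..q].
std::vector<std::size_t> prefixFunction(const std::string& p) {
    std::vector<std::size_t> pi(p.size(), 0);
    std::size_t k = 0;  // length of the currently matched prefix
    for (std::size_t q = 1; q < p.size(); ++q) {
        while (k > 0 && p[k] != p[q]) k = pi[k - 1];  // fall back
        if (p[k] == p[q]) ++k;
        pi[q] = k;
    }
    return pi;
}

// Returns the shift of the first occurrence of p in t, or -1.
long kmpMatch(const std::string& t, const std::string& p) {
    if (p.empty()) return 0;
    const std::vector<std::size_t> pi = prefixFunction(p);

    std::size_t q = 0;  // number of pattern characters matched so far
    for (std::size_t i = 0; i < t.size(); ++i) {
        // On a mismatch, reuse the prefix function instead of moving
        // backwards in the text.
        while (q > 0 && p[q] != t[i]) q = pi[q - 1];
        if (p[q] == t[i]) ++q;
        if (q == p.size()) return static_cast<long>(i + 1 - p.size());
    }
    return -1;
}
```

The index i into the text never decreases, which is what gives the O(n) matching time once the prefix function has been computed in O(m).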



TRIES

A trie is a compact data structure that represents a set of strings (such as all the words in a text)

A trie is a tree-based data structure for storing strings to make pattern matching faster

A trie helps in pattern matching in time that is proportional to the length of the pattern

Tries can be used to perform prefix queries for information retrieval

A prefix query searches for the longest prefix of a given string that matches a prefix of some string in the trie
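A minimal sketch of a trie with insertion, exact lookup, and the prefix query described above. The names TrieNode, Trie, insert, contains, and longestPrefix are hypothetical, and the code assumes C++14 or later for std::make_unique.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct TrieNode {
    // std::map keeps the children ordered by character.
    std::map<char, std::unique_ptr<TrieNode>> children;
    bool isWord = false;  // true if a stored string ends at this node
};

class Trie {
public:
    void insert(const std::string& word) {
        TrieNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<TrieNode>& child = node->children[c];
            if (!child) child = std::make_unique<TrieNode>();
            node = child.get();
        }
        node->isWord = true;
    }

    // Exact lookup: follows one edge per character of word.
    bool contains(const std::string& word) const {
        const TrieNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->isWord;
    }

    // Length of the longest prefix of query that is also a prefix of
    // some stored string.
    std::size_t longestPrefix(const std::string& query) const {
        const TrieNode* node = &root_;
        std::size_t len = 0;
        for (char c : query) {
            auto it = node->children.find(c);
            if (it == node->children.end()) break;
            node = it->second.get();
            ++len;
        }
        return len;
    }

private:
    TrieNode root_;
};
```

Both queries visit at most one node per character of the query string, so their cost depends on the length of the pattern rather than on how many strings are stored.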



There are variants of tries, which are listed as follows:

Standard tries

Compressed tries

Suffix tries



The standard trie for a set of strings S is an ordered tree such that

Each node but the root is labelled with a character;

The children of a node are alphabetically ordered;

The paths from the root to the external nodes yield the strings of S



Compressed Tries

Similar to the standard trie, a compressed trie is a tree-based data structure for storing strings in order to make pattern matching much faster

This is an optimized approach for pattern matching, especially suitable for applications where time is a more crucial factor



Following are the unique characteristics of a compressed trie:

A compressed trie (or Patricia trie) has internal nodes of degree at least 2

It is obtained from standard trie by compressing chains of redundant nodes



Compressed Trie

[Figure: example of a compressed trie, with multi-character edge labels such as "ell", "to", and "ck"]



Suffix Tries

[Figure: suffix trie for the text "meghavind", with its characters indexed 0 to 8]



Suffix Tries

A suffix trie is a compressed trie for all the suffixes of a text

Because it is a compressed trie, it possesses all the features of a compressed trie; because it includes every suffix of the text, it makes searching within that text faster
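To show why storing every suffix speeds up searching, here is a minimal sketch of an uncompressed suffix trie. It deliberately omits the compression step described above and therefore uses O(n²) nodes for a text of length n. The names SuffixNode, SuffixTrie, and containsSubstring are hypothetical, and the code assumes C++14 or later.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

struct SuffixNode {
    std::map<char, std::unique_ptr<SuffixNode>> children;
};

class SuffixTrie {
public:
    // Inserts every suffix text[i..] of the text into the trie.
    explicit SuffixTrie(const std::string& text) {
        for (std::size_t i = 0; i < text.size(); ++i) {
            SuffixNode* node = &root_;
            for (std::size_t j = i; j < text.size(); ++j) {
                std::unique_ptr<SuffixNode>& child = node->children[text[j]];
                if (!child) child = std::make_unique<SuffixNode>();
                node = child.get();
            }
        }
    }

    // A pattern occurs in the text exactly when it is a prefix of some
    // suffix, that is, when it labels a path starting at the root.
    bool containsSubstring(const std::string& pattern) const {
        const SuffixNode* node = &root_;
        for (char c : pattern) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return true;
    }

private:
    SuffixNode root_;
};
```

For the text "meghavind", a query such as "havi" follows four edges from the root and succeeds, regardless of where in the text the pattern starts.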



End of Chapter 16…!
