A Practical Introduction to Data Structures and Algorithm Analysis

  • 8/21/2019 A Practical Introduction to Data Structures and Algorithm Analysis

    1/346

    Coursenotes

    A Practical Introduction to

    Data Structures and Algorithm Analysis

    Second Edition

    Clifford A. Shaffer

    Department of Computer Science

    Virginia Tech

    Copyright 2000, 2001

    Last Updated: 01/10/2003

    The Need for Data Structures

    Data structures organize data ⇒ more efficient programs.

    More powerful computers ⇒ more complex applications.

    More complex applications demand more calculations.

    Complex computing tasks are unlike our everyday experience.

    Organizing Data

    Any organization for a collection of records can be searched, processed in any order, or modified.

    The choice of data structure and algorithm can make the difference between a program running in a few seconds or many days.

    Efficiency

    A solution is said to be efficient if it solves the problem within its resource constraints.
        Space
        Time

    The cost of a solution is the amount of resources that the solution consumes.

    Selecting a Data Structure

    Select a data structure as follows:

    1. Analyze the problem to determine the resource constraints a solution must meet.

    2. Determine the basic operations that must be supported. Quantify the resource constraints for each operation.

    3. Select the data structure that best meets these requirements.

    Some Questions to Ask

    Are all data inserted into the data structure at the beginning, or are insertions interspersed with other operations?

    Can data be deleted?

    Are all data processed in some well-defined order, or is random access allowed?

    Data Structure Philosophy

    Each data structure has costs and benefits.

    Rarely is one data structure better than another in all situations.

    A data structure requires:

    space for each data item it stores,

    time to perform each basic operation,

    programming effort.

    Data Structure Philosophy (cont)

    Each problem has constraints on available space and time.

    Only after a careful analysis of problem characteristics can we know the best data structure for the task.

    Bank example:
        Start account: a few minutes
        Transactions: a few seconds
        Close account: overnight

    Goals of this Course

    1. Reinforce the concept that costs and benefits exist for every data structure.

    2. Learn the commonly used data structures.
        These form a programmer's basic data structure "toolkit."

    3. Understand how to measure the cost of a data structure or program.
        These techniques also allow you to judge the merits of new data structures that you or others might invent.

    Abstract Data Types

    Abstract Data Type (ADT): a definition for a data type solely in terms of a set of values and a set of operations on that data type.

    Each ADT operation is defined by its inputs and outputs.

    Encapsulation: Hide implementation details.

    Data Structure

    A data structure is the physical implementation of an ADT.
        Each operation associated with the ADT is implemented by one or more subroutines in the implementation.

    Data structure usually refers to an organization for data in main memory.

    File structure is an organization for data on peripheral storage, such as a disk drive.

    Metaphors

    An ADT manages complexity through abstraction: metaphor.
        Hierarchies of labels

    Ex: transistors ⇒ gates ⇒ CPU.

    In a program, implement an ADT, then think only about the ADT, not its implementation.

    Logical vs. Physical Form

    Data items have both a logical and a physical form.

    Logical form: definition of the data item within an ADT.
        Ex: Integers in mathematical sense: +, -

    Physical form: implementation of the data item within a data structure.
        Ex: 16/32 bit integers, overflow.

    Data Type

    [Figure: the ADT (type, operations) holds data items in their logical form; the data structure (storage space, subroutines) holds data items in their physical form.]

    Problems

    Problem: a task to be performed.

    Best thought of as inputs and matching outputs.

    Problem definition should include constraints on the resources that may be consumed by any acceptable solution.

    Problems (cont)

    Problems correspond to mathematical functions.
        A function is a matching between inputs (the domain) and outputs (the range).

    An input to a function may be a single number, or a collection of information.

    The values making up an input are called the parameters of the function.

    A particular input must always result in the same output every time the function is computed.

    Algorithms and Programs

    Algorithm: a method or a process followed to solve a problem.
        A recipe.

    An algorithm takes the input to a problem (function) and transforms it to the output.
        A mapping of input to output.

    A problem can have many algorithms.

    Algorithm Properties

    An algorithm possesses the following properties:
        It must be correct.
        It must be composed of a series of concrete steps.
        There can be no ambiguity as to which step will be performed next.
        It must be composed of a finite number of steps.
        It must terminate.

    A computer program is an instance, or concrete representation, for an algorithm in some programming language.

    Mathematical Background

    Set concepts and notation.

    Recursion

    Induction Proofs

    Logarithms

    Summations

    Recurrence Relations

    Estimation Techniques

    Known as "back of the envelope" or "back of the napkin" calculation.

    1. Determine the major parameters that affect the problem.

    2. Derive an equation that relates the parameters to the problem.

    3. Select values for the parameters, and apply the equation to yield an estimated solution.

    Estimation Example

    How many library bookcases does it take to store books totaling one million pages?

    Estimate:
        Pages/inch
        Feet/shelf
        Shelves/bookcase
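
    The three-step estimation recipe can be sketched in code. This is only an illustration: the function name is hypothetical, and every rate passed to it is a guess chosen by the caller rather than a figure from the notes.

```cpp
#include <cmath>

// Back-of-the-envelope estimate of the bookcases needed for a page count.
// All rates are caller-supplied guesses, not measured values.
double bookcasesNeeded(double pages, double pagesPerInch,
                       double feetPerShelf, double shelvesPerBookcase) {
  double inchesOfBooks = pages / pagesPerInch;             // total shelf length
  double shelves = inchesOfBooks / (12.0 * feetPerShelf);  // 12 inches per foot
  return std::ceil(shelves / shelvesPerBookcase);          // whole bookcases
}
```

    Assuming 500 pages/inch, 4 feet/shelf and 5 shelves/bookcase, one million pages come to 2,000 inches of books, roughly 42 shelves, and so 9 bookcases.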

    Algorithm Efficiency

    There are often many approaches (algorithms) to solve a problem. How do we choose between them?

    At the heart of computer program design are two (sometimes conflicting) goals.

    1. To design an algorithm that is easy to understand, code, debug.

    2. To design an algorithm that makes efficient use of the computer's resources.

    Algorithm Efficiency (cont)

    Goal (1) is the concern of Software Engineering.

    Goal (2) is the concern of data structures and algorithm analysis.

    When goal (2) is important, how do we measure an algorithm's cost?

    How to Measure Efficiency?

    1. Empirical comparison (run programs)

    2. Asymptotic Algorithm Analysis

    Critical resources:

    Factors affecting running time:

    For most algorithms, running time depends on "size" of the input.

    Running time is expressed as T(n) for some function T on input size n.

    Examples of Growth Rate

    Example 1.

    // Find largest value
    int largest(int array[], int n) {
      int currlarge = 0; // Largest value seen
      for (int i=1; i ...
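
    The listing for Example 1 is cut off in the transcript after the start of the loop header. A minimal sketch of how such a function might be completed follows. It assumes that `currlarge` holds the position of the largest value seen so far and that n >= 1; the loop body is a guess, not the original text.

```cpp
// Return the position of the largest value in array[0..n-1].
// Assumes n >= 1. The loop body is a reconstruction, not the slide's text.
int largest(const int array[], int n) {
  int currlarge = 0;                  // Position of largest value seen so far
  for (int i = 1; i < n; i++)         // Examine each remaining element once
    if (array[currlarge] < array[i])
      currlarge = i;                  // Remember the new position
  return currlarge;
}
```

    Every element is examined exactly once, so the running time grows linearly with n.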

    Examples (cont)

    Example 2: Assignment statement.

    Example 3:

    sum = 0;
    for (i=1; i ...

    Growth Rate Graph

    Best, Worst, Average Cases

    Not all inputs of a given size take the same time to run.

    Sequential search for K in an array of n integers:
        Begin at first element in array and look at each element in turn until K is found.

    Best case:

    Worst case:

    Average case:
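
    The three cases can be observed with a small instrumented search. This is a sketch: the `examined` out-parameter is an illustrative addition that counts how many elements were looked at.

```cpp
// Sequential search that also reports how many elements it examined.
// Returns the index of K, or n if K is absent.
// The `examined` counter is an illustrative addition.
int sequentialSearch(const int array[], int n, int K, int& examined) {
  examined = 0;
  for (int i = 0; i < n; i++) {
    examined++;                    // One more element looked at
    if (array[i] == K) return i;   // Found it
  }
  return n;                        // K is not in the array
}
```

    When K is in the first position, one element is examined (best case). When K is last or absent, all n are examined (worst case). If K is equally likely to be in any position, about n/2 elements are examined on average.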

    Which Analysis to Use?

    While average time appears to be the fairest measure, it may be difficult to determine.

    When is the worst case time important?

    Faster Computer or Algorithm?

    What happens when we buy a computer 10 times faster?

    T(n)        n       n'       Change               n'/n
    10n         1,000   10,000   n' = 10n             10
    20n         500     5,000    n' = 10n             10
    5n log n    250     1,842    √10 n < n' < 10n     7.37
    2n^2        70      223      n' = √10 n           3.16
    2^n         13      16       n' = n + 3           -----
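
    The n and n' columns appear consistent with a budget of 10,000 basic operations on the old machine and 100,000 on the new one. That budget is inferred from the numbers and is not stated in the notes. Under that assumption, the following sketch reproduces three of the rows. All names in it are hypothetical.

```cpp
// Largest n with cost(n) <= budget, found by a simple upward scan.
// Assumes cost is nondecreasing in n.
long long largestN(long long (*cost)(long long), long long budget) {
  long long n = 0;
  while (cost(n + 1) <= budget) n++;
  return n;
}

long long linear(long long n)      { return 10 * n; }      // T(n) = 10n
long long quadratic(long long n)   { return 2 * n * n; }   // T(n) = 2n^2
long long exponential(long long n) { return 1LL << n; }    // T(n) = 2^n, n < 63
```

    For the linear cost, ten times the budget gives ten times the problem size. For the exponential cost it adds only three to n.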

    Asymptotic Analysis: Big-oh

    Definition: For T(n) a non-negatively valued function, T(n) is in the set O(f(n)) if there exist two positive constants c and n0 such that T(n) <= cf(n) for all n > n0.

    Usage: The algorithm is in O(n^2) in [best, average, worst] case.

    Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in less than cf(n) steps in [best, average, worst] case.

    Big-oh Notation (cont)

    Big-oh notation indicates an upper bound.

    Example: If T(n) = 3n^2 then T(n) is in O(n^2).

    Wish tightest upper bound:

    While T(n) = 3n^2 is in O(n^3), we prefer O(n^2).

    Big-Oh Examples

    Example 1: Finding value X in an array (average cost).

    T(n) = c_s n/2.

    For all values of n > 1, c_s n/2 ...

    Big-Oh Examples

    Example 2: T(n) = c1n^2 + c2n in average case.

    c1n^2 + c2n ...
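
    The inequality for Example 2 is cut off in the transcript. One way to finish the argument, assuming c1 and c2 are positive constants and using the big-oh definition given earlier, is:

```latex
c_1 n^2 + c_2 n \;\le\; c_1 n^2 + c_2 n^2 \;=\; (c_1 + c_2)\, n^2
\qquad \text{for all } n > 1 .
```

    So T(n) <= cn^2 with c = c1 + c2 and n0 = 1, which places T(n) in O(n^2).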

    A Common Misunderstanding

    "The best case for my algorithm is n = 1 because that is the fastest." WRONG!

    Big-oh refers to a growth rate as n grows to ∞.

    Best case is defined as which input of size n is cheapest among all inputs of size n.

    Big-Omega

    Definition: For T(n) a non-negatively valued function, T(n) is in the set Ω(g(n)) if there exist two positive constants c and n0 such that T(n) >= cg(n) for all n > n0.

    Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in more than cg(n) steps.

    Lower bound.

    Big-Omega Example

    T(n) = c1n^2 + c2n.

    c1n^2 + c2n >= c1n^2 for all n > 1.

    T(n) >= cn^2 for c = c1 and n0 = 1.

    Therefore, T(n) is in Ω(n^2) by the definition.

    We want the greatest lower bound.

    Theta Notation

    When big-Oh and Ω meet, we indicate this by using Θ (big-Theta) notation.

    Definition: An algorithm is said to be Θ(h(n)) if it is in O(h(n)) and it is in Ω(h(n)).

    A Common Misunderstanding

    Confusing worst case with upper bound.

    Upper bound refers to a growth rate.

    Worst case refers to the worst input from among the choices for possible inputs of a given size.

    Simplifying Rules

    1. If f(n) is in O(g(n)) and g(n) is in O(h(n)), then f(n) is in O(h(n)).

    2. If f(n) is in O(kg(n)) for any constant k > 0, then f(n) is in O(g(n)).

    3. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then (f1 + f2)(n) is in O(max(g1(n), g2(n))).

    4. If f1(n) is in O(g1(n)) and f2(n) is in O(g2(n)), then f1(n)f2(n) is in O(g1(n)g2(n)).

    Running Time Examples (1)

    Example 1: a = b;

    This assignment takes constant time, so it is Θ(1).

    Example 2:
      sum = 0;
      for (i=1; i ...

    Running Time Examples (2)

    Example 3:
      sum = 0;
      for (i=1; i ...

    Running Time Examples (3)

    Example 4:
      sum1 = 0;
      for (i=1; i ...

    Running Time Examples (4)

    Example 5:
      sum1 = 0;
      for (k=1; k ...
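
    The loop listings in these running-time examples are cut off in the transcript. As a separate illustration of how nested loops translate into growth rates, the sketch below counts how often the innermost statement runs for three common loop shapes. These loops are assumptions chosen for illustration. They are not a reconstruction of the truncated examples.

```cpp
// Each function returns how many times its innermost statement executes.
// Intended for small n (k *= 2 would overflow for n near INT_MAX).

long countSquare(int n) {        // n * n iterations: Theta(n^2)
  long count = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= n; j++)
      count++;
  return count;
}

long countTriangle(int n) {      // 1 + 2 + ... + n = n(n+1)/2: Theta(n^2)
  long count = 0;
  for (int i = 1; i <= n; i++)
    for (int j = 1; j <= i; j++)
      count++;
  return count;
}

long countDoubling(int n) {      // Outer loop runs floor(log2 n) + 1 times
  long count = 0;
  for (int k = 1; k <= n; k *= 2)
    for (int j = 1; j <= n; j++)
      count++;
  return count;                  // Theta(n log n)
}
```

    Halving the inner bound, as in countTriangle, changes the constant but not the growth rate. Doubling the outer loop variable changes the growth rate itself.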

    Binary Search

    How many elements are examined in worst case?

    Binary Search

    // Return position of element in sorted
    // array of size n with value K.
    int binary(int array[], int n, int K) {
      int l = -1;
      int r = n; // l, r are beyond array bounds
      while (l+1 != r) { // Stop when l, r meet
        int i = (l+r)/2; // Check middle
        if (K < array[i]) r = i; // Left half
        if (K == array[i]) return i; // Found it
        if (K > array[i]) l = i; // Right half
      }
      return n; // Search value not in array
    }
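
    To see how many elements binary search examines, the same loop can be instrumented with a counter. This is a sketch. The function name, the `examined` out-parameter and the use of `std::vector` are illustrative additions.

```cpp
#include <vector>

// Binary search over a sorted vector, counting the elements examined.
// Returns the position of K, or array.size() if K is not present.
int binaryCounted(const std::vector<int>& array, int K, int& examined) {
  int n = static_cast<int>(array.size());
  int l = -1;
  int r = n;                        // l, r are beyond array bounds
  examined = 0;
  while (l + 1 != r) {              // Stop when l, r meet
    int i = (l + r) / 2;            // Check middle
    examined++;
    if (K < array[i]) r = i;        // Continue in left half
    else if (K == array[i]) return i;
    else l = i;                     // Continue in right half
  }
  return n;                         // Search value not in array
}
```

    For 1,000 sorted values, an unsuccessful search for a key larger than every element examines 10 elements, which is floor(log2 1000) + 1.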

    Other Control Statements

    while loop: Analyze like a for loop.

    if statement: Take greater complexity of then/else clauses.

    switch statement: Take complexity of most expensive case.

    Subroutine call: Complexity of the subroutine.

    Analyzing Problems

    Upper bound: Upper bound of best known algorithm.

    Lower bound: Lower bound for every possible algorithm.

    Analyzing Problems: Example

    Common misunderstanding: No distinction between upper/lower bound when you know the exact running time.

    Example of imperfect knowledge: Sorting

    1. Cost of I/O: Ω(n).

    2. Bubble or insertion sort: O(n^2).

    3. A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n).

    4. We prove later that sorting is Ω(n log n).

    Multiple Parameters

    Compute the rank ordering for all C pixel values in a picture of P pixels.

    for (i=0; i ...

    Space Bounds

    Space bounds can also be analyzed with asymptotic complexity analysis.

    Time: Algorithm
    Space: Data Structure

    Space/Time Tradeoff Principle

    One can often reduce time if one is willing to sacrifice space, or vice versa.
        Encoding or packing information
            Boolean flags
        Table lookup
            Factorials

    Disk-based Space/Time Tradeoff Principle: The smaller you make the disk storage requirements, the faster your program will run.
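
    The factorial table lookup mentioned above can be sketched as follows. A small amount of space buys a constant-time answer. The names and the 32-bit limit are choices made for this illustration.

```cpp
#include <cstdint>

// Precomputed factorials 0! through 12!.
// 12! is the largest factorial that fits in an unsigned 32-bit integer.
static const std::uint32_t kFactorial[13] = {
  1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880,
  3628800, 39916800, 479001600
};

// Returns n! for 0 <= n <= 12, or 0 to signal an out-of-range argument.
std::uint32_t factorial(int n) {
  return (n >= 0 && n <= 12) ? kFactorial[n] : 0;
}
```

    The table costs 13 words of storage and replaces up to 11 multiplications per call with a single array access.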

    Lists

    A list is a finite, ordered sequence of data items.

    Important concept: List elements have a position.

    Notation:

    What operations should we implement?

    List Implementation Concepts

    Our list implementation will support the concept of a current position.

    We will do this by defining the list in terms of left and right partitions.
        Either or both partitions may be empty.

    Partitions are separated by the fence.

    List ADT

    template <class Elem> class List {

    public:

    virtual void clear() = 0;

    virtual bool insert(const Elem&) = 0;

    virtual bool append(const Elem&) = 0;

    virtual bool remove(Elem&) = 0;

    virtual void setStart() = 0;

    virtual void setEnd() = 0;

    virtual void prev() = 0;

    virtual void next() = 0;

    List ADT (cont)

    virtual int leftLength() const = 0;

    virtual int rightLength() const = 0;

    virtual bool setPos(int pos) = 0;

    virtual bool getValue(Elem&) const = 0;

    virtual void print() const = 0;

    };

    List ADT Examples

    List:

    MyList.insert(99);

    Result:

    Iterate through the whole list:

    for (MyList.setStart(); MyList.getValue(it); MyList.next())
      DoSomething(it);

    List Find Function

    // Return true iff K is in list
    bool find(List<int>& L, int K) {
      int it;
      for (L.setStart(); L.getValue(it); L.next())
        if (K == it) return true; // Found it
      return false; // Not found
    }

    Array-Based List Insert

    Array-Based List Class (1)

    template <class Elem> // Array-based list
    class AList : public List<Elem> {
    private:
      int maxSize; // Maximum size of list
      int listSize; // Actual elem count
      int fence; // Position of fence
      Elem* listArray; // Array holding list
    public:
      AList(int size=DefaultListSize) {
        maxSize = size;
        listSize = fence = 0;
        listArray = new Elem[maxSize];
      }

    Array-Based List Class (2)

      ~AList() { delete [] listArray; }
      void clear() {
        delete [] listArray;
        listSize = fence = 0;
        listArray = new Elem[maxSize];
      }
      void setStart() { fence = 0; }
      void setEnd() { fence = listSize; }
      void prev() { if (fence != 0) fence--; }
      void next() { if (fence ...

    Array-Based List Class (3)

      bool setPos(int pos) {
        if ((pos >= 0) && (pos <= listSize)) fence = pos;
        return (pos >= 0) && (pos <= listSize);
      }

    Insert

    // Insert at front of right partition
    template <class Elem>
    bool AList<Elem>::insert(const Elem& item) {
      if (listSize == maxSize) return false;
      for(int i=listSize; i>fence; i--)
        // Shift Elems up to make room
        listArray[i] = listArray[i-1];
      listArray[fence] = item;
      listSize++; // Increment list size
      return true;
    }

    Append

    // Append Elem to end of the list
    template <class Elem>
    bool AList<Elem>::append(const Elem& item) {
      if (listSize == maxSize) return false;
      listArray[listSize++] = item;
      return true;
    }

    Remove

    // Remove and return first Elem in right
    // partition
    template <class Elem>
    bool AList<Elem>::remove(Elem& it) {
      if (rightLength() == 0) return false;
      it = listArray[fence]; // Copy Elem
      for(int i=fence; i ...

    Link Class

    Dynamic allocation of new list elements.

    // Singly-linked list node
    template <class Elem> class Link {
    public:
      Elem element; // Value for this node
      Link *next; // Pointer to next node
      Link(const Elem& elemval, Link* nextval =NULL)
        { element = elemval; next = nextval; }
      Link(Link* nextval =NULL) { next = nextval; }
    };

    Linked List Position (1)

    Linked List Position (2)

    Linked List Class (1)

    // Linked list implementation
    template <class Elem> class LList: public List<Elem> {
    private:
      Link<Elem>* head; // Pointer to list header
      Link<Elem>* tail; // Pointer to last Elem
      Link<Elem>* fence; // Last element on left
      int leftcnt; // Size of left
      int rightcnt; // Size of right
      void init() { // Initialization routine
        fence = tail = head = new Link<Elem>;
        leftcnt = rightcnt = 0;
      }

    Linked List Class (2)

      void removeall() { // Return link nodes to free store
        while(head != NULL) {
          fence = head;
          head = head->next;
          delete fence;
        }
      }
    public:
      LList(int size=DefaultListSize) { init(); }
      ~LList() { removeall(); } // Destructor
      void clear() { removeall(); init(); }

    Linked List Class (3)

      void setStart() {
        fence = head; rightcnt += leftcnt;
        leftcnt = 0; }
      void setEnd() {
        fence = tail; leftcnt += rightcnt;
        rightcnt = 0; }
      void next() {
        // Don't move fence if right empty
        if (fence != tail) {
          fence = fence->next; rightcnt--;
          leftcnt++; }
      }
      int leftLength() const { return leftcnt; }
      int rightLength() const { return rightcnt; }
      bool getValue(Elem& it) const {
        if(rightLength() == 0) return false;
        it = fence->next->element;
        return true; }

    Insertion

    Insert/Append

    // Insert at front of right partition
    template <class Elem>
    bool LList<Elem>::insert(const Elem& item) {
      fence->next = new Link<Elem>(item, fence->next);
      if (tail == fence) tail = fence->next;
      rightcnt++;
      return true;
    }

    // Append Elem to end of the list
    template <class Elem>
    bool LList<Elem>::append(const Elem& item) {
      tail = tail->next = new Link<Elem>(item, NULL);
      rightcnt++;
      return true;
    }

    Removal

    Remove

    // Remove and return first Elem in right
    // partition
    template <class Elem>
    bool LList<Elem>::remove(Elem& it) {
      if (fence->next == NULL) return false;
      it = fence->next->element; // Remember val
      // Remember link node
      Link<Elem>* ltemp = fence->next;
      fence->next = ltemp->next; // Remove
      if (tail == ltemp) // Reset tail
        tail = fence;
      delete ltemp; // Reclaim space
      rightcnt--;
      return true;
    }

    Prev

    // Move fence one step left;
    // no change if left is empty
    template <class Elem>
    void LList<Elem>::prev() {
      Link<Elem>* temp = head;
      if (fence == head) return; // No prev Elem
      while (temp->next!=fence)
        temp=temp->next;
      fence = temp;
      leftcnt--;
      rightcnt++;
    }

    Setpos

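    // Set the size of left partition to pos
    template <class Elem>
    bool LList<Elem>::setPos(int pos) {
      if ((pos < 0) || (pos > rightcnt+leftcnt))
        return false;
      // Keep the partition counts in step with the new fence
      rightcnt = rightcnt + leftcnt - pos;
      leftcnt = pos;
      fence = head;
      for(int i=0; i<pos; i++)
        fence = fence->next;
      return true;
    }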

    Comparison of Implementations

    Array-Based Lists:
        Insertion and deletion are Θ(n).
        Prev and direct access are Θ(1).
        Array must be allocated in advance.
        No overhead if all array positions are full.

    Linked Lists:
        Insertion and deletion are Θ(1).
        Prev and direct access are Θ(n).
        Space grows with number of elements.
        Every element requires overhead.

    Space Comparison

    Break-even point:

        DE = n(P + E);

        n = DE / (P + E)

    E: Space for data value.
    P: Space for pointer.
    D: Number of elements in array.
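
    The break-even formula can be evaluated directly. The helper below is a sketch with a hypothetical name. It uses the same symbols as the slide.

```cpp
// Break-even list length n = DE / (P + E).
// D = number of elements in the array, E = space for one data value,
// P = space for one pointer (E and P in the same unit, e.g. bytes).
double breakEven(double D, double E, double P) {
  return (D * E) / (P + E);
}
```

    For example, with D = 100 array slots and E = P = 4 bytes, the break-even point is n = 50. A linked list holding fewer than 50 elements needs less space than the array, and one holding more than 50 needs more.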

    Freelists

    System new and delete are slow.

    // Singly-linked list node with freelist
    template <class Elem> class Link {
    private:
      static Link<Elem>* freelist; // Head
    public:
      Elem element; // Value for this node
      Link* next; // Point to next node
      Link(const Elem& elemval, Link* nextval =NULL)
        { element = elemval; next = nextval; }
      Link(Link* nextval =NULL) {next=nextval;}
      void* operator new(size_t); // Overload
      void operator delete(void*); // Overload
    };

    Freelists (2)

    template <class Elem>
    Link<Elem>* Link<Elem>::freelist = NULL;

    template <class Elem> // Overload for new
    void* Link<Elem>::operator new(size_t) {
      if (freelist == NULL) return ::new Link;
      Link<Elem>* temp = freelist; // Reuse
      freelist = freelist->next;
      return temp; // Return the link
    }

    template <class Elem> // Overload delete
    void Link<Elem>::operator delete(void* ptr) {
      ((Link<Elem>*)ptr)->next = freelist;
      freelist = (Link<Elem>*)ptr;
    }

    Doubly Linked Lists

    Simplify insertion and deletion: Add a prev pointer.

    // Doubly-linked list link node
    template <class Elem> class Link {
    public:
      Elem element; // Value for this node
      Link *next; // Pointer to next node
      Link *prev; // Pointer to previous node
      Link(const Elem& e, Link* prevp =NULL, Link* nextp =NULL)
        { element=e; prev=prevp; next=nextp; }
      Link(Link* prevp =NULL, Link* nextp =NULL)
        { prev = prevp; next = nextp; }
    };

    Doubly Linked Lists

    Doubly Linked Insert

    Doubly Linked Insert

    // Insert at front of right partition
    template <class Elem>
    bool LList<Elem>::insert(const Elem& item) {
      fence->next = new Link<Elem>(item, fence, fence->next);
      if (fence->next->next != NULL)
        fence->next->next->prev = fence->next;
      if (tail == fence) // Appending new Elem
        tail = fence->next; // so set tail
      rightcnt++; // Added to right
      return true;
    }

    Doubly Linked Remove

    Doubly Linked Remove

    // Remove, return first Elem in right part
    template <class Elem>
    bool LList<Elem>::remove(Elem& it) {
      if (fence->next == NULL) return false;
      it = fence->next->element;
      Link<Elem>* ltemp = fence->next;
      if (ltemp->next != NULL)
        ltemp->next->prev = fence;
      else tail = fence; // Reset tail
      fence->next = ltemp->next; // Remove
      delete ltemp; // Reclaim space
      rightcnt--; // Removed from right
      return true;
    }

    Dictionary

    Often want to insert records, delete records, search for records.

    Required concepts:
        Search key: Describe what we are looking for
        Key comparison
            Equality: sequential search
            Relative order: sorting
        Record comparison

    Comparator Class

    How do we generalize comparison?
        Use ==, <=, >=: Disastrous
        Overload ==, <=, >=: Disastrous
        Define a function with a standard name
            Implied obligation
            Breaks down with multiple key fields/indices for same object
        Pass in a function
            Explicit obligation
            Function parameter
            Template parameter

    Comparator Example

    class intintCompare {
    public:
      static bool lt(int x, int y) { return x < y; }
      static bool eq(int x, int y) { return x == y; }
      static bool gt(int x, int y) { return x > y; }
    };
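
    A comparator class like this is meant to be passed as a template parameter. The sketch below shows one possible use. `minPos` is a hypothetical helper that does not appear in the notes, and the comparator is repeated so that the example compiles on its own.

```cpp
// Comparator from the slide, repeated so this sketch is self-contained.
class intintCompare {
public:
  static bool lt(int x, int y) { return x < y; }
  static bool eq(int x, int y) { return x == y; }
  static bool gt(int x, int y) { return x > y; }
};

// Hypothetical helper: position of the smallest of n >= 1 items,
// with "smallest" decided entirely by the comparator Comp.
template <class Elem, class Comp>
int minPos(const Elem array[], int n) {
  int best = 0;
  for (int i = 1; i < n; i++)
    if (Comp::lt(array[i], array[best])) best = i;  // Strictly smaller
  return best;
}
```

    Because the comparison is supplied from outside, the same `minPos` could order records by a different key simply by naming a different comparator class.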

    Comparator Example (2)

    class Payroll {
    public:
      int ID;
      char* name;
    };

    class IDCompare {
    public:
      static bool lt(Payroll& x, Payroll& y)
        { return x.ID < y.ID; }
    };

    class NameCompare {
    public:
      static bool lt(Payroll& x, Payroll& y)
        { return strcmp(x.name, y.name) < 0; }
    };

    Dictionary ADT

    // The Dictionary abstract class.
    template <class Key, class Elem>
    class Dictionary {
    public:
      virtual void clear() = 0;
      virtual bool insert(const Elem&) = 0;
      virtual bool remove(const Key&, Elem&) = 0;
      virtual bool removeAny(Elem&) = 0;
      virtual bool find(const Key&, Elem&) const = 0;
      virtual int size() = 0;
    };

    Unsorted List Dictionary

    template <class Key, class Elem, class KEComp>
    class UALdict : public Dictionary<Key, Elem> {
    private:
      AList<Elem>* list;
    public:
      bool remove(const Key& K, Elem& e) {
        for(list->setStart(); list->getValue(e); list->next())
          if (KEComp::eq(K, e)) {
            list->remove(e);
            return true;
          }
        return false;
      }
    };

    Stacks

    LIFO: Last In, First Out.

    Restricted form of list: Insert and removeonly at front of list.

    Notation: Insert: PUSH Remove: POP The accessible element is called TOP.

    Stack ADT

    // Stack abstract class
    template <class Elem> class Stack {
    public:
      // Reinitialize the stack
      virtual void clear() = 0;
      // Push an element onto the top of the stack.
      virtual bool push(const Elem&) = 0;
      // Remove the element at the top of the stack.
      virtual bool pop(Elem&) = 0;
      // Get a copy of the top element in the stack
      virtual bool topValue(Elem&) const = 0;
      // Return the number of elements in the stack.
      virtual int length() const = 0;
    };

    Array-Based Stack

    // Array-based stack implementation
    private:
      int size;         // Maximum size of stack
      int top;          // Index for top element
      Elem *listArray;  // Array holding elements

    Issues:
      Which end is the top?
      Where does top point to?
      What is the cost of the operations?
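One possible set of answers to the issues above can be sketched as follows. This is an illustration under stated assumptions rather than the notes' own class: the top of the stack is the high end of the array, `top` is the index of the first free slot, and the class name `AStack` is chosen here. With those choices every operation takes constant time.

```cpp
#include <cassert>

// Sketch of an array-based stack. Assumption: top is the index of the
// first free slot, so the stack is empty when top == 0.
template <class Elem>
class AStack {
private:
  int size;         // Maximum size of stack
  int top;          // Index of first free position
  Elem* listArray;  // Array holding elements
public:
  explicit AStack(int sz) : size(sz), top(0), listArray(new Elem[sz]) {}
  ~AStack() { delete[] listArray; }
  AStack(const AStack&) = delete;             // Copying is not needed here
  AStack& operator=(const AStack&) = delete;
  void clear() { top = 0; }
  bool push(const Elem& item) {
    if (top == size) return false;            // Stack is full
    listArray[top++] = item;
    return true;
  }
  bool pop(Elem& item) {
    if (top == 0) return false;               // Stack is empty
    item = listArray[--top];
    return true;
  }
  bool topValue(Elem& item) const {
    if (top == 0) return false;               // Stack is empty
    item = listArray[top - 1];
    return true;
  }
  int length() const { return top; }
};
```

Putting the top at the low end of the array instead would force every push and pop to shift all elements, which is why the high end is the usual choice.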

    Linked Stack

    // Linked stack implementation
    private:
      Link<Elem>* top;  // Pointer to first elem
      int size;         // Count number of elems

    What is the cost of the operations?

    How do space requirements compare to the array-based stack implementation?

    Queues

    FIFO: First In, First Out.

    Restricted form of list: Insert at one end, remove from the other.

    Notation:
      Insert: Enqueue
      Delete: Dequeue
      First element: Front
      Last element: Rear
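A common way to get constant-time enqueue and dequeue from an array is to treat the array as circular. The sketch below is one such arrangement, with assumptions that are not taken from the notes: the class name `AQueue`, the member names, and the use of one spare array slot so that a full queue can be told apart from an empty one.

```cpp
#include <cassert>

// Sketch of a circular array-based queue. Assumption: the array holds
// capacity + 1 slots, and one slot is always left unused.
template <class Elem>
class AQueue {
private:
  int size;         // Array size (capacity + 1)
  int front;        // Index of front element
  int rear;         // Index of rear element
  Elem* listArray;  // Array holding elements
public:
  explicit AQueue(int capacity)
    : size(capacity + 1), front(1), rear(0),
      listArray(new Elem[capacity + 1]) {}
  ~AQueue() { delete[] listArray; }
  AQueue(const AQueue&) = delete;
  AQueue& operator=(const AQueue&) = delete;
  bool enqueue(const Elem& item) {
    if ((rear + 2) % size == front) return false;  // Queue is full
    rear = (rear + 1) % size;                      // Advance circularly
    listArray[rear] = item;
    return true;
  }
  bool dequeue(Elem& item) {
    if (length() == 0) return false;               // Queue is empty
    item = listArray[front];
    front = (front + 1) % size;                    // Advance circularly
    return true;
  }
  int length() const { return ((rear + size) - front + 1) % size; }
};
```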

    Queue Implementation (1)

    Queue Implementation (2)

    Binary Trees

    A binary tree is made up of a finite set of nodes that is either empty or consists of a node called the root together with two binary trees, called the left and right subtrees, which are disjoint from each other and from the root.

    Binary Tree Example

    Notation: Node, children, edge, parent, ancestor, descendant, path, depth, height, level, leaf node, internal node, subtree.

    Full and Complete Binary Trees

    Full binary tree: Each node is either a leaf or an internal node with exactly two non-empty children.

    Complete binary tree: If the height of the tree is d, then all levels except possibly level d are completely full. The bottom level has all nodes to the left side.

    Full Binary Tree Theorem (1)

    Theorem: The number of leaves in a non-empty full binary tree is one more than the number of internal nodes.

    Proof (by Mathematical Induction):

    Base case: A full binary tree with 1 internal node must have two leaf nodes.

    Induction Hypothesis: Assume any full binary tree T containing n-1 internal nodes has n leaves.

    Full Binary Tree Theorem (2)

    Induction Step: Given tree T with n internal nodes, pick internal node I with two leaf children. Remove I's children, call resulting tree T'. Node I is now a leaf, so T' has n-1 internal nodes.

    By induction hypothesis, T' is a full binary tree with n leaves.

    Restore I's two children. The number of internal nodes has now gone up by 1 to reach n. The number of leaves has also gone up by 1 (two new leaves, while I is no longer a leaf), to reach n+1.
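The theorem can also be checked mechanically on any particular full binary tree by counting both kinds of node. The sketch below uses a hypothetical minimal `Node` type and helper names chosen for this illustration; it is a sanity check, not a substitute for the induction proof.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node type for this sketch: no value, two links.
struct Node {
  Node* lc;
  Node* rc;
  Node(Node* l = NULL, Node* r = NULL) : lc(l), rc(r) {}
};

bool isLeaf(const Node* n) { return n->lc == NULL && n->rc == NULL; }

// Count leaf nodes in the tree rooted at n.
int leaves(const Node* n) {
  if (n == NULL) return 0;
  if (isLeaf(n)) return 1;
  return leaves(n->lc) + leaves(n->rc);
}

// Count internal (non-leaf) nodes in the tree rooted at n.
int internals(const Node* n) {
  if (n == NULL || isLeaf(n)) return 0;
  return 1 + internals(n->lc) + internals(n->rc);
}
```

For a full binary tree rooted at `root`, `leaves(root) == internals(root) + 1` should hold.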

    Full Binary Tree Corollary

    Theorem: The number of null pointers in a non-empty binary tree is one more than the number of nodes in the tree.

    Proof: Replace all null pointers with a pointer to an empty leaf node. This is a full binary tree whose internal nodes are the original nodes and whose leaves are the former null pointers.

    Binary Tree Node Class (1)

    // Binary tree node class
    template <class Elem>
    class BinNodePtr : public BinNode<Elem> {
    private:
      Elem it;          // The node's value
      BinNodePtr* lc;   // Pointer to left child
      BinNodePtr* rc;   // Pointer to right child
    public:
      BinNodePtr() { lc = rc = NULL; }
      BinNodePtr(Elem e, BinNodePtr* l = NULL, BinNodePtr* r = NULL)
        { it = e; lc = l; rc = r; }

    Binary Tree Node Class (2)

      Elem& val() { return it; }
      void setVal(const Elem& e) { it = e; }
      inline BinNode<Elem>* left() const { return lc; }
      void setLeft(BinNode<Elem>* b) { lc = (BinNodePtr*)b; }
      inline BinNode<Elem>* right() const { return rc; }
      void setRight(BinNode<Elem>* b) { rc = (BinNodePtr*)b; }
      bool isLeaf() { return (lc == NULL) && (rc == NULL); }
    };

    Traversals (1)

    Any process for visiting the nodes in some order is called a traversal.

    Any traversal that lists every node in the tree exactly once is called an enumeration of the tree's nodes.

    Traversals (2)

    Preorder traversal: Visit each node before visiting its children.

    Postorder traversal: Visit each node after visiting its children.

    Inorder traversal: Visit the left subtree, then the node, then the right subtree.
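The three orders differ only in where the visit is placed relative to the two recursive calls. A small sketch makes this concrete; the node type `TNode` and the choice to record visits in a string are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal node type for this sketch.
struct TNode {
  char val;
  TNode* lc;
  TNode* rc;
  TNode(char v, TNode* l = NULL, TNode* r = NULL) : val(v), lc(l), rc(r) {}
};

void preorder(const TNode* t, std::string& out) {
  if (t == NULL) return;
  out += t->val;            // Visit the node first
  preorder(t->lc, out);
  preorder(t->rc, out);
}

void inorder(const TNode* t, std::string& out) {
  if (t == NULL) return;
  inorder(t->lc, out);
  out += t->val;            // Visit between the two subtrees
  inorder(t->rc, out);
}

void postorder(const TNode* t, std::string& out) {
  if (t == NULL) return;
  postorder(t->lc, out);
  postorder(t->rc, out);
  out += t->val;            // Visit the node last
}
```

For a root A with left child B (whose children are D and E) and right child C, the three functions record ABDEC, DBEAC, and DEBCA respectively.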

    Traversals (3)

    template <class Elem>  // Good implementation
    void preorder(BinNode<Elem>* subroot) {
      if (subroot == NULL) return;  // Empty
      visit(subroot);               // Perform some action
      preorder(subroot->left());
      preorder(subroot->right());
    }

    template <class Elem>  // Bad implementation
    void preorder2(BinNode<Elem>* subroot) {
      visit(subroot);               // Perform some action
      if (subroot->left() != NULL)
        preorder2(subroot->left());
      if (subroot->right() != NULL)
        preorder2(subroot->right());
    }

    Traversal Example

    // Return the number of nodes in the tree
    template <class Elem>
    int count(BinNode<Elem>* subroot) {
      if (subroot == NULL)
        return 0;  // Nothing to count
      return 1 + count(subroot->left())
               + count(subroot->right());
    }

    Binary Tree Implementation (1)

    Binary Tree Implementation (2)

    Union Implementation (1)

    enum Nodetype {leaf, internal};

    class VarBinNode {  // Generic node class
    public:
      Nodetype mytype;  // Store type for node
      union {
        struct {              // Internal node
          VarBinNode* left;   // Left child
          VarBinNode* right;  // Right child
          Operator opx;       // Value
        } intl;
        Operand var;          // Leaf: Value only
      };

    Union Implementation (2)

      // Leaf constructor
      VarBinNode(const Operand& val)
        { mytype = leaf; var = val; }
      // Internal node constructor
      VarBinNode(const Operator& op,
                 VarBinNode* l, VarBinNode* r) {
        mytype = internal; intl.opx = op;
        intl.left = l; intl.right = r;
      }
      bool isLeaf() { return mytype == leaf; }
      VarBinNode* leftchild() { return intl.left; }
      VarBinNode* rightchild() { return intl.right; }
    };

    Union Implementation (3)

    // Preorder traversalvoid traverse(VarBinNode* subroot) {if (subroot == NULL) return;if (subroot->isLeaf())cout rightchild());

    }}

    Inheritance (1)

    class VarBinNode {  // Abstract base class
    public:
      virtual bool isLeaf() = 0;
    };

    class LeafNode : public VarBinNode {  // Leaf
    private:
      Operand var;  // Operand value
    public:
      LeafNode(const Operand& val)
        { var = val; }  // Constructor
      bool isLeaf() { return true; }
      Operand value() { return var; }
    };

    Inheritance (2)

    // Internal node
    class IntlNode : public VarBinNode {
    private:
      VarBinNode* left;   // Left child
      VarBinNode* right;  // Right child
      Operator opx;       // Operator value
    public:
      IntlNode(const Operator& op,
               VarBinNode* l, VarBinNode* r)
        { opx = op; left = l; right = r; }
      bool isLeaf() { return false; }
      VarBinNode* leftchild() { return left; }
      VarBinNode* rightchild() { return right; }
      Operator value() { return opx; }
    };

    Inheritance (3)

    // Preorder traversalvoid traverse(VarBinNode *subroot) {if (subroot == NULL) return; // Emptyif (subroot->isLeaf()) // Do leaf nodecout

    Composite (1)

    class VarBinNode {  // Abstract base class
    public:
      virtual bool isLeaf() = 0;
      virtual void trav() = 0;
    };

    class LeafNode : public VarBinNode {  // Leaf
    private:
      Operand var;  // Operand value
    public:
      LeafNode(const Operand& val)
        { var = val; }  // Constructor
      bool isLeaf() { return true; }
      Operand value() { return var; }
      void trav() { cout

    Composite (2)

    class IntlNode : public VarBinNode {
    private:
      VarBinNode* lc;  // Left child
      VarBinNode* rc;  // Right child
      Operator opx;    // Operator value
    public:
      IntlNode(const Operator& op,
               VarBinNode* l, VarBinNode* r)
        { opx = op; lc = l; rc = r; }
      bool isLeaf() { return false; }
      VarBinNode* left() { return lc; }
      VarBinNode* right() { return rc; }
      Operator value() { return opx; }
      void trav() { cout

    Composite (3)

    // Preorder traversal
    void traverse(VarBinNode *root) {
      if (root != NULL)
        root->trav();
    }

    Space Overhead (1)

    From the Full Binary Tree Theorem: Half of the pointers are null.

    If leaves store only data, then overhead depends on whether the tree is full.

    Ex: All nodes the same, with two pointers to children:
      Total space required is (2p + d)n
      Overhead: 2pn
      If p = d, this means 2p/(2p + d) = 2/3 overhead.

    Space Overhead (2)

    Eliminate pointers from the leaf nodes:

      $\frac{\frac{n}{2}(2p)}{\frac{n}{2}(2p) + dn} = \frac{p}{p + d}$

    This is 1/2 if p = d.

    2p/(2p + d) if data only at leaves: 2/3 overhead when p = d.

    Note that some method is needed to distinguish leaves from internal nodes.

    Array Implementation (1)

    Position        0   1   2   3   4   5   6   7   8   9  10  11
    Parent         --   0   0   1   1   2   2   3   3   4   4   5
    Left Child      1   3   5   7   9  11  --  --  --  --  --  --
    Right Child     2   4   6   8  10  --  --  --  --  --  --  --
    Left Sibling   --  --   1  --   3  --   5  --   7  --   9  --
    Right Sibling  --   2  --   4  --   6  --   8  --  10  --  --

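    Array Implementation (2)

    For a complete binary tree of n nodes stored at positions 0 to n-1:

    Parent(r) = ⌊(r - 1)/2⌋ if r ≠ 0.

    Leftchild(r) = 2r + 1 if 2r + 1 < n.

    Rightchild(r) = 2r + 2 if 2r + 2 < n.

    Leftsibling(r) = r - 1 if r is even and r > 0.

    Rightsibling(r) = r + 1 if r is odd and r + 1 < n.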
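The relationships in the position table for the 12-node tree can be written as simple index arithmetic. The functions below are a sketch derived from that table; their names and the use of -1 to mean "no such node" are assumptions made for this illustration.

```cpp
#include <cassert>

// Index arithmetic for a complete binary tree of n nodes stored in an
// array at positions 0..n-1. Assumption: -1 stands for "no such node".
int parentOf(int r)            { return (r > 0) ? (r - 1) / 2 : -1; }
int leftChild(int r, int n)    { return (2 * r + 1 < n) ? 2 * r + 1 : -1; }
int rightChild(int r, int n)   { return (2 * r + 2 < n) ? 2 * r + 2 : -1; }
int leftSibling(int r)         { return (r > 0 && r % 2 == 0) ? r - 1 : -1; }
int rightSibling(int r, int n) { return (r % 2 == 1 && r + 1 < n) ? r + 1 : -1; }
```

With n = 12 these reproduce the table rows, for example a parent of 5 for position 11 and no right child for position 5.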

    Binary Search Trees

    BST Property: All elements stored in the left subtree of a node with value K have values < K. All elements stored in the right subtree of a node with value K have values >= K.
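The property translates directly into where an insert places a value and which way a search turns. The sketch below uses plain `int` keys and a hypothetical `BNode` type instead of the templated classes in these notes; `isBST` is an assumed helper that checks the property with inclusive lower and exclusive upper bounds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node type for this sketch.
struct BNode {
  int val;
  BNode* lc;
  BNode* rc;
  explicit BNode(int v) : val(v), lc(NULL), rc(NULL) {}
};

// Smaller values go left; equal or larger values go right.
BNode* bstInsert(BNode* t, int v) {
  if (t == NULL) return new BNode(v);
  if (v < t->val) t->lc = bstInsert(t->lc, v);
  else            t->rc = bstInsert(t->rc, v);
  return t;
}

// Follow one path from the root, turning left or right at each node.
bool bstFind(const BNode* t, int k) {
  while (t != NULL) {
    if (k < t->val)      t = t->lc;
    else if (k > t->val) t = t->rc;
    else return true;
  }
  return false;
}

// Check the BST property: lo (if set) is an inclusive lower bound and
// hi (if set) is an exclusive upper bound for every value in the subtree.
bool isBST(const BNode* t, const int* lo, const int* hi) {
  if (t == NULL) return true;
  if (lo != NULL && t->val < *lo) return false;
  if (hi != NULL && t->val >= *hi) return false;
  return isBST(t->lc, lo, &t->val) && isBST(t->rc, &t->val, hi);
}

void bstDestroy(BNode* t) {
  if (t == NULL) return;
  bstDestroy(t->lc);
  bstDestroy(t->rc);
  delete t;
}
```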

    BST ADT(1)

    // BST implementation for the Dictionary ADT
    template <class Key, class Elem, class KEComp, class EEComp>
    class BST : public Dictionary<Key, Elem, KEComp, EEComp> {
    private:
      BinNode<Elem>* root;  // Root of the BST
      int nodecount;        // Number of nodes
      void clearhelp(BinNode<Elem>*);
      BinNode<Elem>* inserthelp(BinNode<Elem>*, const Elem&);
      BinNode<Elem>* deletemin(BinNode<Elem>*, BinNode<Elem>*&);
      BinNode<Elem>* removehelp(BinNode<Elem>*, const Key&,
                                BinNode<Elem>*&);
      bool findhelp(BinNode<Elem>*, const Key&, Elem&) const;
      void printhelp(BinNode<Elem>*, int) const;

    BST ADT(2)

    public:
      BST() { root = NULL; nodecount = 0; }
      ~BST() { clearhelp(root); }
      void clear()
        { clearhelp(root); root = NULL; nodecount = 0; }
      bool insert(const Elem& e) {
        root = inserthelp(root, e);
        nodecount++;
        return true;
      }
      bool remove(const Key& K, Elem& e) {
        BinNode<Elem>* t = NULL;
        root = removehelp(root, K, t);
        if (t == NULL) return false;
        e = t->val();
        nodecount--;
        delete t;
        return true;
      }

    BST ADT(3)

      bool removeAny(Elem& e) {  // Delete min value
        if (root == NULL) return false;  // Empty
        BinNode<Elem>* t;
        root = deletemin(root, t);
        e = t->val();
        delete t;
        nodecount--;
        return true;
      }
      bool find(const Key& K, Elem& e) const
        { return findhelp(root, K, e); }
      int size() { return nodecount; }
      void print() const {
        if (root == NULL) cout

    template <class Key, class Elem, class KEComp, class EEComp>
    bool BST<Key, Elem, KEComp, EEComp>::findhelp(
        BinNode<Elem>* subroot, const Key& K, Elem& e) const {
      if (subroot == NULL) return false;
      else if (KEComp::lt(K, subroot->val()))
        return findhelp(subroot->left(), K, e);
      else if (KEComp::gt(K, subroot->val()))
        return findhelp(subroot->right(), K, e);
      else { e = subroot->val(); return true; }
    }

    BST Insert (1)

    BST Insert (2)

    template <class Key, class Elem, class KEComp, class EEComp>
    BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::inserthelp(
        BinNode<Elem>* subroot, const Elem& val) {
      if (subroot == NULL)  // Empty: create node
        return new BinNodePtr<Elem>(val, NULL, NULL);
      if (EEComp::lt(val, subroot->val()))
        subroot->setLeft(inserthelp(subroot->left(), val));
      else
        subroot->setRight(inserthelp(subroot->right(), val));
      // Return subtree with node inserted
      return subroot;
    }

    Remove Minimum Value

    template <class Key, class Elem, class KEComp, class EEComp>
    BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::deletemin(
        BinNode<Elem>* subroot, BinNode<Elem>*& min) {
      if (subroot->left() == NULL) {
        min = subroot;
        return subroot->right();
      }
      else {  // Continue left
        subroot->setLeft(deletemin(subroot->left(), min));
        return subroot;
      }
    }

    BST Remove (1)

    BST Remove (2)

    template <class Key, class Elem, class KEComp, class EEComp>
    BinNode<Elem>* BST<Key, Elem, KEComp, EEComp>::removehelp(
        BinNode<Elem>* subroot, const Key& K, BinNode<Elem>*& t) {
      if (subroot == NULL) return NULL;
      else if (KEComp::lt(K, subroot->val()))
        subroot->setLeft(removehelp(subroot->left(), K, t));
      else if (KEComp::gt(K, subroot->val()))
        subroot->setRight(removehelp(subroot->right(), K, t));
      else {  // Found it: remove it
        BinNode<Elem>* temp;
        t = subroot;
        if (subroot->left() == NULL)
          subroot = subroot->right();
        else if (subroot->right() == NULL)
          subroot = subroot->left();
        else {  // Both children are non-empty
          subroot->setRight(deletemin(subroot->right(), temp));
          Elem te = subroot->val();
          subroot->setVal(temp->val());
          temp->setVal(te);
          t = temp;
        }
      }
      return subroot;
    }

    Cost of BST Operations

    Find:

    Insert:

    Delete:

    Heaps

    Heap: Complete binary tree with the heap property:
      Min-heap: All values less than child values.
      Max-heap: All values greater than child values.

    The values are partially ordered.

    Heap representation: Normally the array-based complete binary tree representation.

    Heap ADT

    template <class Elem, class Comp> class maxheap {
    private:
      Elem* Heap;          // Pointer to the heap array
      int size;            // Maximum size of the heap
      int n;               // Number of elems now in heap
      void siftdown(int);  // Put element in place
    public:
      maxheap(Elem* h, int num, int max);
      int heapsize() const;
      bool isLeaf(int pos) const;
      int leftchild(int pos) const;
      int rightchild(int pos) const;
      int parent(int pos) const;
      bool insert(const Elem&);
      bool removemax(Elem&);
      bool remove(int, Elem&);
      void buildHeap();
    };

    Building the Heap

    (a) (4-2) (4-1) (2-1) (5-2) (5-4) (6-3) (6-5) (7-5) (7-6)
    (b) (5-2), (7-3), (7-1), (6-1)

    Siftdown (1)

    For fast heap construction:
      Work from high end of array to low end.
      Call siftdown for each item.
      Don't need to call siftdown on leaf nodes.

    template <class Elem, class Comp>
    void maxheap<Elem, Comp>::siftdown(int pos) {
      while (!isLeaf(pos)) {
        int j = leftchild(pos);
        int rc = rightchild(pos);
        if ((rc

    Buildheap Cost

    Cost for heap construction:

      $\sum_{i=1}^{\log n} (i-1)\frac{n}{2^i} \approx n$
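The bottom-up construction described for siftdown can be sketched on a plain array of integers. This is an illustration under assumptions, not the `maxheap` class from these notes: it uses `std::vector<int>`, free functions, and a hypothetical `isMaxHeap` checker. Starting at the last internal node and working toward position 0 means leaves are never sifted, which is what keeps the total cost linear.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restore the max-heap property at pos, assuming both subtrees of pos
// already satisfy it.
void siftdown(std::vector<int>& h, int pos) {
  int n = static_cast<int>(h.size());
  while (2 * pos + 1 < n) {                  // While pos is not a leaf
    int j = 2 * pos + 1;                     // Left child
    if (j + 1 < n && h[j + 1] > h[j]) j++;   // Pick the larger child
    if (h[pos] >= h[j]) return;              // Already in place
    std::swap(h[pos], h[j]);
    pos = j;                                 // Continue down the tree
  }
}

// Build a max-heap from the high end of the array to the low end.
void buildHeap(std::vector<int>& h) {
  for (int i = static_cast<int>(h.size()) / 2 - 1; i >= 0; i--)
    siftdown(h, i);
}

// Hypothetical checker: every node is no larger than its parent.
bool isMaxHeap(const std::vector<int>& h) {
  for (std::size_t i = 1; i < h.size(); i++)
    if (h[(i - 1) / 2] < h[i]) return false;
  return true;
}
```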

    Remove Max Value

    template <class Elem, class Comp>
    bool maxheap<Elem, Comp>::removemax(Elem& it) {
      if (n == 0) return false;  // Heap is empty
      swap(Heap, 0, --n);        // Swap max with end
      if (n != 0) siftdown(0);
      it = Heap[n];              // Return max value
      return true;
    }

    Priority Queues (1)

    A priority queue stores objects, and on request releases the object with greatest value.

    Example: Scheduling jobs in a multi-tasking operating system.

    The priority of a job may change, requiring some reordering of the jobs.

    Implementation: Use a heap to store the priority queue.

    Priority Queues (2)

    To support priority reordering, delete and re-insert. Need to know index for the object in question.

    template <class Elem, class Comp>
    bool maxheap<Elem, Comp>::remove(int pos, Elem& it) {
      if ((pos < 0) || (pos >= n)) return false;
      swap(Heap, pos, --n);
      while ((pos != 0) &&
             (Comp::gt(Heap[pos], Heap[parent(pos)]))) {
        swap(Heap, pos, parent(pos));  // Push up if large key
        pos = parent(pos);             // Follow the value upward
      }
      siftdown(pos);
      it = Heap[n];
      return true;
    }

    Huffman Coding Trees

    ASCII codes: 8 bits per character.
      Fixed-length coding.

    Can take advantage of relative frequency of letters to save space.
      Variable-length coding

    Build the tree with minimum external path weight.

    Letter      Z   K   F   C   U   D   L    E
    Frequency   2   7  24  32  37  42  42  120
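The construction repeatedly merges the two lowest-weight trees, which maps naturally onto a min-priority queue. The sketch below computes only the total weighted path length of the resulting tree, not the codes themselves; the function name `huffmanCost` and the use of `std::priority_queue` are choices made for this illustration. It relies on the fact that each merge adds one bit to every letter beneath it, so the total equals the sum of the weights of all internal nodes.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Total number of bits needed to encode a text with the given letter
// frequencies using a Huffman code.
long huffmanCost(const std::vector<long>& freq) {
  // Min-heap of tree weights.
  std::priority_queue<long, std::vector<long>, std::greater<long> >
      pq(freq.begin(), freq.end());
  long total = 0;
  while (pq.size() > 1) {
    long a = pq.top(); pq.pop();  // Lightest tree
    long b = pq.top(); pq.pop();  // Second lightest tree
    total += a + b;               // Weight of the new internal node
    pq.push(a + b);
  }
  return total;
}
```

For the eight frequencies in the table the total is 785 bits for 306 letters, about 2.57 bits per letter, compared with 3 bits per letter for a fixed-length code over eight symbols.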

    Huffman Tree Construction (1)

    Huffman Tree Construction (2)

    Assigning Codes

    Letter  Freq  Code  Bits
    C         32
    D         42
    E        120
    F         24
    K          7
    L         42
    U         37
    Z          2

    Coding and Decoding

    A set of codes is said to meet the prefix property if no code in the set is the prefix of another.

    Code for DEED:

    Decode 1011001110111101:

    Expected cost per letter:

    General Trees

    General Tree Node

    // General tree node ADT
    template <class Elem> class GTNode {
    public:
      GTNode(const Elem&);       // Constructor
      ~GTNode();                 // Destructor
      Elem value();              // Return value
      bool isLeaf();             // TRUE if is a leaf
      GTNode* parent();          // Return parent
      GTNode* leftmost_child();  // First child
      GTNode* right_sibling();   // Right sibling
      void setValue(Elem&);      // Set value
      void insert_first(GTNode<Elem>* n);
      void insert_next(GTNode<Elem>* n);
      void remove_first();       // Remove first child
      void remove_next();        // Remove sibling
    };

    General Tree Traversal

    template void GenTree::printhelp(GTNode* subroot) {if (subroot->isLeaf()) cout

    Equivalence Class Problem

    The parent pointer representation is good for answering:
      Are two elements in the same tree?

    // Return TRUE if nodes in different trees
    bool Gentree::differ(int a, int b) {
      int root1 = FIND(a);    // Find root for a
      int root2 = FIND(b);    // Find root for b
      return root1 != root2;  // Compare roots
    }

    Union/Find

    void Gentree::UNION(int a, int b) {
      int root1 = FIND(a);  // Find root for a
      int root2 = FIND(b);  // Find root for b
      if (root1 != root2) array[root2] = root1;
    }

    int Gentree::FIND(int curr) const {
      while (array[curr] != ROOT) curr = array[curr];
      return curr;  // At root
    }

    Want to keep the depth small.

    Weighted union rule: Join the tree with fewer nodes to the tree with more nodes.

    Equiv Class Processing (1)

    Equiv Class Processing (2)

    Path Compression

    int Gentree::FIND(int curr) const {
      if (array[curr] == ROOT) return curr;
      return array[curr] = FIND(array[curr]);
    }
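The weighted union rule and path compression are usually combined. The class below is a sketch of that combination under assumptions of its own: it is named `DisjointSets` rather than `Gentree`, it stores -1 in a root's parent slot in place of the `ROOT` constant, and it keeps a separate `count` array of tree sizes to apply the weighted union rule.

```cpp
#include <cassert>
#include <vector>

// Sketch of UNION/FIND over elements 0..n-1 using a parent-pointer array.
class DisjointSets {
private:
  std::vector<int> parent;  // parent[i] == -1 marks a root
  std::vector<int> count;   // Number of nodes in the tree rooted at i
public:
  explicit DisjointSets(int n) : parent(n, -1), count(n, 1) {}

  // FIND with path compression: every node visited ends up pointing
  // directly at the root.
  int find(int curr) {
    if (parent[curr] == -1) return curr;
    return parent[curr] = find(parent[curr]);
  }

  // UNION with the weighted union rule: the smaller tree joins the larger.
  void unite(int a, int b) {
    int r1 = find(a);
    int r2 = find(b);
    if (r1 == r2) return;
    if (count[r1] < count[r2]) { int t = r1; r1 = r2; r2 = t; }
    parent[r2] = r1;
    count[r1] += count[r2];
  }

  bool differ(int a, int b) { return find(a) != find(b); }
};
```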

    Lists of Children

    Leftmost Child/Right Sibling (1)

    Leftmost Child/Right Sibling (2)

    Linked Implementations (1)

    Linked Implementations (2)

    Converting to a Binary Tree

    Left child/right sibling representation essentially stores a binary tree.

    Use this process to convert any general tree to a binary tree.

    A forest is a collection of one or more general trees.

    Sequential Implementations (1)

    List node values in the order they would be visited by a preorder traversal.

    Saves space, but allows only sequential access.

    Need to retain tree structure for reconstruction.

    Example: For binary trees, use a symbol to mark null links.
      AB/D//CEG///FH//I//
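The null-link marker is enough to rebuild the tree, because a preorder listing with markers can be consumed left to right by the same recursion that produced it. The sketch below shows both directions for single-character node values; the `SNode` type and the function names are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical minimal node type for this sketch.
struct SNode {
  char val;
  SNode* lc;
  SNode* rc;
  SNode(char v, SNode* l, SNode* r) : val(v), lc(l), rc(r) {}
};

// Preorder listing with '/' marking each null link.
void serialize(const SNode* t, std::string& out) {
  if (t == NULL) { out += '/'; return; }
  out += t->val;
  serialize(t->lc, out);
  serialize(t->rc, out);
}

// Rebuild the tree by consuming the listing from position pos onward.
SNode* rebuild(const std::string& in, std::size_t& pos) {
  if (pos >= in.size() || in[pos] == '/') { pos++; return NULL; }
  char v = in[pos++];
  SNode* l = rebuild(in, pos);  // Left subtree comes first in preorder
  SNode* r = rebuild(in, pos);  // Then the right subtree
  return new SNode(v, l, r);
}

void freeTree(SNode* t) {
  if (t == NULL) return;
  freeTree(t->lc);
  freeTree(t->rc);
  delete t;
}
```

Rebuilding from the example string and serializing the result gives back the same string, which confirms that no structure is lost.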

    Sequential Implementations (2)

    Example: For full binary trees, mark nodes as leaf or internal.
      A'B'/DC'E'G/F'HI

    Example: For general trees, mark the end of each subtree.
      RAC)D)E))BF)))

    Sorting

    Each record contains a field called the key.
      Linear order: comparison.

    Measures of cost:
      Comparisons
      Swaps

    Insertion Sort (1)

    Insertion Sort (2)

    template <class Elem, class Comp>
    void inssort(Elem A[], int n) {
      for (int i=1; i<n; i++)
        for (int j=i; (j>0) &&
                      (Comp::lt(A[j], A[j-1])); j--)
          swap(A, j, j-1);
    }

    Best Case:
    Worst Case:
    Average Case:

    Bubble Sort (1)

    Bubble Sort (2)

    template <class Elem, class Comp>
    void bubsort(Elem A[], int n) {
      for (int i=0; i<n-1; i++)
        for (int j=n-1; j>i; j--)
          if (Comp::lt(A[j], A[j-1]))
            swap(A, j, j-1);
    }

    Best Case:
    Worst Case:
    Average Case:

    Selection Sort (1)

    Selection Sort (2)

    template <class Elem, class Comp>
    void selsort(Elem A[], int n) {
      for (int i=0; i<n-1; i++) {
        int lowindex = i;          // Remember its index
        for (int j=n-1; j>i; j--)  // Find least
          if (Comp::lt(A[j], A[lowindex]))
            lowindex = j;
        swap(A, i, lowindex);      // Put it in place
      }
    }

    Best Case:
    Worst Case:
    Average Case:

    Pointer Swapping

    Summary

                     Insertion  Bubble   Selection
    Comparisons:
      Best Case      Θ(n)       Θ(n^2)   Θ(n^2)
      Average Case   Θ(n^2)     Θ(n^2)   Θ(n^2)
      Worst Case     Θ(n^2)     Θ(n^2)   Θ(n^2)

    Swaps:
      Best Case      0          0        Θ(n)
      Average Case   Θ(n^2)     Θ(n^2)   Θ(n)
      Worst Case     Θ(n^2)     Θ(n^2)   Θ(n)

    Exchange Sorting

    All of the sorts so far rely on exchanges of adjacent records.

    What is the average number of exchanges required?
      There are n! permutations.
      Consider permutation X and its reverse, X'.
      Together, every pair requires n(n-1)/2 exchanges.
      The average over all permutations is therefore n(n-1)/4, which is Ω(n^2).
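The pairing argument rests on counting inversions, that is, pairs of records that are out of order: an exchange of adjacent records removes at most one inversion, and each pair of positions is inverted in exactly one of a permutation and its reverse. The sketch below counts inversions directly; the function name and the quadratic counting loop are choices made for this illustration, and the keys are assumed to be distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Number of inversions (out-of-order pairs) in a sequence of distinct keys.
int inversions(const std::vector<int>& a) {
  int count = 0;
  for (std::size_t i = 0; i < a.size(); i++)
    for (std::size_t j = i + 1; j < a.size(); j++)
      if (a[i] > a[j]) count++;
  return count;
}
```

For any permutation of n distinct keys, the inversion counts of the permutation and of its reverse add up to n(n-1)/2.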

    Shellsort

    Shellsort

    // Modified version of Insertion Sort
    template <class Elem, class Comp>
    void inssort2(Elem A[], int n, int incr) {
      for (int i=incr; i<n; i+=incr)
        for (int j=i; (j>=incr) &&
                      (Comp::lt(A[j], A[j-incr])); j-=incr)
          swap(A, j, j-incr);
    }

    template <class Elem, class Comp>
    void shellsort(Elem A[], int n) {  // Shellsort
      for (int i=n/2; i>2; i/=2)       // For each incr
        for (int j=0; j

    template <class Elem, class Comp>
    void qsort(Elem A[], int i, int j) {
      if (j

    template <class Elem, class Comp>
    int partition(Elem A[], int l, int r, Elem& pivot) {
      do {  // Move the bounds in until they meet
        while (Comp::lt(A[++l], pivot));
        while ((r != 0) && Comp::gt(A[--r], pivot));
        swap(A, l, r);  // Swap out-of-place values
      } while (l < r);  // Stop when they cross
      swap(A, l, r);    // Reverse last swap
      return l;         // Return first pos on right
    }

    The cost for partition is Θ(n).

    Partition Example

    Quicksort Example

    Cost of Quicksort

    Best case: Always partition in half.
    Worst case: Bad partition.
    Average case:

      $T(n) = n + 1 + \frac{1}{n-1} \sum_{k=1}^{n-1} \left( T(k) + T(n-k) \right)$

    Optimizations for Quicksort:
      Better pivot
      Better algorithm for small sublists
      Eliminate recursion

    Mergesort

    List mergesort(List inlist) {

    if (inlist.length()

    template

    void mergesort(Elem A[], Elem temp[],int left, int right) {int mid = (left+right)/2;if (left == right) return;mergesort(A, temp, left, mid);mergesort(A, temp, mid+1, right);

    for (int i=left; i

    template

    void mergesort(Elem A[], Elem temp[],int left, int right) {if ((right-left) =left; i--) temp[i] = A[i];for (j=1; j

    Mergesort cost:

    Mergesort is also good for sorting linked lists.

    Mergesort requires twice the space.

    Heapsort

    template

    void heapsort(Elem A[], int n) { // HeapsortElem mval;maxheap H(A, n, n);for (int i=0; i

    Heapsort Example (2)

    Binsort (1)

    A simple, efficient sort:

    for (i=0; i

    template

    void binsort(Elem A[], int n) {List B[MaxKeyValue];Elem item;for (i=0; i

    Radix Sort (2)

    template

    void radix(Elem A[], Elem B[],int n, int k, int r, int cnt[]) {

    // cnt[i] stores # of records in bin[i]int j;for (int i=0, rtok=1; i

    Radix Sort Cost

    Cost: Θ(nk + rk)

    How do n, k, and r relate?

    If the key range is small, then this can be Θ(n).

    If there are n distinct keys, then the length of a key must be at least log n. Thus, Radix Sort is Ω(n log n) in the general case.
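
    The Θ(nk + rk) cost is easier to see in a complete listing: each of the k passes touches all n records and all r counters. The following is a hedged sketch for non-negative int keys; the name radixSketch and the std::vector interface are assumptions, and only the parameters k, r, and the counter array cnt follow the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Least-significant-digit radix sort of non-negative ints.
// k = number of digits per key, r = base (number of bins).
void radixSketch(std::vector<int>& A, int k, int r) {
  std::vector<int> B(A.size());
  std::vector<int> cnt(r);                         // cnt[j]: records in bin j
  int rtok = 1;                                    // r^i, selects digit i
  for (int i = 0; i < k; i++, rtok *= r) {
    std::fill(cnt.begin(), cnt.end(), 0);          // Θ(r)
    for (int x : A) cnt[(x / rtok) % r]++;         // Θ(n): count per bin
    for (int j = 1; j < r; j++) cnt[j] += cnt[j - 1];  // cnt[j] = end of bin j
    for (int j = static_cast<int>(A.size()) - 1; j >= 0; j--)
      B[--cnt[(A[j] / rtok) % r]] = A[j];          // back to front keeps it stable
    A = B;                                         // Θ(n): copy back
  }
}
```

    For example, radixSketch(v, 2, 10) sorts two-digit decimal keys in two passes.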

    Empirical Comparison (1)

    Empirical Comparison (2)

    Sorting Lower Bound

    We would like to know a lower bound for all possible sorting algorithms.

    Sorting is O(n log n) (average, worst cases) because we know of algorithms with this upper bound.

    Sorting I/O takes Ω(n) time.

    We will now prove an Ω(n log n) lower bound for sorting.

    Decision Trees

    Lower Bound Proof

    There are n! permutations.

    A sorting algorithm can be viewed as determining which permutation has been input.

    Each leaf node of the decision tree corresponds to one permutation.

    A tree with n nodes has Ω(log n) levels, so the tree with n! leaves has Ω(log n!) = Ω(n log n) levels.

    Which node in the decision tree corresponds to the worst case?
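
    The step from log n! to n log n is stated without proof. One standard way to justify it, given here as a sketch rather than as the course's own derivation, uses only the fact that the largest n/2 factors of n! are each at least n/2:

```latex
\begin{align*}
n! &= n(n-1)\cdots 2 \cdot 1 \;\ge\; \left(\tfrac{n}{2}\right)^{n/2} \\
\log n! &\ge \tfrac{n}{2}\,\log\tfrac{n}{2} \;=\; \Omega(n \log n) \\
\log n! &\le \log n^{n} \;=\; n \log n
\end{align*}
```

    Together the two bounds give log n! = Θ(n log n), so the deepest leaf of the decision tree is Ω(n log n) comparisons from the root.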

    Primary vs. Secondary Storage

    Primary storage: Main memory (RAM)

    Secondary storage: Peripheral devices
      Disk drives
      Tape drives

    Comparisons

    Price per megabyte (US dollars):

    Medium   Early 1996   Mid 1997   Early 2000
    RAM      45.00        7.00       1.50
    Disk     0.25         0.10       0.01
    Floppy   0.50         0.36       0.25
    Tape     0.03         0.01       0.001

    RAM is usually volatile.

    RAM is about 1/4 million times faster than disk.

    Golden Rule of File Processing

    Minimize the number of disk accesses!

    1. Arrange information so that you get what you want with few disk accesses.

    2. Arrange information to minimize future disk accesses.

    An organization for data on disk is often called a file structure.

    Disk-based space/time tradeoff: Compress information to save processing time by reducing disk accesses.

    Disk Drives

    Sectors

    A sector is the basic unit of I/O.

    Interleaving factor: Physical distance between logically adjacent sectors on a track.

    Terms

    Locality of Reference: When a record is read from disk, the next request is likely to come from near the same place in the file.

    Cluster: Smallest unit of file allocation, usually several sectors.

    Extent: A group of physically contiguous clusters.

    Internal fragmentation: Wasted space within a sector if the record size does not match the sector size; wasted space within a cluster if the file size is not a multiple of the cluster size.

    Seek Time

    Seek time: Time for the I/O head to reach the desired track. Largely determined by the distance between the I/O head and the desired track.

    Track-to-track time: Minimum time to move from one track to an adjacent track.

    Average seek time: Average time to reach a track for random access.

    Other Factors

    Rotational Delay or Latency: Time for data to rotate under the I/O head.
      One half of a rotation on average.
      At 7200 rpm, this is 8.3/2 = 4.2 ms.

    Transfer time: Time for data to move under the I/O head.
      At 7200 rpm: (Number of sectors read / Number of sectors per track) * 8.3 ms.

    Disk Spec Example

    16.8 GB disk on 10 platters = 1.68 GB/platter
    13,085 tracks/platter
    256 sectors/track
    512 bytes/sector

    Track-to-track seek time: 2.2 ms
    Average seek time: 9.5 ms
    4 KB clusters, 32 clusters/track

    Interleaving factor of 3
    5400 RPM

    Disk Access Cost Example (1)

    Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

    Assume all records are on 8 contiguous tracks.

    At 5400 RPM one rotation takes 11.1 ms; with an interleaving factor of 3, reading a full track takes 3 rotations.

    First track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4 ms

    Remaining 7 tracks: 2.2 + 11.1/2 + 3 x 11.1 = 41.1 ms each.

    Total: 48.4 + 7 * 41.1 = 335.7 ms (using the unrounded terms 48.35 and 41.05)

    Disk Access Cost Example (2)

    Read a 1MB file divided into 2048 records of 512 bytes (1 sector) each.

    Assume all file clusters are randomly spread across the disk.

    256 clusters. Cluster read time is (3 x 8)/256 of a rotation, or about 1 ms.

    256(9.5 + 11.1/2 + ((3 x 8)/256) x 11.1) is about 4119 ms, or just over 4 seconds.
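
    Both access-cost examples follow the same pattern: seek, plus half a rotation of latency, plus transfer time. The sketch below recomputes them from the disk spec. The struct and function names are hypothetical, and the rotation time is rounded to 11.1 ms as on the slides. Note that the scattered case converts the (3 x 8)/256 rotation fraction to milliseconds before adding it, which gives about 4119 ms.

```cpp
#include <cassert>
#include <cmath>

struct DiskSpec {                  // values from the Disk Spec Example
  double avgSeekMs = 9.5;
  double trackToTrackMs = 2.2;
  double rotationMs = 11.1;        // one rotation at 5400 RPM, rounded
  int sectorsPerTrack = 256;
  int sectorsPerCluster = 8;       // 4 KB clusters of 512-byte sectors
  int interleave = 3;              // rotations needed to read a whole track
};

// Example (1): the file fills `tracks` contiguous tracks.
double contiguousReadMs(const DiskSpec& d, int tracks) {
  double perTrack = d.rotationMs / 2 + d.interleave * d.rotationMs;
  return (d.avgSeekMs + perTrack) + (tracks - 1) * (d.trackToTrackMs + perTrack);
}

// Example (2): `clusters` clusters scattered at random over the disk.
double scatteredReadMs(const DiskSpec& d, int clusters) {
  double fraction =
      static_cast<double>(d.interleave * d.sectorsPerCluster) / d.sectorsPerTrack;
  return clusters * (d.avgSeekMs + d.rotationMs / 2 + fraction * d.rotationMs);
}
```

    With the default spec, contiguousReadMs(d, 8) gives about 335.7 ms and scatteredReadMs(d, 256) about 4119 ms, a factor of roughly 12 between sequential and random layout.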

    How Much to Read?

    Read time for one track: 9.5 + 11.1/2 + 3 x 11.1 = 48.4 ms.

    Read time for one sector: 9.5 + 11.1/2 + (1/256) x 11.1 = 15.1 ms.

    Read time for one byte: 9.5 + 11.1/2 = 15.05 ms.

    Nearly all disk drives read/write one sector at every I/O access. A sector is also referred to as a page.

    Buffers

    The information in a sector is stored in a buffer or cache.

    If the next I/O access is to the same buffer, then there is no need to go to disk.

    There are usually one or more input buffers and one or more output buffers.

    Buffer Pools

    A series of buffers used by an application to cache disk data is called a buffer pool.

    Virtual memory uses a buffer pool to imitate greater RAM memory by actually storing information on disk and swapping between disk and RAM.

    Buffer Pools

    Organizing Buffer Pools

    Which buffer should be replaced when new data must be read?

    First-in, First-out: Use the first one on the queue.

    Least Frequently Used (LFU): Count buffer accesses, reuse the least used.

    Least Recently Used (LRU): Keep buffers on a linked list. When a buffer is accessed, bring it to the front. Reuse the one at the end.
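
    The LRU rule maps directly onto a linked list plus an index into it. The sketch below tracks only which block ids are resident, not the buffer contents. The class name LruOrder and the use of -1 for "nothing evicted" are assumptions for illustration, and the capacity is assumed to be at least 1.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Tracks which block ids occupy a pool of `capacity` buffers (capacity >= 1).
class LruOrder {
 public:
  explicit LruOrder(std::size_t capacity) : capacity_(capacity) {}

  // Record an access to `block`. Returns the evicted block id, or -1 if none.
  int access(int block) {
    auto it = where_.find(block);
    if (it != where_.end()) {                 // hit: bring it to the front
      order_.splice(order_.begin(), order_, it->second);
      return -1;
    }
    int evicted = -1;
    if (order_.size() == capacity_) {         // miss on a full pool:
      evicted = order_.back();                // reuse the one at the end
      where_.erase(evicted);
      order_.pop_back();
    }
    order_.push_front(block);
    where_[block] = order_.begin();
    return evicted;
  }

 private:
  std::size_t capacity_;
  std::list<int> order_;                                    // front = most recent
  std::unordered_map<int, std::list<int>::iterator> where_; // block -> list node
};
```

    The hash map avoids a linear scan of the list on every access; a real buffer pool would also write a dirty buffer back to disk before reusing it.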

    Bufferpool ADT (1)

    class BufferPool { // (1) Message Passing
    public:
      virtual void insert(void* space, int sz, int pos) = 0;
      virtual void getbytes(void* space, int sz, int pos) = 0;
    };

    class BufferPool { // (2) Buffer Passing
    public:
      virtual void* getblock(int block) = 0;
      virtual void dirtyblock(int block) = 0;
      virtual int blocksize() = 0;
    };

    Design Issues

    Disadvantage of message passing:
      Messages are copied and passed back and forth.

    Disadvantages of buffer passing:
      The user is given access to system memory (the buffer itself).
      The user must explicitly tell the buffer pool when buffer contents have been modified, so that modified data can be rewritten to disk when the buffer is flushed.
      The pointer might become stale when the buffer pool replaces the contents of a buffer.

    Programmer's View of Files

    Logical view of files:
      An array of bytes.
      A file pointer marks the current position.

    Three fundamental operations:
      Read bytes from current position (move file pointer).
      Write bytes to current position (move file pointer).
      Set file pointer to specified byte position.

    C++ File Functions

    #include <fstream>

    void fstream::open(char* name, openmode mode);
      Example: ios::in | ios::binary

    void fstream::close();

    fstream::read(char* ptr, int numbytes);

    fstream::write(char* ptr, int numbytes);

    fstream::seekg(int pos);
    fstream::seekg(int pos, ios::cur);

    fstream::seekp(int pos);
    fstream::seekp(int pos, ios::end);
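
    The three fundamental operations (read, write, set file pointer) can be exercised together in a few lines. This is a small sketch using standard std::fstream calls; the function name and the fixed-size int record layout are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Write `count` fixed-size int records to `path`, then read back record
// `index` by seeking straight to its byte position.
int writeThenReadRecord(const std::string& path, int count, int index) {
  {
    std::fstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    for (int i = 0; i < count; i++) {
      int value = i * 10;
      out.write(reinterpret_cast<const char*>(&value), sizeof value);
    }
  }  // the stream's destructor closes the file
  std::fstream in(path, std::ios::in | std::ios::binary);
  std::streamoff offset = static_cast<std::streamoff>(index) *
                          static_cast<std::streamoff>(sizeof(int));
  in.seekg(offset, std::ios::beg);                         // set the file pointer
  int value = -1;
  in.read(reinterpret_cast<char*>(&value), sizeof value);  // read advances it
  return value;
}
```

    Because every record has the same size, record i starts at byte i * sizeof(int), so one seek reaches it without reading the records before it.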

    External Sorting

    Problem: Sorting data sets too large to fit into main memory.
      Assume data are stored on disk drive.

    To sort, portions of the data must be brought into main memory, processed, and returned to disk.

    An external sort should minimize disk accesses.

    Model of External Computation

    Secondary memory is divided into equal-sized blocks (512 bytes, 1024 bytes, etc.).

    A basic I/O operation transfers the contents of one disk block to/from main memory.

    Under certain circumstances, reading blocks of a file in sequential order is more efficient. (When?)

    Primary goal is to minimize I/O operations.

    Assume only one disk drive is available.

    Key Sorting

    Often, records are large, keys are small.
      Ex: Payroll entries keyed on ID number

    Approach 1: Read in entire records, sort them, then write them out again.

    Approach 2: Read only the key values, store with each key the location on disk of its associated record.

    After keys are sorted the records can be read and rewritten in sorted order.
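
    Approach 2 amounts to sorting a small index of (key, location) pairs instead of the records themselves. A minimal sketch follows; the KeyRef struct and the function name are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct KeyRef {
  int key;       // sort key, e.g. an ID number
  long offset;   // byte position of the full record on disk
};

// Sort only the (key, offset) pairs; the large records stay on disk.
// Returns the order in which the records should then be read.
std::vector<long> sortedRecordOffsets(std::vector<KeyRef> index) {
  std::stable_sort(index.begin(), index.end(),
                   [](const KeyRef& a, const KeyRef& b) { return a.key < b.key; });
  std::vector<long> offsets;
  offsets.reserve(index.size());
  for (const KeyRef& e : index) offsets.push_back(e.offset);
  return offsets;
}
```

    The tradeoff is that the final pass reads records in key order, which is generally random order on disk.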

    Simple External Mergesort (1)

    Quicksort requires random access to the entire set of records.

    Better: Modified Mergesort algorithm.
      Process n elements in Θ(log n) passes.

    A group of sorted records is called a run.

    Simple External Mergesort (2)

    1. Split the file into two files.

    2. Read in a block from each file.

    3. Take first record from each block, output them in sorted order.

    4. Take next record from each block, output them to a second file in sorted order.

    5. Repeat until finished, alternating between output files. Read new input blocks as needed.

    6. Repeat steps 2-5, except this time input files have runs of two sorted records that are merged together.

    7. Each pass through the files provides larger runs.

    Simple External Mergesort (3)

    Problems with Simple Mergesort

    Is each pass through input and output files sequential?

    What happens if all work is done on a single disk drive?

    How can we reduce the number of Mergesort passes?

    In general, external sorting consists of two phases:
      1. Break the files into initial runs.
      2. Merge the runs together into a single run.

    Breaking a File into Runs

    General approach:
      Read as much of the file into memory as possible.
      Perform an in-memory sort.
      Output this group of records as a single run.

    Replacement Selection (1)

    Break available memory into an array for the heap, an input buffer, and an output buffer.

    Fill the array from disk.

    Make a min-heap.

    Send the smallest value (root) to the output buffer.

    Replacement Selection (2)

    If the next key in the file is greater than the last value output, then
      Replace the root with this key
    else
      Replace the root with the last key in the array
      Add the next record in the file to a new heap (actually, stick it at the end of the array).
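
    The replacement selection loop can be simulated in memory to see how runs longer than the heap arise. This sketch swaps in two std::priority_queue objects for the single shared array described above, so it shows the logic rather than the space-saving layout; the function name is hypothetical, and equal keys are also allowed to join the current run.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Split `input` into sorted runs using a heap that holds `memory` keys.
std::vector<std::vector<int>> replacementSelection(const std::vector<int>& input,
                                                   std::size_t memory) {
  using MinHeap = std::priority_queue<int, std::vector<int>, std::greater<int>>;
  MinHeap current, next;                    // next = keys held for the next run
  std::size_t pos = 0;
  while (pos < input.size() && current.size() < memory)
    current.push(input[pos++]);             // fill the array from "disk"
  std::vector<std::vector<int>> runs;
  std::vector<int> run;
  while (!current.empty()) {
    int smallest = current.top();
    current.pop();
    run.push_back(smallest);                // send the root to the output
    if (pos < input.size()) {
      int key = input[pos++];
      if (key >= smallest) current.push(key);  // can still join this run
      else next.push(key);                     // too small: wait for the next run
    }
    if (current.empty()) {                  // current run is finished
      runs.push_back(run);
      run.clear();
      std::swap(current, next);
    }
  }
  return runs;
}
```

    With input 5 1 9 3 7 2 8 and room for 3 keys, the first run is 1 3 5 7 8 9, twice the heap size, and the second run is just 2.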

    RS Example

    Snowplow Analogy (1)

    Imagine a snowplow moving around a circular track on which snow falls at a steady rate.

    At any instant, there is a certain amount of snow S on the track. Some falling snow comes in front of the plow, some behind.

    During the next revolution of the plow, all of this is removed, plus 1/2 of what falls during that revolution.

    Thus, the plow removes 2S amount of snow. By analogy, replacement selection with room for S records produces runs of about 2S records on average.

    Snowplow Analogy (2)

    Problems with Simple Merge

    Simple mergesort: Place runs into two files.
      Merge the first two runs to output file, then next two runs, etc.

    Repeat process until only one run remains.
      How many passes for r initial runs?

    Is there benefit from sequential reading?

    Is working memory well used?

    Need a way to reduce the number of passes.

    Multiway Merge (1)

    With replacement selection, each initial run is several blocks long.

    Assume each run is placed in a separate file.

    Read the first block from each file into memory and perform an r-way merge.

    When a buffer becomes empty, read a block from the appropriate run file.

    Each record is read only once from disk during the merge process.

    Multiway Merge (2)

    In practice, use only one file and seek to the appropriate block.
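
    The r-way merge itself is short once the runs are available. The sketch below merges in-memory runs with a min-heap holding one entry per run, standing in for the per-run block buffers; the function name is hypothetical, and block refills are reduced to stepping an index.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Merge any number of sorted runs into one sorted sequence in a single pass.
std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& runs) {
  // (key, run index, position within that run)
  using Entry = std::tuple<int, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t r = 0; r < runs.size(); r++)
    if (!runs[r].empty()) heap.emplace(runs[r][0], r, std::size_t{0});
  std::vector<int> out;
  while (!heap.empty()) {
    Entry e = heap.top();
    heap.pop();
    std::size_t r = std::get<1>(e), i = std::get<2>(e);
    out.push_back(std::get<0>(e));          // each record is output exactly once
    if (i + 1 < runs[r].size())             // advance within the same run
      heap.emplace(runs[r][i + 1], r, i + 1);
  }
  return out;
}
```

    The heap never holds more than one entry per run, so working memory is proportional to the number of runs rather than to the file size.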

    Limits to Multiway Merge (1)

    Assume working memory is b blocks in size.

    How many runs can be processed at one time?

    The runs are 2b blocks long (on average).

    How big a file can be merged in one pass?

    Limits to Multiway Merge (2)

    Larger files will need more passes -- but the run size grows quickly!

    This approach trades Θ(log b) (possibly) sequential passes for a single or very few random (block) access passes.

    General Principles

    A good external sorting algorithm will seek to do the following:
      Make the initial runs as long as possible.
      At all stages, overlap input, processing and output as much as possible.
      Use as much working memory as possible. Applying more memory usually speeds processing.
      If possible, use additional disk drives for more overlapping of processing with I/O, and allow for more sequential file processing.

    Search

    Given: Distinct keys k1, k2, ..., kn and collection T of n records of the form
      (k1, I1), (k2, I2), ..., (kn, In)
    where Ij is the information associated with key kj for 1 <= j <= n.

    Search Problem: For key value K, locate the record (kj, Ij) in T such that kj = K.

    A successful search is one in which a record with key kj = K is found.

    An unsuccessful search is one in which no record with kj = K is found (and presumably no such record exists).

    Approaches to Search

    1. Sequential and list methods (lists, tables, arrays).

    2. Direct access by key value (hashing).

    3. Tree indexing methods.

    Searching Ordered Arrays

    Sequential Search

    Binary Search

    Dictionary Search

    Lists Ordered by Frequency

    Order lists by (expected) frequency of occurrence.
      Perform sequential search.

    Cost to access first record: 1
    Cost to access second record: 2

    Expected search cost, where $p_i$ is the probability of accessing record i:

      $C_n = 1p_1 + 2p_2 + \cdots + np_n$

    Examples (1)

    (1) All records have equal frequency.

      $C_n = \sum_{i=1}^{n} i/n = (n+1)/2$

    Examples (2)

    (2) Exponential Frequency

      $p_i = 1/2^i$ if $1 \le i \le n-1$
      $p_i = 1/2^{n-1}$ if $i = n$

      $C_n \approx \sum_{i=1}^{n} (i/2^i) \approx 2.$

    Zipf Distributions

    Applications:
      Distribution for frequency of word usage in natural languages.
      Distribution for populations of cities, etc.

      $C_n = \sum_{i=1}^{n} i/(i H_n) = n/H_n \approx n/\log_e n.$

    80/20 rule:
      80% of accesses are to 20% of the records.
      For distributions following the 80/20 rule, $C_n \approx 0.1n$.
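
    The closed forms for the uniform, exponential, and Zipf cases can be checked numerically from the definition of expected search cost. The helper names below are hypothetical; the Zipf probabilities are taken as 1/(i H_n), matching the summand in the formula above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Expected comparisons when record i (1-based) is requested with probability p[i-1].
double expectedSearchCost(const std::vector<double>& p) {
  double cost = 0;
  for (std::size_t i = 0; i < p.size(); i++) cost += (i + 1) * p[i];
  return cost;
}

std::vector<double> uniformFreq(int n) { return std::vector<double>(n, 1.0 / n); }

// p_i = 1/2^i for i < n; the last record takes the remaining 1/2^(n-1).
std::vector<double> exponentialFreq(int n) {
  std::vector<double> p(n);
  for (int i = 1; i < n; i++) p[i - 1] = std::pow(0.5, i);
  p[n - 1] = std::pow(0.5, n - 1);
  return p;
}

// p_i = 1/(i * H_n), where H_n is the n-th harmonic number.
std::vector<double> zipfFreq(int n) {
  double h = 0;
  for (int i = 1; i <= n; i++) h += 1.0 / i;
  std::vector<double> p(n);
  for (int i = 1; i <= n; i++) p[i - 1] = 1.0 / (i * h);
  return p;
}
```

    For n = 9 the uniform cost is exactly (n+1)/2 = 5, the exponential cost stays near 2 for any n, and the Zipf cost equals n/H_n.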

    Self-Organizing Lists

    Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses.

    Self-organizing lists use a heuristic for deciding how to reorder the list. These heuristics are similar to the rules for managing buffer pools.

    Heuristics

    1. Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.)

    2. Move-to-Front: When a record is found, move it to the front of the list.

    3. Transpose: When a record is found, swap it with the record ahead of it.

    Text Compression Example

    Application: Text Compression.

    Keep a table of words already seen, organized via Move-to-Front heuristic.
      If a word has not yet been seen, send the word.
      Otherwise, send the (current) index in the table.

    The car on the left hit the car I left.
    The car on 3 left hit 3 5 I 5.

    This is similar in spirit to Ziv-Lempel coding.
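
    The encoding of the example sentence can be reproduced with a short move-to-front coder. Two details are assumptions inferred from the example output rather than stated on the slide: words are matched without regard to case ("The" and "the" share an entry), and a newly seen word is also placed at the front of the table. Punctuation handling is left out.

```cpp
#include <cassert>
#include <cctype>
#include <list>
#include <sstream>
#include <string>

std::string toLowerCopy(std::string s) {
  for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  return s;
}

// Encode whitespace-separated words: unseen words are sent verbatim, seen
// words are sent as their current 1-based position in the table.
std::string mtfEncode(const std::string& text) {
  std::list<std::string> table;             // front = most recently used word
  std::istringstream in(text);
  std::ostringstream out;
  std::string word;
  bool first = true;
  while (in >> word) {
    std::string key = toLowerCopy(word);
    int pos = 1;
    auto it = table.begin();
    while (it != table.end() && *it != key) { ++it; ++pos; }
    if (!first) out << ' ';
    first = false;
    if (it == table.end()) {
      out << word;                          // first occurrence: send the word
    } else {
      out << pos;                           // repeat: send its current index
      table.erase(it);
    }
    table.push_front(key);                  // new and found words go to the front
  }
  return out.str();
}
```

    A decoder keeps the same table and applies the same move-to-front step after each token, so the indices stay in sync without sending the table.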

    Searching in Sets

    For dense sets (small range, high percentage of elements in set), logical bit operators can be used.

    Example: To find all primes that are odd numbers, compute:
      0011010100010100 & 0101010101010101
    (Bit i, counting from 0 at the left, is set when i is in the set.)
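
    The same intersection can be written with ordinary integer bit operations. In this sketch the helper names are hypothetical, and bit i is stored as the value 1 << i, so the printed binary form is mirrored relative to the left-to-right strings above; the set contents are the same.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build a 16-bit set in which bit i is 1 when i is a member (0 <= i <= 15).
std::uint16_t makeSet(const std::vector<int>& members) {
  std::uint16_t bits = 0;
  for (int m : members) bits = static_cast<std::uint16_t>(bits | (1u << m));
  return bits;
}

// List the members of a 16-bit set in increasing order.
std::vector<int> membersOf(std::uint16_t bits) {
  std::vector<int> out;
  for (int i = 0; i < 16; i++)
    if (bits & (1u << i)) out.push_back(i);
  return out;
}
```

    Intersecting the primes below 16 with the odd numbers below 16 is then a single AND: membersOf(primes & odds) lists 3, 5, 7, 11, 13.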

    Hashing (1)

    Hashing: The process of mapping a key value to a position in a table.

    A hash function maps key values to positions. It is denoted by h.

    A hash table is an array that holds the records. It is denoted by HT.

    HT has M slots, indexed from 0 to M-1.

    Hashing (2)

    For any value Kin the key range and some hashfunction h h (K) i