
Sorting Algorithms

October 18, 2017

CMPE 250 Sorting Algorithms October 18, 2017 1 / 74

Sorting

Sorting is a process that organizes a collection of data into either ascending or descending order.

An internal sort requires that the collection of data fit entirely in the computer’s main memory.

We can use an external sort when the collection of data cannot fit in the computer’s main memory all at once but must reside in secondary storage such as on a disk (or tape).

We will analyze only internal sorting algorithms.


Why Sorting?

Any significant amount of computer output is generally arranged in some sorted order so that it can be interpreted.

Sorting also has indirect uses. An initial sort of the data can significantly enhance the performance of an algorithm.

The majority of programming projects use a sort somewhere, and in many cases the sorting cost determines the running time.

A comparison-based sorting algorithm makes ordering decisions only on the basis of comparisons.


Sorting Algorithms

There are many sorting algorithms, such as:
    Selection Sort
    Insertion Sort
    Bubble Sort
    Merge Sort
    Quick Sort
    Heap Sort
    Shell Sort

The first three are the foundations for faster and more efficient algorithms.


Insertion Sort

Insertion sort is a simple sorting algorithm that is appropriate for small inputs.

It is the sorting technique most commonly used by card players.

The list is divided into two parts: sorted and unsorted.

In each pass, the first element of the unsorted part is picked up, transferred to the sorted sublist, and inserted at the appropriate place.

A list of n elements will take at most n − 1 passes to sort the data.


Insertion Sort Example


Insertion Sort Algorithm

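#include <utility>   // std::move
#include <vector>
using namespace std;

// Simple insertion sort.
template <typename Comparable>
void insertionSort( vector<Comparable> & a )
{
    for( int p = 1; p < a.size( ); ++p )
    {
        // Lift out the first unsorted element.
        Comparable tmp = std::move( a[ p ] );

        // Shift larger sorted elements one place to the right.
        int j;
        for( j = p; j > 0 && tmp < a[ j - 1 ]; --j )
            a[ j ] = std::move( a[ j - 1 ] );
        a[ j ] = std::move( tmp );
    }
}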


Insertion Sort – Analysis

Running time depends not only on the size of the array but also on the contents of the array.

Best case: O(n)
    Array is already sorted in ascending order.
    Inner loop will not be executed.
    The number of moves: 2 × (n − 1) → O(n)
    The number of key comparisons: (n − 1) → O(n)

Worst case: O(n^2)
    Array is in reverse order.
    Inner loop is executed i − 1 times, for i = 2, 3, ..., n.
    The number of moves: 2 × (n − 1) + (1 + 2 + ... + (n − 1)) = 2 × (n − 1) + n × (n − 1)/2 → O(n^2)
    The number of key comparisons: (1 + 2 + ... + (n − 1)) = n × (n − 1)/2 → O(n^2)

Average case: O(n^2)
    We have to look at all possible initial data organizations.

So, Insertion Sort is O(n^2).
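The best-case and worst-case counts can be checked by instrumenting the algorithm. The following is a minimal sketch, assuming int keys; SortCounts and countedInsertionSort are illustrative names that do not appear in the course code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper type: tallies of the two operations counted
// in the analysis.
struct SortCounts {
    long comparisons = 0;  // key comparisons (tmp < a[j - 1])
    long moves = 0;        // element moves, including to and from tmp
};

// Insertion sort on ints that counts key comparisons and moves.
SortCounts countedInsertionSort(std::vector<int>& a) {
    SortCounts c;
    for (std::size_t p = 1; p < a.size(); ++p) {
        int tmp = a[p];
        ++c.moves;                        // a[p] -> tmp
        std::size_t j = p;
        while (j > 0) {
            ++c.comparisons;              // one key comparison
            if (!(tmp < a[j - 1])) break;
            a[j] = a[j - 1];
            ++c.moves;                    // shift one element right
            --j;
        }
        a[j] = tmp;
        ++c.moves;                        // tmp -> a[j]
    }
    return c;
}
```

For n = 5, a sorted input gives n − 1 = 4 comparisons and 2(n − 1) = 8 moves, while a reversed input gives n(n − 1)/2 = 10 comparisons and 2(n − 1) + n(n − 1)/2 = 18 moves.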


Analysis of insertion sort

Which running time will be used to characterize this algorithm? Best, worst or average?

Worst:
    Longest running time (this is the upper limit for the algorithm).
    It is guaranteed that the algorithm will not be worse than this.

Sometimes we are interested in the average case. But there are some problems with the average case.
    It is difficult to figure out the average case, i.e. what is an average input?
    Are we going to assume all possible inputs are equally likely?
    In fact, for many algorithms the average case has the same order of growth as the worst case.


A lower bound for simple sorting algorithms

An inversion: an ordered pair (A_i, A_j) such that i < j but A_i > A_j.

Example: 10, 6, 7, 15, 3, 1
Inversions are: (10,6), (10,7), (10,3), (10,1), (6,3), (6,1), (7,3), (7,1), (15,3), (15,1), (3,1)
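The definition translates directly into a pair-by-pair count. A minimal sketch, assuming int elements; countInversions is an illustrative name, and the quadratic loop is meant only to mirror the definition.

```cpp
#include <cstddef>
#include <vector>

// Counts pairs (i, j) with i < j and a[i] > a[j] by checking every pair.
// Quadratic time; intended only to illustrate the definition.
long countInversions(const std::vector<int>& a) {
    long count = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] > a[j]) ++count;
    return count;
}
```

For the example 10, 6, 7, 15, 3, 1 it returns 11, matching the list of inversions.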


Swapping

Swapping adjacent elements that are out of order removes exactly one inversion.

A sorted array has no inversions.

Sorting an array that contains i inversions requires at least i swaps of adjacent elements.
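One way to see this is to sort using only adjacent swaps and count them. The sketch below uses bubble sort as the adjacent-swap algorithm (an assumption of this example; the slides do not give its code), and adjacentSwapSort is an illustrative name.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts by swapping adjacent out-of-order elements (bubble sort) and
// returns how many swaps were performed.
long adjacentSwapSort(std::vector<int>& a) {
    long swaps = 0;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        for (std::size_t i = 0; i + 1 < a.size(); ++i) {
            if (a[i] > a[i + 1]) {
                std::swap(a[i], a[i + 1]);  // removes exactly one inversion
                ++swaps;
                swapped = true;
            }
        }
    }
    return swaps;
}
```

On the sequence 10, 6, 7, 15, 3, 1, which has 11 inversions, it performs exactly 11 swaps.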


Theorems

Theorem 1: The average number of inversions in an array of N distinct elements is N(N − 1)/4.

Theorem 2: Any algorithm that sorts by exchanging adjacent elements requires Ω(N^2) time on average.

For a sorting algorithm to run in less than quadratic time, it must do something other than swap adjacent elements.
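Theorem 1 can be checked by brute force for small N. A minimal sketch, assuming the elements are 0, 1, ..., N − 1; totalInversionsOverPermutations is an illustrative name.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns the number of inversions summed over all N! permutations
// of {0, 1, ..., n - 1}. Feasible only for small n.
long totalInversionsOverPermutations(int n) {
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0);      // start from the sorted order
    long total = 0;
    do {
        for (std::size_t i = 0; i < p.size(); ++i)
            for (std::size_t j = i + 1; j < p.size(); ++j)
                if (p[i] > p[j]) ++total;
    } while (std::next_permutation(p.begin(), p.end()));
    return total;
}
```

For N = 4 the total is 72 over 4! = 24 permutations, an average of 3 = 4 × 3/4.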


Mergesort

Mergesort is one of the two important divide-and-conquer sorting algorithms (the other one is quicksort).
It is a recursive algorithm:
    Divides the list into halves,
    Sorts each half separately, and
    Then merges the sorted halves into one sorted array.


Merge Sort Example


Mergesort

/**
 * Mergesort algorithm (driver).
 */
template <typename Comparable>
void mergeSort( vector<Comparable> & a )
{
    vector<Comparable> tmpArray( a.size( ) );

    mergeSort( a, tmpArray, 0, a.size( ) - 1 );
}


Mergesort (Cont.)

/**
 * Internal method that makes recursive calls.
 * a is an array of Comparable items.
 * tmpArray is an array to place the merged result.
 * left is the left-most index of the subarray.
 * right is the right-most index of the subarray.
 */
template <typename Comparable>
void mergeSort( vector<Comparable> & a, vector<Comparable> & tmpArray, int left, int right )
{
    if( left < right )
    {
        int center = ( left + right ) / 2;
        mergeSort( a, tmpArray, left, center );
        mergeSort( a, tmpArray, center + 1, right );
        merge( a, tmpArray, left, center + 1, right );
    }
}


Merge

/**
 * Internal method that merges two sorted halves of a subarray.
 * a is an array of Comparable items.
 * tmpArray is an array to place the merged result.
 * leftPos is the left-most index of the subarray.
 * rightPos is the index of the start of the second half.
 * rightEnd is the right-most index of the subarray.
 */
template <typename Comparable>
void merge( vector<Comparable> & a, vector<Comparable> & tmpArray,
            int leftPos, int rightPos, int rightEnd )
{
    int leftEnd = rightPos - 1;
    int tmpPos = leftPos;
    int numElements = rightEnd - leftPos + 1;

    // Main loop
    while( leftPos <= leftEnd && rightPos <= rightEnd )
        if( a[ leftPos ] <= a[ rightPos ] )
            tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );
        else
            tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );

    while( leftPos <= leftEnd )    // Copy rest of first half
        tmpArray[ tmpPos++ ] = std::move( a[ leftPos++ ] );

    while( rightPos <= rightEnd )  // Copy rest of right half
        tmpArray[ tmpPos++ ] = std::move( a[ rightPos++ ] );

    // Copy tmpArray back
    for( int i = 0; i < numElements; ++i, --rightEnd )
        a[ rightEnd ] = std::move( tmpArray[ rightEnd ] );
}


Merge Sort Example


Merge Sort Example


Mergesort – Analysis of Merge

A worst-case instance of the merge step in mergesort


Mergesort – Analysis of Merge (cont.)

Merging two sorted arrays of size k

Best case:
    All the elements in the first array are smaller (or larger) than all the elements in the second array.
    The number of moves: 2k + 2k
    The number of key comparisons: k

Worst case:
    The number of moves: 2k + 2k
    The number of key comparisons: 2k − 1
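The comparison counts can be observed with an instrumented merge. A minimal sketch, assuming int keys and separate input vectors rather than the in-place subarray interface used in the course code; countedMerge is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Merges two sorted vectors into out and returns the number of key
// comparisons made by the main loop.
long countedMerge(const std::vector<int>& left, const std::vector<int>& right,
                  std::vector<int>& out) {
    long comparisons = 0;
    std::size_t i = 0, j = 0;
    out.clear();
    while (i < left.size() && j < right.size()) {
        ++comparisons;
        if (left[i] <= right[j]) out.push_back(left[i++]);
        else out.push_back(right[j++]);
    }
    while (i < left.size()) out.push_back(left[i++]);    // rest of first half
    while (j < right.size()) out.push_back(right[j++]);  // rest of second half
    return comparisons;
}
```

With k = 4, merging 1, 2, 3, 4 with 5, 6, 7, 8 takes k = 4 comparisons, while the interleaved inputs 1, 3, 5, 7 and 2, 4, 6, 8 take 2k − 1 = 7.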


Mergesort - Analysis

Levels of recursive calls to mergesort, given an array of eight items


Mergesort - Analysis


Mergesort - Analysis

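Worst case: the number of key comparisons, for n = 2^m items:

$$
\begin{aligned}
&= 2^0 \left( 2 \cdot 2^{m-1} - 1 \right) + 2^1 \left( 2 \cdot 2^{m-2} - 1 \right) + \dots + 2^{m-1} \left( 2 \cdot 2^0 - 1 \right) \\
&= (2^m - 1) + (2^m - 2) + \dots + (2^m - 2^{m-1}) \quad (m \text{ terms}) \\
&= m \cdot 2^m - \sum_{i=0}^{m-1} 2^i \\
&= m \cdot 2^m - (2^m - 1) = m \cdot 2^m - 2^m + 1
\end{aligned}
$$

Using m = log2 n:
= n × log2 n − n + 1 → O(n × log2 n)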
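The closed form can be cross-checked against the recurrence it comes from. A minimal sketch, assuming n is a power of two and that merging two halves of size k = n/2 costs 2k − 1 = n − 1 comparisons in the worst case; both function names are illustrative.

```cpp
// Worst-case key comparisons of mergesort on n = 2^m items, from the
// recurrence C(1) = 0, C(n) = 2 C(n/2) + (n - 1).
long worstCaseComparisons(long n) {
    if (n <= 1) return 0;
    return 2 * worstCaseComparisons(n / 2) + (n - 1);
}

// Closed form n * log2(n) - n + 1 for n = 2^m, in integer arithmetic.
long closedFormComparisons(int m) {
    const long n = 1L << m;
    return m * n - n + 1;
}
```

For n = 8 (m = 3) both give 17 comparisons.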


Mergesort - Analysis

Mergesort is an extremely efficient algorithm with respect to time.
    Both the worst case and the average case are O(n × log2 n).

But mergesort requires an extra array whose size equals the size of the original array.

If we use a linked list, we do not need an extra array.
    But we need space for the links.
    And dividing the list in half becomes more expensive (O(n)).


Mergesort for Linked Lists

Merge sort is often preferred for sorting a linked list. The slow random-access performance of a linked list makes some other algorithms (such as quicksort) perform poorly, and others (such as heapsort) impractical.

MergeSort
1 If head is NULL or there is only one element in the linked list, then return.
2 Else divide the linked list into two halves a and b.
3 Sort the two halves a and b.
      MergeSort(&a);
      MergeSort(&b);
4 Merge the two parts of the list into a sorted one.
      *head = SortedMerge(a, b);


Mergesort for linked lists

#include <iostream>
using namespace std;

// Linked list node
typedef struct Node* listpointer;
struct Node
{
    int data;
    listpointer next;
};

// function prototypes
listpointer SortedMerge(listpointer a, listpointer b);
void FrontBackSplit(listpointer source, listpointer* frontRef, listpointer* backRef);

// sorts the linked list by changing next pointers (not data)
void MergeSort(listpointer* headRef)
{
    listpointer head = *headRef;
    listpointer a;
    listpointer b;

    // Base case -- length 0 or 1
    if ((head == NULL) || (head->next == NULL))
    {
        return;
    }

    // Split head into 'a' and 'b' sublists
    FrontBackSplit(head, &a, &b);

    // Recursively sort the sublists
    MergeSort(&a);
    MergeSort(&b);

    // answer = merge the two sorted lists together
    *headRef = SortedMerge(a, b);
}


Mergesort for linked lists (cont.)

listpointer SortedMerge(listpointer a, listpointer b)
{
    listpointer result = NULL;

    // Base cases
    if (a == NULL)
        return (b);
    else if (b == NULL)
        return (a);

    // Pick either a or b, and make recursive call
    if (a->data <= b->data)
    {
        result = a;
        result->next = SortedMerge(a->next, b);
    }
    else
    {
        result = b;
        result->next = SortedMerge(a, b->next);
    }
    return (result);
}


Mergesort for linked lists (cont.)

// Split the nodes of the given list into front and back halves,
// and return the two lists using the reference parameters.
// If the length is odd, the extra node should go in the front list.
// Uses the fast/slow pointer strategy.
void FrontBackSplit(listpointer source, listpointer* frontRef, listpointer* backRef)
{
    listpointer fast;
    listpointer slow;
    if (source == NULL || source->next == NULL)
    {
        // length < 2 cases
        *frontRef = source;
        *backRef = NULL;
    }
    else
    {
        slow = source;
        fast = source->next;

        // Advance 'fast' two nodes, and advance 'slow' one node
        while (fast != NULL)
        {
            fast = fast->next;
            if (fast != NULL)
            {
                slow = slow->next;
                fast = fast->next;
            }
        }

        // 'slow' is before the midpoint in the list, so split it in two
        // at that point.
        *frontRef = source;
        *backRef = slow->next;
        slow->next = NULL;
    }
}


Mergesort for linked lists (cont.)

// Function to print nodes in a given linked list
void printList(listpointer node)
{
    while (node != NULL)
    {
        cout << node->data << " ";
        node = node->next;
    }
}

// Function to insert a node at the beginning of the linked list
void push(listpointer* head_ref, int new_data)
{
    // allocate node
    listpointer new_node = new Node;

    // put in the data
    new_node->data = new_data;

    // link the old list off the new node
    new_node->next = (*head_ref);

    // move the head to point to the new node
    (*head_ref) = new_node;
}


Mergesort for linked lists (cont.)

// Driver program to test above functions
int main()
{
    // Start with the empty list
    listpointer a = NULL;
    int n, num;

    // Let us create an unsorted linked list to test the functions
    cout << endl << "Enter the number of data elements to be sorted: ";
    cin >> n;

    // Create linked list.
    for (int i = 0; i < n; i++)
    {
        cout << "Enter element " << i + 1 << ": ";
        cin >> num;
        push(&a, num);
    }

    // Sort the above created Linked List
    MergeSort(&a);

    cout << endl << "Sorted Linked List is: " << endl;
    printList(a);

    return 0;
}


Quicksort

Like mergesort, quicksort is also based on the divide-and-conquer paradigm.

But it uses this technique in a somewhat opposite manner, as all the hard work is done before the recursive calls.
It works as follows:
    1 First, it partitions an array into two parts,
    2 Then, it sorts the parts independently,
    3 Finally, it combines the sorted subsequences by a simple concatenation.


Quicksort

Algorithm 1 Quicksort
1: Let S be the input set.
2: if |S| = 0 or |S| = 1 then
       return
3: Pick an element v in S. Call v the pivot.
4: Partition S − {v} into two disjoint groups:
       S1 = {x ∈ S − {v} | x ≤ v}
       S2 = {x ∈ S − {v} | x ≥ v}
   return quicksort(S1), v, quicksort(S2)
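Algorithm 1 can be transcribed almost literally if extra storage is allowed. The sketch below is not the in-place version developed on the following slides; it assumes int items, picks the middle element as pivot, and sends items equal to the pivot to S1 (the algorithm allows either group). simpleQuicksort is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Set-based quicksort: pick a pivot v, split the remaining items into
// S1 (<= v) and S2 (> v), and return quicksort(S1), v, quicksort(S2).
// Uses extra vectors; not in place.
std::vector<int> simpleQuicksort(const std::vector<int>& s) {
    if (s.size() <= 1) return s;

    const std::size_t pivotIndex = s.size() / 2;   // assumed pivot choice
    const int v = s[pivotIndex];

    std::vector<int> s1, s2;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (i == pivotIndex) continue;             // S - {v}
        if (s[i] <= v) s1.push_back(s[i]);
        else s2.push_back(s[i]);
    }

    std::vector<int> result = simpleQuicksort(s1);
    result.push_back(v);
    const std::vector<int> right = simpleQuicksort(s2);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}
```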


Quicksort Illustrated


Issues To Consider

How to pick the pivot?
    Many methods (discussed later)

How to partition?
    Several methods exist.
    The one we consider is known to give good results and to be easy and efficient.
    We discuss the partition strategy first.


Partitioning Strategy

For now, assume that pivot = A[(left+right)/2].

We want to partition array A[left .. right].

First, get the pivot element out of the way by swapping it with the last element (swap pivot and A[right]).

Let i start at the first element and j start at the next-to-last element (i = left, j = right − 1).


Partitioning Strategy (Cont.)

Want to have
    A[k] ≤ pivot, for k < i
    A[k] ≥ pivot, for k > j

When i < j
    Move i right, skipping over elements smaller than the pivot
    Move j left, skipping over elements greater than the pivot
    When both i and j have stopped
        A[i] ≥ pivot
        A[j] ≤ pivot
        ⇒ A[i] and A[j] should now be swapped


Partitioning Strategy (Cont.)

When i and j have stopped and i is to the left of j (thus legal)
    Swap A[i] and A[j]
        The large element is pushed to the right and the small element is pushed to the left
    After swapping
        A[i] ≤ pivot
        A[j] ≥ pivot
    Repeat the process until i and j cross


Partitioning Strategy (Cont.)

When i and j have crossed
    swap A[i] and pivot

Result:
    A[k] ≤ pivot, for k < i
    A[k] ≥ pivot, for k > i
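The partitioning strategy described so far can be sketched as one function. This is an illustration under stated assumptions (int elements, pivot taken from the middle position, explicit bounds checks instead of sentinels), not the median-of-three code given later; partitionAroundCenter is an illustrative name.

```cpp
#include <utility>
#include <vector>

// Partitions a[left..right] around pivot = a[(left + right) / 2] and
// returns the pivot's final index. Requires left <= right.
int partitionAroundCenter(std::vector<int>& a, int left, int right) {
    const int center = (left + right) / 2;
    std::swap(a[center], a[right]);          // get the pivot out of the way
    const int pivot = a[right];

    int i = left;
    int j = right - 1;
    while (true) {
        while (i <= j && a[i] < pivot) ++i;  // skip elements smaller than the pivot
        while (i <= j && pivot < a[j]) --j;  // skip elements greater than the pivot
        if (i >= j) break;                   // i and j have met or crossed
        std::swap(a[i], a[j]);               // large goes right, small goes left
        ++i;
        --j;
    }
    std::swap(a[i], a[right]);               // put the pivot between the groups
    return i;
}
```

After the call, every element left of the returned index is ≤ the pivot and every element right of it is ≥ the pivot.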


Pivot Strategies

First element:
    Bad choice if input is sorted or in reverse sorted order
    Bad choice if input is nearly sorted

Random: even a malicious agent cannot arrange a bad input

Median-of-three: choose the median of the left, right, and center elements


Median of Three


Median of Three

// Return median of left, center, and right.
// Order these and hide the pivot.
template <typename Comparable>
const Comparable & median3( vector<Comparable> & a, int left, int right )
{
    int center = ( left + right ) / 2;

    if( a[ center ] < a[ left ] )
        std::swap( a[ left ], a[ center ] );
    if( a[ right ] < a[ left ] )
        std::swap( a[ left ], a[ right ] );
    if( a[ right ] < a[ center ] )
        std::swap( a[ center ], a[ right ] );

    // Place pivot at position right - 1
    std::swap( a[ center ], a[ right - 1 ] );
    return a[ right - 1 ];
}


Discussion

Small arrays: Quicksort is slower than insertion sort when N is small (say, N ≤ 20).

Optimization: Make |S| ≤ 20 the base case and call insertion sort.


Quicksort algorithm (driver)

template <typename Comparable>
void quicksort( vector<Comparable> & a )
{
    quicksort( a, 0, a.size( ) - 1 );
}


Quicksort algorithm (recursive)

// Uses median-of-three partitioning and a cutoff of 20.
// a is an array of Comparable items.
// left is the left-most index of the subarray.
// right is the right-most index of the subarray.
template <typename Comparable>
void quicksort( vector<Comparable> & a, int left, int right )
{
    if( left + 20 <= right )
    {
        const Comparable & pivot = median3( a, left, right );

        // Begin partitioning
        int i = left, j = right - 1;
        for( ; ; )
        {
            while( a[ ++i ] < pivot ) { }
            while( pivot < a[ --j ] ) { }
            if( i < j )
                std::swap( a[ i ], a[ j ] );
            else
                break;
        }

        std::swap( a[ i ], a[ right - 1 ] );  // Restore pivot

        quicksort( a, left, i - 1 );     // Sort small elements
        quicksort( a, i + 1, right );    // Sort large elements
    }
    else  // Do an insertion sort on the subarray
        insertionSort( a, left, right );
}
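The recursive routine calls a three-argument insertionSort on a subarray, which the slides do not list. A plausible sketch of such an overload, written here as an assumption by adapting the one-argument version to the inclusive range a[left..right]:

```cpp
#include <utility>
#include <vector>

// Insertion sort restricted to the subarray a[left..right] (inclusive).
// Elements outside the range are not touched.
template <typename Comparable>
void insertionSort(std::vector<Comparable>& a, int left, int right) {
    for (int p = left + 1; p <= right; ++p) {
        Comparable tmp = std::move(a[p]);
        int j;
        for (j = p; j > left && tmp < a[j - 1]; --j)
            a[j] = std::move(a[j - 1]);
        a[j] = std::move(tmp);
    }
}
```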


Analysis of Quicksort

Worst case: pivot is the smallest (or largest) element all the time.
    T(N) = T(N − 1) + cN
    T(N − 1) = T(N − 2) + c(N − 1)
    T(N − 2) = T(N − 3) + c(N − 2)
    ...
    T(2) = T(1) + c(2)
    $T(N) = T(1) + c \sum_{i=2}^{N} i \rightarrow O(N^2)$

Best case: pivot is the median.
    T(N) = 2T(N/2) + cN
    T(N) = cN log N + N → O(N log N)
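The best-case closed form follows by dividing the recurrence by N and telescoping. A sketch of the steps, assuming N is a power of two and T(1) = 1 (the base case is not stated on the slide, but it is what the closed form cN log N + N implies):

```latex
\begin{align*}
\frac{T(N)}{N} &= \frac{T(N/2)}{N/2} + c \\
\frac{T(N/2)}{N/2} &= \frac{T(N/4)}{N/4} + c \\
&\;\;\vdots \\
\frac{T(2)}{2} &= \frac{T(1)}{1} + c \\
\frac{T(N)}{N} &= \frac{T(1)}{1} + c \log_2 N \\
T(N) &= cN \log_2 N + N
\end{align*}
```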


Quicksort: Average case

Assume each of the sizes for S1 is equally likely.

0 ≤ |S1| ≤ N − 1.

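$$
\begin{aligned}
T(N) &= \frac{1}{N} \sum_{i=0}^{N-1} \left[ T(i) + T(N-i-1) \right] + cN \\
T(N) &= \frac{2}{N} \sum_{i=0}^{N-1} T(i) + cN \\
N\,T(N) &= 2 \sum_{i=0}^{N-1} T(i) + cN^2 \\
(N-1)\,T(N-1) &= 2 \sum_{i=0}^{N-2} T(i) + c(N-1)^2 \\
N\,T(N) - (N-1)\,T(N-1) &= 2T(N-1) + 2cN - c \\
N\,T(N) &= (N+1)\,T(N-1) + 2cN \quad \text{(dropping the insignificant } -c \text{)}
\end{aligned}
$$

Divide equation by N(N + 1):

$$
\frac{T(N)}{N+1} = \frac{T(N-1)}{N} + \frac{2c}{N+1}
$$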


Quicksort: Average case (Cont.)

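$$
\begin{aligned}
\frac{T(N-1)}{N} &= \frac{T(N-2)}{N-1} + \frac{2c}{N} \\
\frac{T(N-2)}{N-1} &= \frac{T(N-3)}{N-2} + \frac{2c}{N-1} \\
&\;\;\vdots \\
\frac{T(2)}{3} &= \frac{T(1)}{2} + \frac{2c}{3} \\
\frac{T(N)}{N+1} &= \frac{T(1)}{2} + 2c \sum_{i=3}^{N+1} \frac{1}{i} \\
2c \sum_{i=3}^{N+1} \frac{1}{i} &= 2c \left( H_{N+1} - \frac{3}{2} \right) \\
T(N) &= (N+1) \left( \frac{T(1)}{2} + 2c \left( H_{N+1} - \frac{3}{2} \right) \right)
\end{aligned}
$$

$H_N \approx \log_e(N) + \gamma + \frac{1}{2N}$, where $\gamma = 0.577215664901\ldots$ is the Euler-Mascheroni constant.

$$
T(N) \approx (N+1) \left[ \frac{T(1)}{2} + 2c \left( \left( \log_e(N+1) + \gamma + \frac{1}{2(N+1)} \right) - \frac{3}{2} \right) \right]
$$

T(N) → O(N log N)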
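The telescoped closed form can be compared numerically with the recurrence N T(N) = (N + 1) T(N − 1) + 2cN. A minimal sketch, assuming the base value T(1) and the constant c are supplied by the caller; both function names are illustrative.

```cpp
// Iterates the simplified recurrence N*T(N) = (N+1)*T(N-1) + 2cN,
// starting from an assumed base value T(1) = t1.
double averageCaseRecurrence(int n, double t1, double c) {
    double t = t1;
    for (int k = 2; k <= n; ++k)
        t = ((k + 1) * t + 2.0 * c * k) / k;
    return t;
}

// Closed form (N+1) * (T(1)/2 + 2c * (H_{N+1} - 3/2)), with the harmonic
// number H_{N+1} summed directly.
double averageCaseClosedForm(int n, double t1, double c) {
    double h = 0.0;
    for (int i = 1; i <= n + 1; ++i)
        h += 1.0 / i;
    return (n + 1) * (t1 / 2.0 + 2.0 * c * (h - 1.5));
}
```

The two functions agree up to floating-point rounding for every N ≥ 1, since the closed form is an exact solution of the simplified recurrence.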


Heapsort

The priority queue can be used to sort N items by
    inserting every item into a binary heap and
    extracting every item by calling deleteMin N times, thus sorting the result.

An algorithm based on this idea is heapsort.

It is an O(N log N) worst-case sorting algorithm.
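The idea can be sketched with the standard library's priority queue standing in for the binary heap (an assumption of this example; the course uses its own BinaryHeap ADT). priorityQueueSort is an illustrative name.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Sorts by inserting every item into a min-heap and then removing the
// minimum N times. The items leave the heap in increasing order.
std::vector<int> priorityQueueSort(const std::vector<int>& items) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> heap;
    for (int x : items)
        heap.push(x);                  // insert

    std::vector<int> sorted;           // the extra output array
    sorted.reserve(items.size());
    while (!heap.empty()) {
        sorted.push_back(heap.top());  // findMin
        heap.pop();                    // deleteMin
    }
    return sorted;
}
```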


Heapsort

The main problem with this algorithm is that it uses an extra array for the items exiting the heap.
We can avoid this problem as follows:
    After each deleteMin, the heap shrinks by 1.
    Thus the cell that was last in the heap can be used to store the element that was just deleted.
    Using this strategy, after the last deleteMin, the array will contain all elements in decreasing order.

If we want them in increasing order we must use a max heap.
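The same space-saving trick is what the standard heap algorithms do. A minimal sketch using std::make_heap and std::pop_heap, which maintain a max heap by default; this stands in for the hand-written percDown version and inPlaceHeapSort is an illustrative name.

```cpp
#include <algorithm>
#include <vector>

// In-place heapsort sketch built on the standard heap algorithms.
void inPlaceHeapSort(std::vector<int>& a) {
    std::make_heap(a.begin(), a.end());            // buildHeap (max heap)
    for (auto heapEnd = a.end(); heapEnd != a.begin(); --heapEnd) {
        // Moves the maximum of [begin, heapEnd) to position heapEnd - 1,
        // the cell freed as the heap shrinks by one.
        std::pop_heap(a.begin(), heapEnd);
    }
}
```

Because each deleted maximum lands at the end of the shrinking heap, the array finishes in increasing order without a second array.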


Heapsort Example

Max heap after the buildHeap phase for the input sequence 59, 36, 58, 21, 41, 97, 31, 16, 26, 53


Heapsort Example (Cont.)

Heap after the first deleteMax operation


Heapsort Example (Cont.)

Heap after the second deleteMax operation


Implementation

In the implementation of heapsort, the ADT BinaryHeap is not used.
    Everything is done in an array.

The root is stored in position 0.
Thus there are some minor changes in the code:
    Since we use a max heap, the logic of comparisons is changed from > to <.
    For a node in position i, the parent is in ⌊(i − 1)/2⌋, the left child is in 2i + 1, and the right child is next to the left child (2i + 2).
    Percolating down needs the current heap size, which is lowered by 1 at every deletion.


The Heapsort Sort Algorithm

// Standard heapsort.
template <typename Comparable>
void heapsort( vector<Comparable> & a )
{
    for( int i = a.size( ) / 2 - 1; i >= 0; --i )  // buildHeap
        percDown( a, i, a.size( ) );
    for( int j = a.size( ) - 1; j > 0; --j )
    {
        std::swap( a[ 0 ], a[ j ] );               // deleteMax
        percDown( a, 0, j );
    }
}


percDown Algorithm

// Internal method for heapsort.
// i is the index of an item in the heap.
// Returns the index of the left child.
inline int leftChild( int i )
{
    return 2 * i + 1;
}

// Internal method for heapsort that is used in
// deleteMax and buildHeap.
// i is the position from which to percolate down.
// n is the logical size of the binary heap.
template <typename Comparable>
void percDown( vector<Comparable> & a, int i, int n )
{
    int child;
    Comparable tmp;

    for( tmp = std::move( a[ i ] ); leftChild( i ) < n; i = child )
    {
        child = leftChild( i );
        if( child != n - 1 && a[ child ] < a[ child + 1 ] )
            ++child;
        if( tmp < a[ child ] )
            a[ i ] = std::move( a[ child ] );
        else
            break;
    }
    a[ i ] = std::move( tmp );
}


Analysis of Heapsort

It is an O(N log N) algorithm.
    First phase: Build heap, O(N)
    Second phase: N deleteMax operations, O(N log N)

Detailed analysis shows that the average case for heapsort is poorer than that of quicksort.

Quicksort’s worst case, however, is far worse.

An average case analysis of heapsort is very complicated, but empirical studies show that there is little difference between the average and worst cases.

Heapsort usually takes about twice as long as quicksort.
Heapsort therefore should be regarded as something of an insurance policy:
    On average, it is more costly, but it avoids the possibility of O(N^2).


How fast can we sort?

Heapsort, Mergesort, and Quicksort all run in O(N log N) best-case running time.

Can we do any better?


The Answer is No! (if using comparisons only)

Our basic assumption: we can only compare two elements at a time. How does this limit the run time?
Suppose you are given N elements
    Assume no duplicates: any sorting algorithm must also work for this case

How many possible orderings can you get?


How many possible orderings?

Example: a, b, c (N = 3)
Orderings:
    1 a b c
    2 b c a
    3 c a b
    4 a c b
    5 b a c
    6 c b a

6 orderings = 3 × 2 × 1 = 3!

For N elements: N! orderings


A Decision Tree

Leaves contain possible orderings of a, b, c


Decision Trees and Sorting

A Decision Tree is a Binary Tree such that:
    Each node = a set of orderings
    Each edge = 1 comparison
    Each leaf = 1 unique ordering
    How many leaves for N distinct elements?

Only 1 leaf has sorted ordering
Each sorting algorithm corresponds to a decision tree
    Finds correct leaf by following edges (= comparisons)

Run time ≥ maximum number of comparisons
    Depends on: depth of decision tree
    What is the depth of a decision tree for N distinct elements?


Lower Bound on Comparison-Based Sorting

Suppose you have a binary tree of depth d. How many leaves can the tree have?
    e.g. depth d = 1 → at most 2 leaves, d = 2 → at most 4 leaves, etc.


Lower Bound on Comparison-Based Sorting

A binary tree of depth d has at most 2^d leaves

Number of leaves L ≤ 2^d → d ≥ log L

Decision tree has L = N! leaves → its depth d ≥ log(N!)
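The bound d ≥ log(N!) can be evaluated for concrete N. A minimal sketch, assuming base-2 logarithms (one bit of information per comparison); comparisonLowerBound is an illustrative name, and the small epsilon is a guard against floating-point rounding chosen for this example.

```cpp
#include <cmath>

// Minimum worst-case number of comparisons implied by the decision-tree
// argument: the smallest integer d with 2^d >= N!, i.e. ceil(log2(N!)).
// std::lgamma(n + 1) returns ln(N!), which avoids overflowing N!.
int comparisonLowerBound(int n) {
    const double log2Factorial = std::lgamma(n + 1.0) / std::log(2.0);
    return static_cast<int>(std::ceil(log2Factorial - 1e-9));
}
```

For N = 3 this gives 3, since 2^2 = 4 leaves cannot hold all 3! = 6 orderings but 2^3 = 8 can.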


Lower Bound on Comparison-Based Sorting

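Stirling’s approximation: $N! \approx \sqrt{2\pi N} \left( \frac{N}{e} \right)^N$

$$
\begin{aligned}
\log(N!) &\approx \log\left( \sqrt{2\pi N} \left( \frac{N}{e} \right)^N \right) \\
&= \log\left( \sqrt{2\pi N} \right) + \log\left( \left( \frac{N}{e} \right)^N \right) \\
&= \frac{1}{2} \log(2\pi N) + N(\log(N) - 1) \rightarrow \Omega(N \log N)
\end{aligned}
$$

Conclusion: Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons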


Comparison of Sorting Algorithms

Algorithm        Worst case    Average case

Selection sort   O(N^2)        O(N^2)
Bubble sort      O(N^2)        O(N^2)
Insertion sort   O(N^2)        O(N^2)
Mergesort        O(N log N)    O(N log N)
Quicksort        O(N^2)        O(N log N)
Radix sort       O(N)          O(N)
Treesort         O(N^2)        O(N log N)
Heapsort         O(N log N)    O(N log N)


Sorting in linear time

Comparison sort:
    Lower bound: Ω(n log n).

Non-comparison sort:
    Bucket sort, radix sort
    They can sort in linear time (under certain assumptions).


Bucket Sort

Assumption: uniform distribution
    Input numbers are uniformly distributed in [0, 1).
    Suppose input size is n.

Idea:
    Divide [0, 1) into n equal-sized subintervals (buckets).
    Distribute n numbers into buckets.
    Expect that each bucket contains few numbers.
    Sort numbers in each bucket (insertion sort as default).
    Then go through buckets in order, listing elements.


Bucket Sort Algorithm

Algorithm 2 BucketSort(A)
1: n ← length[A]
2: for i ← 1 to n do
       insert A[i] into bucket B[⌊n·A[i]⌋]
3: for i ← 0 to n − 1 do
       sort bucket B[i] using insertion sort
4: Concatenate buckets B[0], B[1], ..., B[n − 1]
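Algorithm 2 can be sketched in C++ as follows. This is an illustration under stated assumptions: the inputs are doubles in [0, 1), each bucket is a std::vector, and the index clamp is an added guard against floating-point rounding that is not part of the pseudocode. bucketSort is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Bucket sort for values assumed to lie in [0, 1).
std::vector<double> bucketSort(const std::vector<double>& a) {
    const std::size_t n = a.size();
    std::vector<std::vector<double>> buckets(n);

    // Step 2: value x goes into bucket floor(n * x).
    for (double x : a) {
        std::size_t index = static_cast<std::size_t>(n * x);
        if (index >= n) index = n - 1;   // guard against rounding up to n
        buckets[index].push_back(x);
    }

    std::vector<double> result;
    result.reserve(n);
    for (auto& bucket : buckets) {
        // Step 3: insertion sort within the bucket.
        for (std::size_t p = 1; p < bucket.size(); ++p) {
            const double tmp = bucket[p];
            std::size_t j = p;
            for (; j > 0 && tmp < bucket[j - 1]; --j)
                bucket[j] = bucket[j - 1];
            bucket[j] = tmp;
        }
        // Step 4: concatenate the buckets in order.
        result.insert(result.end(), bucket.begin(), bucket.end());
    }
    return result;
}
```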


Bucket Sort


Analysis of Bucket Sort Algorithm

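Algorithm 3 BucketSort(A)
1: n ← length[A]                                    Θ(1)
2: for i ← 1 to n do                                O(n)
       insert A[i] into bucket B[⌊n·A[i]⌋]          Θ(1) each (i.e. total O(n))
3: for i ← 0 to n − 1 do                            O(n)
       sort bucket B[i] using insertion sort        O(n_i^2)
4: Concatenate buckets B[0], B[1], ..., B[n − 1]    O(n)

where n_i is the size of bucket B[i].

Under the uniform-distribution assumption, the expected value of n_i^2 is 2 − 1/n. Thus the expected running time is

$$
T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2) = \Theta(n) + n \cdot O\left( 2 - \frac{1}{n} \right) = \Theta(n)
$$

This beats the Ω(n log n) lower bound for comparison sorts, in expectation and only under the uniform-distribution assumption.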


Radix Sort

Origin: Herman Hollerith’s card-sorting machine for the 1890 U.S. Census

Digit-by-digit sort.

Hollerith’s original (bad) idea: sort on most-significant digit first.

Good idea: Sort on least-significant digit first with auxiliary stable sort.

Stable Sort Property: The relative order of any two items with the same key is preserved after the execution of the algorithm.
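The stable sort property can be observed with std::stable_sort. A minimal sketch, assuming records are (key, label) pairs compared by key only; sortByKeyStable is an illustrative name.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Sorts (key, label) records by key only. std::stable_sort guarantees
// that records with equal keys keep their original relative order.
void sortByKeyStable(std::vector<std::pair<int, std::string>>& records) {
    std::stable_sort(records.begin(), records.end(),
                     [](const std::pair<int, std::string>& x,
                        const std::pair<int, std::string>& y) {
                         return x.first < y.first;
                     });
}
```

Given (2,a), (1,b), (2,c), (1,d), the result is (1,b), (1,d), (2,a), (2,c): within each key the labels stay in their input order.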


Radix Sort Algorithm

Algorithm 4 RadixSort(A, d)
1: for i ← 1 to d do
       use stable BucketSort to sort array A on digit i.

Lemma: Given n d-digit numbers in which each digit can take on up to k possible values, RadixSort correctly sorts these numbers in Θ(d(n + k)) time.

If d is constant and k = O(n), then time is Θ(n).
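Algorithm 4 can be sketched for decimal keys as follows. Note the substitution: the per-digit stable sort here is a counting sort with k = 10 buckets rather than the BucketSort named in the pseudocode; any stable sort on the digit would do. The function assumes non-negative integers with at most d decimal digits, and radixSort is an illustrative name.

```cpp
#include <cstddef>
#include <vector>

// Least-significant-digit radix sort for non-negative integers with at
// most d decimal digits (k = 10 possible digit values).
void radixSort(std::vector<unsigned>& a, int d) {
    const unsigned k = 10;
    std::vector<unsigned> out(a.size());
    unsigned divisor = 1;

    for (int pass = 0; pass < d; ++pass) {
        // Stable counting sort on the current digit.
        std::vector<std::size_t> count(k + 1, 0);
        for (unsigned x : a)
            ++count[(x / divisor) % k + 1];
        for (unsigned digit = 0; digit < k; ++digit)
            count[digit + 1] += count[digit];     // start position per digit
        for (unsigned x : a)
            out[count[(x / divisor) % k]++] = x;  // left-to-right keeps ties in order
        a.swap(out);
        divisor *= k;
    }
}
```

Each pass costs Θ(n + k), so d passes cost Θ(d(n + k)), in line with the lemma.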


Radix Sort Example
