CSC 2053 Sorting: Mergesort, Radix Sort and Bin Sort
Stable vs. Non-Stable Sorts
We frequently sort items that have multiple keys
Sometimes we need to sort the same list on different keys
– For instance, we may want to sort a list of people by last name and then by age
So Black, age 30, should appear before Jones, age 30
Stable vs. Non-Stable Sorts
If we sort a list on the first key (name) and then sort it on the second key (age), how can we guarantee that items with equal ages are still ordered by the first key?
Definition:
A sorting method is said to be stable if it preserves the relative order of items with duplicate keys in the list
An Example of a Stable Sort (adapted from Algorithms by R. Sedgewick)
Original list        Sorted by name       Then sorted by age (stable)
Adams (30)           Adams (30)           Black (23)
Washington (23)      Black (23)           Jackson (23)
Wilson (50)          Brown (40)           Washington (23)
Black (23)           Jackson (23)         Adams (30)
Brown (40)           Jones (50)           Smith (30)
Smith (30)           Smith (30)           Brown (40)
Thompson (40)        Thompson (40)        Thompson (40)
Jackson (23)         Washington (23)      Jones (50)
White (50)           White (50)           White (50)
Jones (50)           Wilson (50)          Wilson (50)
Stable vs. Non-Stable Sorts
Mergesort is relatively easy to make stable
– Just make sure the merge function is stable
Heapsort sorts in O(n log n) but it is not stable
Quicksort is also not stable
Exercise: You should experiment with all the main sorting algorithms to understand which ones are stable and which ones are not.
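One way to start on the exercise, sketched in C++ under the assumption that the standard library is available: `std::stable_sort` is guaranteed to be stable, so sorting the table's records by name and then stably by age should reproduce the right-hand column. The `Person` type and the `sortByNameThenAge` function are hypothetical names used only for this illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// A record with two keys: a last name and an age.
struct Person {
    std::string name;
    int age;
};

// Sort on the first key (name), then stably on the second key (age).
// Because std::stable_sort keeps equal ages in their current order,
// people with the same age stay in name order.
std::vector<Person> sortByNameThenAge(std::vector<Person> people) {
    std::sort(people.begin(), people.end(),
              [](const Person& a, const Person& b) { return a.name < b.name; });
    std::stable_sort(people.begin(), people.end(),
                     [](const Person& a, const Person& b) { return a.age < b.age; });
    return people;
}
```

Replacing `std::stable_sort` with `std::sort` in the second step is allowed to mix up the name order within each age group, which is one way to observe the difference.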
Mergesort
We saw that Quicksort is based on the idea of selecting an element, dividing the list into two parts around it, and then sorting the parts separately
The complementary process is called merging
– Given two lists which are ordered, combine them into a larger ordered list
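A minimal sketch of merging two ordered lists into a third, written in C++ with `std::vector` (the name `mergeInto` is hypothetical). Taking the element from the first list on a tie is what makes this merge stable.

```cpp
#include <cstddef>
#include <vector>

// Merge two lists that are already in ascending order into a third,
// larger ordered list. On ties the element from `a` is taken first.
std::vector<int> mergeInto(const std::vector<int>& a, const std::vector<int>& b) {
    std::vector<int> c;
    c.reserve(a.size() + b.size());
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (b[j] < a[i]) c.push_back(b[j++]);   // b is strictly smaller
        else             c.push_back(a[i++]);   // a is smaller or equal
    }
    while (i < a.size()) c.push_back(a[i++]);   // copy whatever is left of a
    while (j < b.size()) c.push_back(b[j++]);   // copy whatever is left of b
    return c;
}
```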
Mergesort
Selection and merging are complementary because
– Selection divides a list into two independent lists
– Merging joins two independent lists into one larger list
Mergesort consists of two recursive calls and one merging procedure
Mergesort
The desirable features of Mergesort
– It performs in O(n log n) in the worst case
– It is stable (provided the merge is stable)
– It is quite independent of the way the initial list is organized
– It is good for linked lists: it can be implemented so that data is accessed sequentially
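A sketch of why mergesort suits linked lists, assuming a hypothetical singly linked `Node` type: the list is split with a slow/fast pointer walk and merged by relinking nodes, so every access is sequential and no auxiliary array is needed (the recursion itself still uses O(log n) stack).

```cpp
// Minimal singly linked list node (hypothetical type for illustration).
struct Node {
    int key;
    Node* next;
};

// Merge two sorted lists by relinking nodes; ties take from `a` first (stable).
static Node* mergeLists(Node* a, Node* b) {
    Node dummy{0, nullptr};
    Node* tail = &dummy;
    while (a != nullptr && b != nullptr) {
        if (b->key < a->key) { tail->next = b; b = b->next; }
        else                 { tail->next = a; a = a->next; }
        tail = tail->next;
    }
    tail->next = (a != nullptr) ? a : b;   // append the remaining run
    return dummy.next;
}

// Split the list at its midpoint, sort each half recursively, then merge.
Node* mergesortList(Node* head) {
    if (head == nullptr || head->next == nullptr) return head;
    Node* slow = head;
    Node* fast = head->next;
    while (fast != nullptr && fast->next != nullptr) {
        slow = slow->next;          // moves one node per step
        fast = fast->next->next;    // moves two nodes per step
    }
    Node* second = slow->next;      // second half starts after the midpoint
    slow->next = nullptr;           // cut the list in two
    return mergeLists(mergesortList(head), mergesortList(second));
}
```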
Mergesort
Drawbacks
– It may require an auxiliary array as large as the original list
This can be avoided, but the algorithm becomes significantly more complicated, which makes it not worth it
Instead we can use heapsort, which is also O(n log n) and needs no auxiliary array
Understanding the Algorithm
1. Calculate the index of the middle of the list; call it m
2. Use recursion to sort the two partitions [first, m] and [m+1, last]
3. Merge the two ordered lists into one large list
Mergesort
void mergesort(int list[], int first, int last)
{
    // PRE: list is an array &&
    //      the portion to be sorted runs from first to last inclusive
    if (first >= last)             // Nothing to sort
        return;

    int m = (first + last) / 2;    // calculate the middle of the list

    // Recursively sort the two partitions
    mergesort(list, first, m);
    mergesort(list, m + 1, last);

    merge(list, first, m, last);   // merges two sorted lists
    // POST: list is sorted in ascending order between first and last
}
Understanding the merge function
We know that we can easily merge two arrays into a third one. Let us try to improve on this by using only two arrays.
Suppose the list is organized so that positions first to m hold one sorted half and positions m+1 to last hold another sorted half. Then we can merge the two halves back into the same list:
– merge(list, first, m, last)
We'll see that an auxiliary array is still required, but a single one holds both halves, so we do not have to create a separate array for each half.
Understanding the merge
When the merge function is called we have the following scenario: a list divided into two sections, each in ascending order. Both sections are copied into a second (auxiliary) array.
To make the algorithm simpler, the second half is stored in the auxiliary array in reverse, that is, in descending order. The largest element of each half then sits in the middle of aux and acts as a sentinel, so the merge loop never has to check whether either half has run out.
list:  first ... m (ascending) | m+1 ... last (ascending)
aux:   first ... m (ascending) | m+1 ... last (descending)
Tracing the merge
Example: list = 2 4 9 10 3 4 11 12, with first = 0, m = 3 and last = 7 (both halves already in ascending order).

Step 1: create an auxiliary array aux. Start with i at m + 1 and copy list[i-1] into aux[i-1], decrementing i until it reaches first. This copies the left half in ascending order:
aux = 2 4 9 10 _ _ _ _      (i ends at 0)

Step 2: start with j at m and copy list[j+1] into aux[last+m-j], incrementing j until it reaches last. This copies the right half in descending order:
aux = 2 4 9 10 12 11 4 3    (j ends at 7)

Step 3: merge the two parts back into list, with i = 0, j = 7 and k = 0. At each step compare aux[i] with aux[j], store the smaller one in list[k], and move the index that was used (i upwards, j downwards). When the two are equal, take aux[i].

k   i   j   aux[i] vs aux[j]   list[k]
0   0   7   2 vs 3             2   (i becomes 1)
1   1   7   4 vs 3             3   (j becomes 6)
2   1   6   4 vs 4             4   (equal: take aux[i], i becomes 2)
3   2   6   9 vs 4             4   (j becomes 5)
4   2   5   9 vs 11            9   (i becomes 3)
5   3   5   10 vs 11           10  (i becomes 4)
6   4   5   12 vs 11           11  (j becomes 4)
7   4   4   12 vs 12           12  (i and j meet: take aux[i], i becomes 5)

After k = 7 the loop ends and the list is sorted in ascending order: 2 3 4 4 9 10 11 12.

Taking aux[i] on a tie puts the 4 from the left half before the 4 from the right half, preserving their original order. Note, however, that this reversed-half merge is not fully stable: once the left half is used up, i moves into the reversed right half, where equal keys come out in reverse order.
merge
void merge(int list[], int first, int m, int last)
{
    int i, j;
    int aux[MAXSIZE];   // MAXSIZE is assumed to be defined elsewhere and greater than last

    // Copy the left half (first..m) to aux in ascending order
    for (i = m + 1; i > first; i--)
        aux[i - 1] = list[i - 1];
    // Copy the right half (m+1..last) to aux in descending order
    for (j = m; j < last; j++)
        aux[last + m - j] = list[j + 1];

    // ASSERT: aux holds the left half in ascending order and the right half
    //         in descending order; i == first and j == last
    for (int k = first; k <= last; k++)   // re-assemble the list in sorted order
        if (aux[j] < aux[i])
            list[k] = aux[j--];
        else
            list[k] = aux[i++];
}
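The reversed-half merge is compact, but it is not fully stable: merging the halves `1` and `5 5` outputs the second 5 before the first. A sketch of a stable alternative in C++, assuming a hypothetical `Record` type whose `id` field records the original position: copy the range into a buffer, then merge with explicit end-of-half checks, taking the left element on ties. The names `stableMerge` and `stableMergesort` are illustrative, not from the slides.

```cpp
#include <vector>

// A key plus a tag recording original position, so stability can be observed.
struct Record {
    int key;
    int id;
};

// Stable merge of list[first..m] and list[m+1..last], both ascending.
void stableMerge(Record list[], int first, int m, int last) {
    std::vector<Record> aux(list + first, list + last + 1);  // aux[0] holds list[first]
    const int leftEnd = m - first;      // left half is aux[0..leftEnd]
    const int rightEnd = last - first;  // right half is aux[leftEnd+1..rightEnd]
    int i = 0, j = leftEnd + 1;
    for (int k = first; k <= last; k++) {
        if (i > leftEnd)                  list[k] = aux[j++];  // left half used up
        else if (j > rightEnd)            list[k] = aux[i++];  // right half used up
        else if (aux[j].key < aux[i].key) list[k] = aux[j++];  // right strictly smaller
        else                              list[k] = aux[i++];  // tie: left first (stable)
    }
}

// Same structure as the slides' mergesort, but using the stable merge.
void stableMergesort(Record list[], int first, int last) {
    if (first >= last) return;
    int m = first + (last - first) / 2;
    stableMergesort(list, first, m);
    stableMergesort(list, m + 1, last);
    stableMerge(list, first, m, last);
}
```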
On the performance of Mergesort
Unlike quicksort, mergesort guarantees O(n log n) in the worst case
– The reason is that quicksort's split depends on the value of the pivot, whereas mergesort always divides the list in half based on the index
Why is it O(n log n)?
– Merging n elements requires at most n comparisons
– Each time, the list is halved, so there are about log n levels of recursion
– So the standard divide-and-conquer recurrence T(n) = 2T(n/2) + n applies to mergesort, giving O(n log n)
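A worked unrolling of the recurrence, assuming n is a power of two, that merging n elements costs n steps, and that the base case T(1) costs a constant c:

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + n \\
     &= 4\,T(n/4) + 2n \\
     &= 2^p\,T(n/2^p) + p\,n \\
     &= c\,n + n\log_2 n \qquad (\text{taking } p = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```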
Lecture Sort - Key Points
Quicksort
– Use for good overall performance where time is not a constraint
Heap Sort
– Slower than quick sort, but guaranteed O(n log n)
– Use for real-time systems where time is critical
Radixsort
In many applications the key used in the ordering is not as simple as we would expect
– Keys in phone books
– The key in a library catalog
– ZIP codes
So far we have not considered the subtleties of dealing with complex keys
Radix Sort
Also, in several applications it is not necessary to consider the whole key
– How many letters of a person's name do we compare to find this person in the phone book?
Radix sort algorithms try to gain the same sort of efficiency by decomposing keys into pieces and comparing the pieces
Radixsort
So the main idea is to treat numbers as being represented in a base and work with individual digits of the numbers
– We could represent a number in binary and work with the individual bits
– We can consider the number in decimal and work with the individual digits
– We can consider strings as sequences of characters and work with the individual characters
– etc.
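A small sketch of digit extraction in C++, assuming non-negative keys stored as `unsigned`; the function names `digit` and `bit` are hypothetical. Base 10 gives decimal digits, and base 2 (or the shift-and-mask form) gives individual bits.

```cpp
#include <cassert>

// Return digit number d (d = 0 is the least significant) of a key
// written in the given base. With base 2 this extracts individual bits.
unsigned digit(unsigned key, unsigned d, unsigned base) {
    assert(base >= 2);
    for (unsigned t = 0; t < d; ++t)
        key /= base;        // drop the d lower-order digits
    return key % base;      // the digit now in the lowest position
}

// Bit-level shortcut for base 2: shift and mask (d must be less than
// the number of bits in unsigned).
unsigned bit(unsigned key, unsigned d) {
    return (key >> d) & 1u;
}
```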
Radixsort
Radix sort is used by several applications that deal with
– Telephone numbers
– Social security numbers
– ZIP codes
Consider the example with ZIP codes
– Letters can be divided into 10 boxes: ZIP codes starting with 0 go to box 0, ZIP codes starting with 1 go to box 1, and so on
– After the ZIP codes are separated into boxes, each box can be sorted further by considering the second digit
An example (MSD)
Thirteen five-digit keys to be sorted:
02104 32832 22513 55316 00017 18228 66588 36694 38306 13445 54448 26475 02941
Sorted on first digit (each key goes to the bin for its first digit, keeping input order within a bin):
bin 0: 02104 00017 02941
bin 1: 18228 13445
bin 2: 22513 26475
bin 3: 32832 36694 38306
bin 5: 55316 54448
bin 6: 66588
Sorted on first digit and second digit (each bin sorted further on the second digit):
00017 02104 02941 | 13445 18228 | 22513 26475 | 32832 36694 38306 | 54448 55316 | 66588
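A recursive MSD sketch in C++ for keys with a fixed number of decimal digits, using one vector per bin (an assumption; in-place partitioning is also common). `msdRadixSort` and `msdDigit` are hypothetical names. The keys are stored as plain integers, so 02104 becomes 2104; writing `02104` in C++ source would be read as an octal literal.

```cpp
#include <cassert>
#include <vector>

// Digit d of key, counting from the most significant of `width` decimal
// digits (d = 0 is the leftmost digit).
static int msdDigit(int key, int d, int width) {
    for (int t = 0; t < width - 1 - d; ++t) key /= 10;
    return key % 10;
}

// MSD radix sort: distribute the keys into ten bins by digit d, then sort
// each bin recursively on digit d + 1 and concatenate the bins in order.
void msdRadixSort(std::vector<int>& keys, int d, int width) {
    assert(d >= 0);
    if (keys.size() <= 1 || d >= width) return;   // nothing left to separate
    std::vector<std::vector<int>> bins(10);
    for (int k : keys) bins[msdDigit(k, d, width)].push_back(k);
    keys.clear();
    for (auto& bin : bins) {
        msdRadixSort(bin, d + 1, width);          // sort the bin on the next digit
        keys.insert(keys.end(), bin.begin(), bin.end());
    }
}
```

Called as `msdRadixSort(keys, 0, 5)` on the thirteen keys above, it only looks at a third digit for keys that still share their first two digits.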
Types of Radixsort
Because radixsort deals with individual digits of a number, we have a choice about the order in which to compare the digits
– From left to right
– From right to left
The methods that use the first order are called MSD (Most-Significant-Digit) Radix Sorts
The methods that use the second order are called LSD (Least-Significant-Digit) Radix Sorts
MSD sorts are normally used more frequently because they examine only as much of each key as is needed to get the job done
Which radix should we use?
This depends on the size of the keys.
Normally, for smaller keys a simple extraction of the digits of the keys can do the job
For large keys it may be a better idea to use the binary representation of the key.
– Computers represent data with binary numbers
– Most modern languages allow us to deal with the binary representation of variables
Sorting - Bin Sort
Assume
– All the keys lie in a small, fixed range, e.g.
  – integers 0-99
  – characters 'A'-'z', '0'-'9'
– There is at most one item with each value of the key
Bin sort
– Allocate a bin for each value of the key (usually an entry in an array)
– For each item: extract the key, compute its bin number, place it in the bin
– Finished!
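A sketch of the basic bin sort under the two assumptions above (keys distinct and in 0 .. m-1), in C++. For brevity each bin only records whether its key is present; with records attached, the bin would hold the item itself. `binSortUnique` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Bin sort for keys that are all distinct and lie in the range 0 .. m-1.
// One bin (array slot) per possible key; a bin is either empty or full.
std::vector<int> binSortUnique(const std::vector<int>& items, int m) {
    std::vector<bool> bin(m, false);            // allocate m empty bins: O(m)
    for (int key : items) {                     // n times:
        assert(key >= 0 && key < m);            //   the key must be in range
        assert(!bin[key]);                      //   and must not repeat
        bin[key] = true;                        //   place it in its bin: O(1)
    }
    std::vector<int> sorted;
    sorted.reserve(items.size());
    for (int b = 0; b < m; ++b)                 // read the bins in order: O(m)
        if (bin[b]) sorted.push_back(b);
    return sorted;
}
```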
Sorting - Bin Sort: Analysis
– All the keys lie in a small, fixed range (key condition)
  There are m possible key values
– There is at most one item with each value of the key
Bin sort
– Allocate a bin for each value of the key (usually an entry in an array): O(m)
– For each item (n times):
  Extract the key: O(1)
  Compute its bin number: O(1)
  Place it in the bin: O(1)
  Total: O(1) x n = O(n)
Result: O(n) + O(m) = O(n + m) = O(n) if n >> m
Sorting - Bin Sort: Caveat
Key Range
– All the keys lie in a small, fixed range: there are m possible key values
– If this condition is not met, e.g. m >> n, then bin sort is O(m)
Example
– Key is a 32-bit integer, m = 2^32
– Clearly, this isn't a good way to sort a few thousand integers
– Also, we may not have enough space for bins!
Bin sort trades space for speed!
– There's no free lunch!
Sorting - Bin Sort with duplicates
Relax the condition that there is at most one item with each value of the key
Bin sort
– Allocate a bin for each value of the key: O(m)
  Usually an entry in an array: an array of list heads
– For each item (n times):
  Extract the key: O(1)
  Compute its bin number: O(1)
  Add it to the list: O(1)
  Total: O(n)
– Join the lists: O(m)
– Finished! O(n) + O(m) = O(n + m) = O(n) if n >> m
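A sketch of bin sort with duplicates in C++, using one `std::list` per bin as the array of list heads; the `Item` type and the `binSort` name are hypothetical. Appending to the end of each list keeps equal keys in input order, and `std::list::splice` moves a whole list in constant time, so joining the m lists costs O(m).

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

struct Item {
    int key;            // small integer key in 0 .. m-1
    std::string data;   // payload carried along with the key
};

// Bin sort allowing duplicate keys: append each item to the list for its
// key, then join the lists in bin order.
std::list<Item> binSort(const std::vector<Item>& items, int m) {
    std::vector<std::list<Item>> bins(m);          // m list heads: O(m)
    for (const Item& it : items) {                 // n times:
        assert(it.key >= 0 && it.key < m);
        bins[it.key].push_back(it);                //   add to the end of its list: O(1)
    }
    std::list<Item> out;
    for (auto& bin : bins)
        out.splice(out.end(), bin);                // O(1) per bin, so O(m) in total
    return out;
}
```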
Sorting - Generalised Bin Sort
Radix sort: bin sort in phases
Example: 36 9 0 25 1 49 64 16 81 4
• Phase 1 - Sort by least significant digit
  bin 0: 0
  bin 1: 1 81
  bin 4: 64 4
  bin 5: 25
  bin 6: 36 16
  bin 9: 9 49
  (bins 2, 3, 7 and 8 are empty)
  Concatenated: 0 1 81 64 4 25 36 16 9 49
• Phase 2 - Sort by most significant digit, taking the items in their phase-1 order
  bin 0: 0 1 4 9
  bin 1: 16
  bin 2: 25
  bin 3: 36
  bin 4: 49
  bin 6: 64
  bin 8: 81
  (bins 5, 7 and 9 are empty)
  Concatenated: 0 1 4 9 16 25 36 49 64 81
Be careful to add after anything in the bin already! The 0 bin holds values 0-9, and they must stay in their phase-1 order.
Note that the 0 bin had to be quite large!
How much space is needed in each phase? n items, m bins
Sorting - Generalised Bin Sort
Radix sort - Analysis
• Phase 1 - Sort by least significant digit
  • Create m bins: O(m)
  • Allocate n items: O(n)
• Phase 2
  • Create m bins: O(m)
  • Allocate n items: O(n)
• Final
  • Link m bins: O(m)
• All steps in sequence, so add
  • Total: O(3m + 2n) = O(m + n) = O(n) for m << n
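The two phases can be sketched in C++ as an LSD radix sort on non-negative integers in base 10; `lsdRadixSort` and its `phases` parameter (the number of digits to process) are hypothetical names. Appending to the end of each bin is what preserves the phase-1 order inside the 0 bin.

```cpp
#include <cassert>
#include <list>
#include <vector>

// LSD radix sort: one base-10 bin sort per digit, least significant first.
std::vector<int> lsdRadixSort(std::vector<int> a, int phases) {
    int divisor = 1;
    for (int p = 0; p < phases; ++p) {
        std::vector<std::list<int>> bins(10);        // create m = 10 bins
        for (int x : a) {
            assert(x >= 0);
            bins[(x / divisor) % 10].push_back(x);   // allocate the n items
        }
        a.clear();
        for (const auto& bin : bins)                 // link the bins back up
            a.insert(a.end(), bin.begin(), bin.end());
        divisor *= 10;                               // move on to the next digit
    }
    return a;
}
```

With `phases = 1` the result is the phase-1 concatenation, and with `phases = 2` it is fully sorted for two-digit keys.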
Sorting - Radix Sort - Analysis
Radix sort - General
• Base (or radix) in each phase can be anything suitable
• Integers
  • Base 10, 16, 100, …
• Bases don't have to be the same
• Still O(n) if n >> s_i for all i, where s_i is the number of bins in phase i
class date {
    int day;   /* 1 .. 31 */
    int month; /* 1 .. 12 */
    int year;  /* 0 .. 99 */
};
Phase 1 - s1 = 31 bins (day)
Phase 2 - s2 = 12 bins (month)
Phase 3 - s3 = 100 bins (year)
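A sketch of the date example in C++, with a plain struct standing in for the `date` class (an assumption made so the fields are accessible). Each phase is a bin sort with its own number of bins, least significant field first: day (31 bins), month (12), then year (100). `binPhase` and `sortDates` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical plain struct mirroring the slide's date class.
struct Date {
    int day;    // 1 .. 31
    int month;  // 1 .. 12
    int year;   // 0 .. 99
};

// One bin-sort phase: distribute by field(d) into `bins` bins, keeping the
// current order within each bin, then concatenate the bins.
template <typename Field>
static void binPhase(std::vector<Date>& a, int bins, Field field) {
    std::vector<std::vector<Date>> bin(bins);
    for (const Date& d : a) {
        int b = field(d);
        assert(b >= 0 && b < bins);
        bin[b].push_back(d);
    }
    a.clear();
    for (const auto& v : bin) a.insert(a.end(), v.begin(), v.end());
}

// Radix sort with a different radix in each phase.
void sortDates(std::vector<Date>& a) {
    binPhase(a, 31,  [](const Date& d) { return d.day - 1; });    // s1 = 31
    binPhase(a, 12,  [](const Date& d) { return d.month - 1; });  // s2 = 12
    binPhase(a, 100, [](const Date& d) { return d.year; });       // s3 = 100
}
```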
Performance of Radixsort
For sorting n records whose keys have k digits, the running time of Radixsort is proportional to nk, which is O(n) when k is constant
– This is because the algorithm makes k (constant) passes over all n keys
Clearly, this performance depends on the method used to sort the elements on each digit: it must take O(n) time per pass (and, for LSD, it must be stable)
Radix Sort - Analysis
Generalised Radix Sort Algorithm
radixsort( A, n ) {
  for(i=0;i<k;i++) {                           // for each of k radices
    for(j=0;j<s[i];j++) bin[j] = EMPTY;        // O(s_i): clear the s_i bins for the ith radix
    for(j=0;j<n;j++) {                         // O(n): move element A[j] to the end of the bin
      move A[j] to the end of bin[A[j]->fi];   //       addressed by the ith field of A[j]
    }
    for(j=0;j<s[i];j++)                        // O(s_i): concatenate the s_i bins into
      concat bin[j] onto the end of A;         //         one list again (A is now empty)
  }
}
Sorting - Better than O(n log n)?
If all we know about the keys is an ordering rule
– No!
However,
– If we can compute an address from the key (in constant time), then
– bin sort algorithms can provide better performance
Performance of Radixsort
For large values of n the running time of radixsort is comparable to O(n log n)
– If we use the binary representation of the keys and we have 1 million 32-bit keys, then k = 32 and log2 n is about 20. So kn is comparable to n log n
Radix Sort - Analysis
Total
– k iterations, 2s_i + n for each one
– In general, if the keys are in (0, b^k - 1), the keys are k-digit base-b numbers and s_i = b for all i
Complexity O(k(n + 2b)) = O(n)
– As long as k and b are constant
Radix Sort - Analysis
? Any set of keys can be mapped to (0, b^k - 1)
! So we can always obtain O(n) sorting?
If k is constant, yes
Radix Sort - Analysis
– But, if k is allowed to increase with n
  e.g. it takes log_b n base-b digits to represent n
– Radix sort is no better than quicksort
  O(k(n + 2b)) with k = log_b n gives O(n log_b n + 2b log_b n) = O(n log n)
Radix Sort - Analysis
• Radix sort is no better than quicksort
• Another way of looking at this:
  • We can keep k constant as n increases if we allow duplicate keys
    • keys are in (0, b^k), b^k < n
  • but if the keys must be unique, then k must increase with n
• For O(n) performance, the keys must lie in a restricted range
Radix Sort - Realities
Radix sort uses a lot of memory
– n s_i locations for each phase
– In practice, this makes it difficult to achieve O(n) performance
– Cost of memory management outweighs benefits
Lecture 9 - Key Points
Bin Sorts
– If a function exists which can convert the key to an address (i.e. a small integer)
and the number of addresses (= number of bins) is not too large
then we can obtain O(n) sorting
… but remember it’s actually O(n + m)
– Number of bins, m, must be constant and small
Bin or Bucket Sort Analysis
Bucket sorts work well for data sets where the possible key values are known and relatively small and there are on average just a few elements per bucket.
This means the cost of sorting the contents of each bucket can be reduced toward zero.
The ideal result is if the order in each bucket is uninteresting or trivial, for instance, when each bucket holds a single key.
The buckets may be arranged so the concatenation phase is not needed, for instance, the buckets are contiguous parts of an array.
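One common way to make the buckets contiguous parts of an array is key-indexed counting (counting sort), sketched here in C++ for keys in 0 .. m-1; `countingSort` is a hypothetical name. A first pass counts the keys, prefix sums give each bucket's starting slot, and a second pass places each item directly into its bucket, so no concatenation phase is needed.

```cpp
#include <cassert>
#include <vector>

// Counting sort: bucket k is the slice out[count[k] .. count[k+1]-1]
// of a single output array. Keys must lie in 0 .. m-1.
std::vector<int> countingSort(const std::vector<int>& a, int m) {
    std::vector<int> count(m + 1, 0);
    for (int x : a) {
        assert(x >= 0 && x < m);
        ++count[x + 1];                    // count keys equal to x
    }
    for (int k = 0; k < m; ++k)
        count[k + 1] += count[k];          // count[k] = number of keys less than k
    std::vector<int> out(a.size());
    for (int x : a)
        out[count[x]++] = x;               // next free slot in bucket x (stable)
    return out;
}
```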
Sorting
We now know several sorting algorithms
– Insertion O(n^2)
– Heap O(n log n) Guaranteed
– Quick O(n log n) Most of the time!
Can we do any better?