02 Sorting (Chap 13)
-
Upload
frankjamison -
Category
Documents
-
view
22 -
download
2
description
Transcript of 02 Sorting (Chap 13)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
1
Sorting
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
2
• The efficiency of data handling can often be substantially increased if the data are sorted
• For example, it is practically impossible to find a name in the telephone directory if the items are not sorted
• In order to sort a set of item such as numbers or words, two properties must be considered
• The number of comparisons required to arrange the data• The number of data movement
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
3
• Depending on the sorting algorithm, the exact number of comparisons or exact number of movements may not always be easy to determine
• Therefore, the number of comparisons and movements are approximated with big-O notations
• Some sorting algorithm may do more movement of data than comparison of data
• It is up to the programmer to decide which algorithm is more appropriate for specific set of data
• For example, if only small keys are compared such as integers or characters, then comparison are relatively fast and inexpensive
• But if complex and big objects should be compared, then comparison can be quite costly
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
4
• If on the other hand, the data items moved are large, and the movement is relatively done more, then movement stands out as determining factor rather than comparison
• Further, a simple method may only be 20% less efficient than a more elaborated algorithm
• If sorting is used in a program once in a while and only for small set of data, then using more complicated algorithm may not be desirable
• However, if size of data set is large, 20% can make significant difference and should not be ignored
• Lets look at different sorting algorithms now
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
5
Insertion Sort
• Start with first two element of the array, data[0], and data[1]
• If they are out of order then an interchange takes place
• Next data[2] is considered and placed into its proper position
• If data[2] is smaller than data[0], it is placed before data[0] by shifting down data[0] and data[1] by one position
• Otherwise, if data[2] is between data[0] and data[1], we just need to shift down data [1] and place data[2] in the second position
• Otherwise, data[2] remain as where it is in the array
• Next data[3] is considered and the same process repeats
• And so on
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
6
InsertionSort(data[], n)for (i=1, i<n, i++)
move all elements data[j] greater than data[i] by one position;
place data[i] in its proper position;template <class T>void InsertionSort(T data[ ], int n){ for (int i=1; i<n, i++) { T tmp = data[i]; for (int j = i; j>0 && tmp < data[j-1]; j--) data[j] = data[j-1]; data[j] = tmp }
Algorithm and code for insertion sort
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
7
Example of Insertion Sort
5
2
3
8
1
tmp = 2
5
5
3
8
1
Moving 5 down
2
5
3
8
1
Put tmp=2 in position 1
2
5
3
8
1
2
5
5
8
1
2
3
5
8
1
Moving 5 down Put tmp=3 in
position 2tmp = 3
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
8
tmp = 8 Since 5 is less than 8 no shifting is required
tmp=1 Moving 8 down
Put tmp=1 in position 1
2
3
5
8
1
2
3
5
8
1
2
3
5
8
1
2
3
5
8
8
2
3
5
5
8
Moving 5 down
2
3
3
5
8
Moving 3 down
2
2
3
5
8
Moving 2 down
1
2
3
5
8
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
9
• Advantage of insertion sort:• If the data are already sorted, they remain sorted and
basically no movement is not necessary
• Disadvantage of insertion sort:• An item that is already in its right place may have to be
moved temporary in one iteration and be moved back into its original place
• Complexity of Insertion Sort:• Best case: This happens when the data are already sorted. It
takes O(n) to go through the elements
• Worst case: This happens when the data are in reverse order, then for the ith item (i-1) movement is necessary
Total movement = 1 + 2 + .. . +(n-1) = n(n-1)/2 which is O(n2)
• The average case is approximately half of the worst case which is still O(n2)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
10
Selection Sort
• Select the minimum in the array and swap it with the first element
• Then select the second minimum in the array and swap it with the second element
• And so on until everything is sorted
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
11
SelectionSort(data[ ],n) for (i=0; i<n-1; i++)
Select the smallest element among data[i] … data[n-1]; Swap it with data[i]
template <class T>void SelectionSort(T data[ ], int n){ int i, j, least; for (i=1; i<n-1, i++) { for (j = i+1; least=i; j<n; j++) if data[j] < data[least]; least = j; swap (data[least], data[i]);}
Algorithm and code for selection sort
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
12
Example of Selection Sort
5
2
3
8
1
The first minimum is searched in the entire array which is 1Swap 1 with the first position
1
2
3
8
5
1
2
3
8
5
The second minimum is 2Swap it with the second position
1
2
3
8
5
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
13
The third minimum is 3Swap 1 with the third position
1
2
3
8
5
The fourth minimum is 5Swap it with the forth position
1
2
3
5
8
1
2
3
8
5
1
2
3
8
5
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
14
Complexity of Selection Sort
• The number of comparison and/or movements is the same in each case (best case, average case and worst case)
• The number of comparison is equal to
Total = (n-1) + (n-2) + (n-3) + …. + 1 = n(n-1)/2 which is O(n2)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
15
Bubble Sort
• Start from the bottom and move the required elements up (i.e. bubble the elements up)
• Two adjacent elements are interchanged if they are found to be out of order with respect to each other
• First data[n-1] and data[n-2] are compared and swapped if they are not in order
• Then data[n-2] and data[n-3] are swapped if they are not in order
• And so on
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
16
BubbleSort(data[ ],n) for (i=0; i<n-1; i++)
for (j=n-1; j>i; --j) swap elements in position j and j-1 if they are out of order
template <class T>void BubbleSort(T data[ ], int n){ for (int i=0; i<n-1, i++) for (int j = n-1; j>i; --j) if data[j] < data[j-1]; swap (data[j], data[j-1]);}
Algorithm and code for bubble sort
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
17
Example of Bubble Sort
5
2
3
8
1
Iteration 1: Start from the last element up to the first element and bubble the smaller elements up
5
2
3
1
8swap
swap
5
2
1
3
8
swap
5
1
2
3
8
1
5
2
3
8
swap
1
5
2
3
8
1
5
2
3
8no swap
no swap
1
5
2
3
8
swap
1
2
5
3
8
Iteration 2: Start from the last element up to second element and bubble the smaller elements up
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
18
Example of Bubble Sort
Iteration 4: Start from the last element up to fourth element and bubble the smaller elements up
1
2
5
3
8
1
2
5
3
8no swap
swap
1
2
3
5
8
Iteration 3: Start from the last element up to third element and bubble the smaller elements up
1
2
3
5
8no swap
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
19
Complexity of Bubble Sort
• The number of comparison and/or movements is the same in each case (best case, average case and worst case)
• The number of comparison is equal to
Total = (n-1) + (n-2) + (n-3) + …. + 1 = n(n-1)/2 which is O(n2)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
20
• Comparing the bubble sort with insertion and selection sorts we can say that:
• For the average case, bubble sort makes approximately twice as many comparisons and the same number of moves as insertion sort
• Bubble sort also, on average, makes as many comparison as selection sort and n times more moves than selection sort
• Between theses three types of sorts “Insertion Sort” is generally better algorithm because if array is already sorted running time only takes O(n) which is relatively faster than other algorithms
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
21
Shell Sort
• Shell sort works on the idea that it is easier and faster to sort many short lists than it is to sort one large list
• Select an increment value k (the best value for k is not necessarily clear)
• Sort the sequence consisting of every kth element (use some simple sorting technique)
• Decrement k and repeat above step until k=1
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
22
Example of Shell Sort
4
7
10
2
5
12
1
9
6
3
8
11
Choose k = 4 first
4
7
10
2
5
12
1
9
6
3
8
11
4
3
10
2
5
7
1
9
6
12
8
11
4
3
1
2
5
7
8
9
6
12
10
11
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
23
Example of Shell Sort
Now choose k = 2, and then 1 by applying the insertion sort
4
3
1
2
5
7
8
9
6
12
10
11
1
3
4
2
5
7
6
9
8
12
10
11
1
2
4
3
5
7
6
9
8
11
10
12
1
2
3
4
5
6
7
8
9
10
11
12
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
24
Complexity of shell sort
• Shell sort works well on data that is almost sorted – O (n log2 n)
• Deeper analysis of Shell sort is quite difficult
• Can be shown is practice that it is ~O(n3/2)
ShellSort(data[ ],n) determine numbers ht, ht-1, …..h1 of ways of dividing array data into subarrays for (h = ht ; t>1; t--, h=ht ) divide data into h sub-array for (i=1; i<=h; i++) sort sub-array datai ; sort array data
Algorithm of shell sort
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
25
template <class T>void ShellSort(T data[ ], int arrsize){ int i, j, hCnt, h, k; int increments [20]; // create appropriate number of increments h for (h = 1; i=0; h<arrsize; i++) { increments [i] = h; h = 3*h +1; } // loop on the number of different increments h for (i=i-1; i>=0; i--) { h = increments [i]; // loop on the number of sub-arrays h-sorted in ith pass for (hCnt=h; hCnt<2*h; hCnt++) { // insertion sort for sub-array containing every hth element of array data for (j=hCnt; j<arrsize;) { T tmp = data[j]; k = j; while (k-h>=0; && tmp < data [k-h]) { data[k] = data[k-h]; k = k –h; } data [k] = tmp; j = j + h; } } }}
Code for shell sort
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
26
Heap Sort
• Heap sort uses a heap as described in the earlier lectures
• As we said before, a heap is a binary tree with the following two properties:
• Value of each node is not less than the values stored in each of its children
• The tree is perfectly balanced and the leaves in the level are all in the leftmost positions
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
27
• The procedure is:
• The data are transformed into a heap first
• Doing this, the data are not necessarily sorted; however, we know that the largest element is at the root
• Thus, start with a heap tree,
• Swap the root with the last element
• Restore all elements except the last element into a heap again
• Repeat the process for all elements until you are done
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
28
template <class T>void HeapSort(T data[ ], int size){ for (int i = (size/2)-1; i>=0; i--) MoveDown(data, i, size-1); // creates the heap for (i=size-1; i>=1; --i) { Swap (data[0], data[i]); // move the largest item to data[i] MoveDown(data, 0, i-1); // restores the heap }}
Algorithm and Code for Heap sort
HeapSort(data[ ],n) transform data into a heap for (i=n-1; i>1; i--) swap the root with the element in position i; restore the heap property for the tree data[0] … data[i-1]
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
29
Example of Heap Sort
We first transform the data into heap
2
8
6
1
10
15
3
12
11
2
101
68
1112
315
The initial tree is formed as follows
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
30
2
1012
68
111
315
We turn the array into a heap first
2
101
68
1112
315
2
1012
158
111
36
2
1012
68
111
315
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
31
2
108
1512
111
36
2
1012
158
111
36
2
1011
1512
81
36
2
108
1512
111
36
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
32
15
1011
612
81
32
15
1011
212
81
36
15
1011
212
81
36
2
1011
1512
81
36
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
33
Now we start to sort the elements
15
1011
612
81
32
8
1011
612
151
32
12
108
611
151
32
Swap the root with the last
element
Restore the heap
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
34
Swap the root with the last
element
Restore the heap
12
108
611
151
32
1
108
611
1512
32
11
18
610
1512
32
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
35
Swap the root with the last
element
Restore the heap
3
18
610
1512
112
10
13
68
1512
112
11
18
610
1512
32
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
36
Swap the root with the last
element
Restore the heap
8
12
63
1512
1110
10
13
68
1512
112
2
13
68
1512
1110
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
37
Swap the root with the last
element
Restore the heap
8
12
63
1512
1110
1
82
63
1512
1110
6
82
13
1512
1110
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
38
Swap the root with the last
element
Restore the heap
6
82
13
1512
1110
2
86
13
1512
1110
3
86
12
1512
1110
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
39
Swap the root with the last
element
Restore the heap
3
86
12
1512
1110
1
86
32
1512
1110
2
86
31
1512
1110
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
40
Swap the root with the last
element
Restore the heap
2
86
31
1512
1110
1
86
32
1512
1110
1
86
32
1512
1110
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
41
Place the elements into array using breadth first traversal
1
86
32
1512
1110
1
2
3
6
8
10
11
12
15
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
42
Complexity of heap sort
• The heap sort requires a lot of movement which can be inefficient for large objects
• In the second phase when we start to sort the elements while keeping the heap, we exchange “n-1” times the root with the element in position i and also restore the heap “n-1” times which takes O(nlogn)
• In general:
• The first phase, where we turn the array into heap, requires O(n) steps
• And the second phase when we start to sort the elements requires
• O(n-1) swap + O(nlogn) operations to restore the heap
Total = O(n) + O(nlogn) + O(n-1) = O(nlogn)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
43
Quick Sort
• This is known to be the best sorting method.
• In this scheme:• One of the elements in the array is chosen as pivot• Then the array is divided into sub-arrays• The elements smaller than the pivot goes into one sub-array• The elements bigger than the pivot goes into another sub-
array• The pivot goes in the middle of these two sub-arrays• Then each sub-array is partitioned the same way as the
original array and process repeats recursively
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
44
Algorithm of quick sortQuickSort(array[ ])
if length (array) > 1 choose a pivot; // partition array into array1 and array2 while there are elements left in array include elements either in array1 // if element <= pivot or in array2 // if element >= pivot QuickSort(array1); QuickSort(array2);
Complexity of quick sort
• The best case is when the arrays are always partitioned equally
• For the best case, the running time is O(nlogn)
• The running time for the average case is also O(nlogn)
• The worst case happens if pivot is always either the smallest element in the array or largest number in the array.
• In the worst case, the running time moves toward O(n2)
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
45
template <class T>void quicksort(T data[ ], int first, int last){ int lower = first +1; upper = last; swap (data[first], data[(first+last)/2)]); T pivot = data [first] while (lower <= upper) { while (data[lower] < pivot) lower++; while (pivot < data[upper]) upper--; if (lower < upper) swap(data[lower++], data[upper--]); else lower++; // can cause to eventually jump out of the while loop } swap (data[upper], data[first]); if (first < upper-1) quicksort(data, first, upper-1); if (upper+1 < last) quicksort(data, upper+1, last)}
Code for quick sort
template<class T>void quicksort(T data[ ], int n){ if (n<2) return; for (int i=1, max=0; i<n; i++) if (data[max] < data[i]) max = i; swap(data[n-1], data[max]); quicksort(data, 0, n-2);}
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
46
Example of Quick Sort
• By example
• Select pivot
• Partition
8131 75
13
43
57
92
65260
65
0 1326 43
3157
6592
8175
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
47
• Recursively apply quicksort to both partitions
• Result will ultimately be a sorted array
0 13 26 31 43 57 65 75 81 92
0 1326 43
31 57
9281
7565
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
48
Radix Sort
• Radix refers to the base of the number. For example radix for decimal numbers is 10 or for hex numbers is 16 or for English alphabets is 26.
• Radix sort has been called the bin sort in the past
• The name bin sort comes from mechanical devices that were used to sort keypunched cards
• Cards would be directed into bins and returned to the deck in a new order and then redirected into bins again
• For integer data, the repeated passes of a radix sort focus on the ones place value, then on the tens place value, then on the thousands place value, etc
• For character based data, focus would be placed on the right-most character, then the second most right-character, etc
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
49
Algorithm and Code for Radix SortAssuming the numbers to be sorted are all decimal integers
RadixSort(array[ ]) for (d = 1; d <= the position of the leftmost digit of longest number; i++) distribute all numbers among piles 0 through 9 according to the dth digit Put all integers on one list
void radixsort(long data[ ], int n){ int i, j, k, mask = 1; const int radix = 10; // because digits go from 0 to 9 const int digits = 10; Queue<long> queues[radix]; for (i=0, factor = 1, i < digits; factor = factor*radix, i++) { for (j=0; j<n; j++) queues [(data[j] / factor ) % radix ].enqueue (data[j]); for (j=k=0; j < radix; j++) while (!queues[j].empty()) data[k++] = queues[j].dequeue(); }}
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
50
Example of Radix Sort
• Assume the data are:459 254 472 534 649 239 432 654 477
• Radix sort will arrange the values into 10 bins based upon the ones place value
012 472 43234 254 534 654567 47789 459 649 239
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
51
• The sublists are collected and made into one large bin (in order given)
472 432 254 534 654 477 459 649 239• Then Radix sort will arrange the values into 10 bins based upon
the tens place value
0123 432 534 2394 6495 254 654 45967 472 47789
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
52
• The sublists are collected and made into one large bin (in order given)
432 534 239 649 254 654 459 472 477• Radix sort will arrange the values into 10 bins based upon the
hundreds place value (done!)
012 239 25434 432 459 472
4775 5346 649 654789
• The sublists are collected and the numbers are sorted
239 254 432 459 472 477 534 649 654
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
53
Another Example of Radix Sort
• Assume the data are:9 54 472 534 39 43 654 77
• To make it simple, rewrite the numbers to make them all three digits like:
009 054 472 534 039 043 654 077• Radix sort will arrange the values into 10 bins based upon the ones
place value
012 472 3 0434 054 534 654567 07789 009 039
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
54
• The sublists are collected and made into one large bin (in order given)
472 043 054 534 654 077 009 039• Then Radix sort will arrange the values into 10 bins based upon
the tens place value
0 009123 534 0394 0435 054 65467 472 07789
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
55
• The sublists are collected and made into one large bin (in order given)
009 534 039 043 054 654 472 077• Radix sort will arrange the values into 10 bins based upon the
hundreds place value (done!)
• The sublists are collected and the numbers are sorted
009 039 043 054 077 472 534 654
0 009 039 043 054 077
1234 472 5 5346 654789
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
56
• Assume the data are:area Book Close Team New Place Print
• To sort the above elements using the radix sort you need to have 26 buckets, one for each character.
• You also need one more character to represent space which has the lowest value. Suppose that letter is question-mark “?” and it is used to represent space
• You can rewrite the data as follows:area? Book? Close Team? New?? Place Print
• Now all letters have 5 characters and it is easy to compare them with each other
• To do the sorting, you can start from the right most character, place the data into appropriate buckets and collect them. Then place them into bucket based on the second right most character and collect them again and so on.
Dr. Ahmad R. Hadaegh A.R. Hadaegh National University Page
57
Complexity of Radix Sort
• The complexity is O(n)
• However, keysize (for example, the maximum number of digits) is a factor, but will still be a linear relationship because for example for at most 3 digits 3n is still O(n) which is linear
• Although theoretically O(n) is an impressive running time for sort, it does not include the queue implementation
• Further, if radix r (the base) is a large number and a large amount of data has to be sorted, then radix sort algorithm requires r queues of at most size n and the number r*n is O(rn) which can be substantially large depending of the size of r.