Sorting
Data Structures and Algorithms (60-254)
2
Sorting
• Sorting is one of the most well-studied problems in Computer Science
• The ultimate reference on the subject is:
“The Art of Computer Programming, Vol. 3: Sorting and Searching”, by D. E. Knuth
3
Formal Statement
Given a sequence of n numbers:
$a_1, a_2, \ldots, a_n$
find a permutation $\pi$ of the numbers 1, 2, …, n such that
$a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$
Permutations:
3, 2, 1: $\pi(1) = 3$, $\pi(2) = 2$, $\pi(3) = 1$
2, 1, 3: $\pi(1) = 2$, $\pi(2) = 1$, $\pi(3) = 3$
1, 3, 2: $\pi(1) = 1$, $\pi(2) = 3$, $\pi(3) = 2$
…
are all permutations of 1, 2, 3
4
Comparison Sorts
• A comparison sort sorts by comparing elements pairwise.
• We study these comparison sorts:
  • Insertion Sort
  • Shellsort
  • Mergesort
  • Quicksort
5
Insertion Sort
Sort the sequence 3, 1, 5, 4, 2
Sort 3 → 3
Sort 3, 1 → 1, 3
Sort 1, 3, 5 → 1, 3, 5
Sort 1, 3, 5, 4 → 1, 3, 4, 5
Sort 1, 3, 4, 5, 2 → 1, 2, 3, 4, 5
6
Incremental sorting
In general, at the ith step,
$a_1, a_2, a_3, \ldots, a_{i-1}, a_i$
are already sorted:
$a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(i)}$
for some permutation $\pi$ of 1, 2, …, i.
In the next step, $a_{i+1}$ has to be inserted in the correct position.
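The incremental scheme can be sketched in Python as follows. This is a minimal illustration, not taken from the slides: the function name `insertion_sort` is an assumption, indices are 0-based, and the function returns a sorted copy rather than sorting in place.

```python
def insertion_sort(seq):
    """Return a sorted copy of seq, built by inserting one element at a time."""
    a = list(seq)
    for i in range(1, len(a)):
        key = a[i]  # next element to insert; a[0..i-1] is already sorted
        j = i - 1
        # Shift elements of the sorted prefix that are larger than key one slot right.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key  # key lands in its correct position
    return a


print(insertion_sort([3, 1, 5, 4, 2]))
```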
7
Analysis of Insertion Sort
What is the worst-case input? Elements in decreasing order! Example: 5, 4, 3, 2, 1

Insertion step                        # of comparisons
5 → 5                                 0
5, 4 → 4, 5                           1
4, 5, 3 → 3, 4, 5                     2
3, 4, 5, 2 → 2, 3, 4, 5               3
2, 3, 4, 5, 1 → 1, 2, 3, 4, 5         4
8
Worst case
In general, to insert $a_{i+1}$ in its proper place with respect to the sorted preceding i numbers
$a_1, a_2, \ldots, a_i$, we can make i comparisons in the worst case.
Thus,
$T(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2)$
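To check the worst-case count on small inputs, here is a sketch that counts key comparisons. The function name `insertion_sort_count` and the counting convention (each test of a prefix element against the element being inserted counts once) are assumptions chosen to match the tallies in the examples. For a decreasing input of n elements the count is 1 + 2 + … + (n-1) = n(n-1)/2.

```python
def insertion_sort_count(seq):
    """Return (sorted copy, number of key comparisons made)."""
    a = list(seq)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1      # one comparison of a[j] against key
            if a[j] <= key:
                break             # key belongs right after a[j]
            a[j + 1] = a[j]       # shift the larger element right
            j -= 1
        a[j + 1] = key
    return a, comparisons


n = 5
_, count = insertion_sort_count(range(n, 0, -1))  # 5, 4, 3, 2, 1
print(count, n * (n - 1) // 2)
```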
9
Shellsort
Due to Donald Shell.
For example:
Shellsort the sequence: 81, 94, 11, 96, 12, 35, 17, 95 (1)
Step 1: Sort all sequences that are four positions apart.
81, 12 → 12, 81
94, 35 → 35, 94
11, 17 → 11, 17
96, 95 → 95, 96
Results in: 12, 35, 11, 95, 81, 94, 17, 96 (2)
10
Shellsort
Step 2: 12, 35, 11, 95, 81, 94, 17, 96 (2)
Sort all sequences of (2) that are two positions apart.
12, 11, 81, 17 → 11, 12, 17, 81
35, 95, 94, 96 → 35, 94, 95, 96
Results in: 11, 35, 12, 94, 17, 95, 81, 96 (3)
Step 3:
Sort all sequences of (3) that are one position apart.
11, 35, 12, 94, 17, 95, 81, 96 → 11, 12, 17, 35, 81, 94, 95, 96 (4)
Sequence (4) is sorted!
11
Observations
$h_1, h_2, h_3 = 4, 2, 1$ is called a gap sequence.
• Different gap sequences are possible
• Every one of them must end with 1
Shell’s gap sequence:
$h_1 = \lfloor n/2 \rfloor$
$h_i = \lfloor h_{i-1} / 2 \rfloor$ (down to $h_k = 1$)
All sequences were sorted using insertion sort.
In Step 3, we sorted the entire sequence using insertion sort!
Advantage over straightforward insertion sort?
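The three steps can be sketched as a gapped insertion sort. This is an illustrative sketch, not the slides' own code: the name `shellsort`, the optional `gaps` parameter, and the returned list of per-gap snapshots are assumptions added so the intermediate sequences (2), (3) and (4) can be inspected. When no gaps are given, it halves n repeatedly with integer division.

```python
def shellsort(seq, gaps=None):
    """Return (sorted copy, list of (gap, snapshot) after each pass)."""
    a = list(seq)
    if gaps is None:
        # Shell's gap sequence: n//2, n//4, ..., 1
        gaps = []
        h = len(a) // 2
        while h >= 1:
            gaps.append(h)
            h //= 2
    passes = []
    for h in gaps:
        # Insertion sort on every subsequence of elements h positions apart.
        for i in range(h, len(a)):
            key = a[i]
            j = i - h
            while j >= 0 and a[j] > key:
                a[j + h] = a[j]
                j -= h
            a[j + h] = key
        passes.append((h, list(a)))
    return a, passes


result, passes = shellsort([81, 94, 11, 96, 12, 35, 17, 95])
for gap, snapshot in passes:
    print(gap, snapshot)
```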
12
Example
Insertion sort on: 81, 94, 11, 96, 12, 35, 17, 95

Sorted prefix after each insertion       # of comparisons
81                                       0
81, 94                                   1
11, 81, 94                               2
11, 81, 94, 96                           1
11, 12, 81, 94, 96                       4
11, 12, 35, 81, 94, 96                   4
11, 12, 17, 35, 81, 94, 96               5
11, 12, 17, 35, 81, 94, 95, 96           2
Total # of comparisons                   19
13
Example
Insertion sort on: 11, 35, 12, 94, 17, 95, 81, 96

Sorted prefix after each insertion       # of comparisons
11                                       0
11, 35                                   1
11, 12, 35                               2
11, 12, 35, 94                           1
11, 12, 17, 35, 94                       3
11, 12, 17, 35, 94, 95                   1
11, 12, 17, 35, 81, 94, 95               3
11, 12, 17, 35, 81, 94, 95, 96           1
Total # of comparisons                   12
14
Analysis of Shellsort
A clever choice of gap sequence leads to a subquadratic algorithm.
That is, for an n-element sequence, the # of comparisons is
$T(n) = O(n^{3/2})$
when using the Hibbard sequence: $1, 3, 7, \ldots, 2^k - 1$
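For illustration, the Hibbard gaps for a given n could be generated as below. The function name `hibbard_gaps` and the cutoff rule (keep only gaps smaller than n) are assumptions, since the slides do not say where the sequence should stop. The gaps are returned largest first, which is the order a Shellsort pass would use them in.

```python
def hibbard_gaps(n):
    """Return the gaps 2^k - 1 that are smaller than n, largest first."""
    gaps = []
    k = 1
    while (1 << k) - 1 < n:
        gaps.append((1 << k) - 1)  # 1, 3, 7, 15, ...
        k += 1
    return gaps[::-1]


print(hibbard_gaps(8))
print(hibbard_gaps(100))
```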
15
Mergesort
Sort: 81, 94, 11, 96, | 12, 35, 17, 95
Mergesort(81, 94, 11, 96)
Mergesort(12, 35, 17, 95)
Merge the two sorted lists from the two lines above.
This is a divide-and-conquer algorithm.
16
Divide
MS (81, 94, 11, 96, 12, 35, 17, 95)
    MS (81, 94, 11, 96)
        MS (81, 94)
        MS (11, 96)
    MS (12, 35, 17, 95)
        MS (12, 35)
        MS (17, 95)
17
Conquer
Merge two sorted lists.
MS (81, 94, 11, 96) = 11, 81, 94, 96 (1)
MS (12, 35, 17, 95) = 12, 17, 35, 95 (2)
Compare 11 and 12: output 11, move index in list (1)
Compare 12 and 81: output 12, move index in list (2)
Compare 17 and 81: output 17, move index in list (2)
18
Number of Comparisons
A total of seven comparisons to generate the sorted list
11, 12, 17, 35, 81, 94, 95, 96
This is the maximum! For if the lists were
81, 94, 95, 96 and 11, 12, 17, 35
we would need only four comparisons.
The algorithm follows…
19
Procedure Mergesort(A)
n ← size of A
if (n > 1)
    Set A1 ← A[1 ... ⌊n/2⌋]      // Create a new array A1
    Set A2 ← A[⌊n/2⌋+1 ... n]    // Create a new array A2
    Mergesort(A1)
    Mergesort(A2)
    Merge(A, A1, A2)
else
    // A has only one element: do nothing!
20
Procedure Merge(A, A1, A2)
n1 ← size of A1
n2 ← size of A2
i ← 1; j ← 1; k ← 1
while (i <= n1 and j <= n2)
    if (A1[i] < A2[j])
        A[k] ← A1[i]; i ← i + 1
    else
        A[k] ← A2[j]; j ← j + 1
    k ← k + 1
for m ← i to n1
    A[k] ← A1[m]; k ← k + 1
for m ← j to n2
    A[k] ← A2[m]; k ← k + 1
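The two procedures can be sketched in Python as below. This is an adaptation under stated assumptions rather than a literal transcription: indices are 0-based, the functions `merge` and `merge_sort` return new lists instead of writing back into A, and the leftover elements are copied with list slicing in place of the two trailing for loops.

```python
def merge(a1, a2):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(a1) and j < len(a2):
        if a1[i] < a2[j]:
            merged.append(a1[i])
            i += 1
        else:
            merged.append(a2[j])
            j += 1
    merged.extend(a1[i:])  # at most one of these two slices is non-empty
    merged.extend(a2[j:])
    return merged


def merge_sort(a):
    """Return a sorted copy of a using divide and conquer."""
    if len(a) <= 1:
        return list(a)       # zero or one element: nothing to do
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))


print(merge_sort([81, 94, 11, 96, 12, 35, 17, 95]))
```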
21
Theorem
To merge two sorted lists, each of length n, we need at most 2n – 1 comparisons
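The bound can be checked on the two pairs of lists from the merging example. The sketch below uses a hypothetical helper `merge_count` that returns the merged list together with the number of element comparisons; with n = 4 the bound is 2n - 1 = 7.

```python
def merge_count(a1, a2):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    while i < len(a1) and j < len(a2):
        comparisons += 1          # one comparison per loop iteration
        if a1[i] < a2[j]:
            merged.append(a1[i])
            i += 1
        else:
            merged.append(a2[j])
            j += 1
    merged.extend(a1[i:])         # leftovers need no comparisons
    merged.extend(a2[j:])
    return merged, comparisons


print(merge_count([11, 81, 94, 96], [12, 17, 35, 95])[1])
print(merge_count([81, 94, 95, 96], [11, 12, 17, 35])[1])
```

Each comparison outputs exactly one element, and the last element is always output without a comparison, which is where the 2n - 1 comes from.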
22
Complexity of Mergesort
T(n) = 2 T(n/2) + O(n)   for n > 1
T(n) = 1                 for n = 1
Solution: O(n log n)
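One way to see the solution is to unroll the recurrence. The sketch below assumes that n is a power of two, $n = 2^m$, and that the O(n) merging cost is at most $cn$ for some constant $c$; both are simplifying assumptions.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^m\,T(n/2^m) + m\,cn \\
     &= n\,T(1) + cn \log_2 n \\
     &= n + cn \log_2 n = O(n \log n)
\end{align*}
```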
23
A Partitioning Game
Given L = 5, 3, 2, 6, 4, 1, 3, 7
Partition L into L1 and L2 such that
every element in list L1
is less than or equal to
every element in list L2
How?
24
Split
a = first element of L
Make
every element of list L1
less than or equal to a
less than or equal to
every element of list L2
How? Using two indices:
lx = left index
rx = right index
25
Initial configuration
5, 3, 2, 6, 4, 1, 3, 7
(lx starts at the left end, rx at the right end)
Rules:
• lx moves right until it meets an element ≥ 5
• rx moves left until it meets an element ≤ 5
• exchange elements and continue until indices meet or cross.
26
Intermediate configurations
5, 3, 2, 6, 4, 1, 3, 7   (lx stops at 5, rx stops at 3)
Exchange and continue: 3, 3, 2, 6, 4, 1, 5, 7   (lx stops at 6, rx stops at 1)
Exchange and continue: 3, 3, 2, 1, 4, 6, 5, 7   (rx stops at 4, lx stops at 6)
lx and rx have crossed!
27
Intermediate configurations
L1 = 3, 3, 2, 1, 4
L2 = 6, 5, 7
Now, do the same with the lists L1 and L2.
Initial configuration for L1:
3, 3, 2, 1, 4   (lx at the left end, rx at the right end)
a = 3 = first element.
28
Intermediate configuration for L1
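3, 3, 2, 1, 4   (lx stops at the first 3, rx stops at 1)
Exchange and continue:
1, 3, 2, 3, 4   (lx stops at the 3 in position 2, rx stops at 2)
Exchange and continue:
1, 2, 3, 3, 4   (rx stops at 2, lx stops at the 3 in position 3, so rx is now left of lx)
Left and right indices have crossed!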
29
Quicksort
We have new lists
L11 = 1, 2
L12 = 3, 3, 4
with which we continue to do the same.
Partitioning stops once we have a list with only one element.
All this, done in place, gives us the following sorted list:
Lsorted = 1, 2, 3, 3, 4, 5, 6, 7
This is Quicksort!
30
Partition – Formal Description
Procedure Partition(L, p, q)
a ← L[p]
lx ← p - 1
rx ← q + 1
while true
    repeat rx ← rx - 1      // Move right index
    until L[rx] ≤ a
    repeat lx ← lx + 1      // Move left index
    until L[lx] ≥ a
    if (lx < rx)
        exchange(L[lx], L[rx])
    else
        return rx            // Indices have crossed
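A Python sketch of this partitioning scheme follows. It assumes 0-based indices with p and q both inclusive, and the function name `partition` is chosen for illustration; the repeat/until loops are written as a step followed by a while loop.

```python
def partition(L, p, q):
    """Partition L[p..q] in place around a = L[p]; return the split index rx.

    Afterwards every element of L[p..rx] is <= every element of L[rx+1..q].
    """
    a = L[p]
    lx, rx = p - 1, q + 1
    while True:
        rx -= 1
        while L[rx] > a:      # move right index left until L[rx] <= a
            rx -= 1
        lx += 1
        while L[lx] < a:      # move left index right until L[lx] >= a
            lx += 1
        if lx < rx:
            L[lx], L[rx] = L[rx], L[lx]
        else:
            return rx         # indices have met or crossed


L = [5, 3, 2, 6, 4, 1, 3, 7]
r = partition(L, 0, len(L) - 1)
print(L[:r + 1], L[r + 1:])
```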
31
Quicksort
Procedure Quicksort(L, p, q)
if (p < q)
    r ← Partition(L, p, q)
    Quicksort(L, p, r)
    Quicksort(L, r+1, q)
To sort the entire array, the initial call is:
Quicksort(L, 1, n)
where n is the length of L
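Put together, a runnable sketch might look like this. It assumes 0-based inclusive indices, so the initial call becomes `quicksort(L, 0, len(L) - 1)`; the names `partition` and `quicksort` are illustrative.

```python
def partition(L, p, q):
    """Partition L[p..q] in place around the first element; return split index."""
    a = L[p]
    lx, rx = p - 1, q + 1
    while True:
        rx -= 1
        while L[rx] > a:
            rx -= 1
        lx += 1
        while L[lx] < a:
            lx += 1
        if lx < rx:
            L[lx], L[rx] = L[rx], L[lx]
        else:
            return rx


def quicksort(L, p, q):
    """Sort L[p..q] in place."""
    if p < q:
        r = partition(L, p, q)
        quicksort(L, p, r)       # left part: elements <= pivot value
        quicksort(L, r + 1, q)   # right part: elements >= pivot value


data = [5, 3, 2, 6, 4, 1, 3, 7]
quicksort(data, 0, len(data) - 1)
print(data)
```

Note that the left recursive call includes index r. With this partitioning scheme the pivot is not guaranteed to end up at position r, so it cannot be excluded from the recursion.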
32
Observations
• Choice of the partitioning element a is important
• Determines how the lists are split
• Desirable: To split the list evenly
• How?...
33
Undesirable Partitioning
[Figure: unbalanced recursion tree. A list of size n is split so that one part has size n-1, and so on down to a list of size 2.]
34
Example
Such an undesirable partitioning is possible if we take the following sorted sequence
3, 4, 5, 6, 7, 8, 9, 10
and we partition as described above.
35
Desirable Partitioning
[Figure: balanced recursion tree. Each list is split into two lists of roughly equal size.]
36
Choosing the pivot
Can we choose the partitioning element to steer between the two extremes?
Heuristics:
• Median-of-three: find the median of the first, middle and last element
or
• Find the median of three randomly chosen elements.
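The first heuristic could be sketched as follows. The helper names `median_of_three_index` and `choose_pivot` are hypothetical. Swapping the median into position p is an assumption made so that a partition procedure that takes its pivot from the first element can be reused unchanged.

```python
def median_of_three_index(L, p, q):
    """Return the index of the median of L[p], L[middle], L[q]."""
    mid = (p + q) // 2
    candidates = sorted([(L[p], p), (L[mid], mid), (L[q], q)])
    return candidates[1][1]   # index attached to the middle value


def choose_pivot(L, p, q):
    """Move the median-of-three element to position p (in place)."""
    m = median_of_three_index(L, p, q)
    L[p], L[m] = L[m], L[p]


L = [3, 4, 5, 6, 7, 8, 9, 10]
choose_pivot(L, 0, len(L) - 1)
print(L[0])   # pivot value now at the front
```

On the sorted sequence above, the first, middle and last elements are 3, 6 and 10, so 6 becomes the pivot instead of the smallest element.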
37
Analysis
Worst-case behavior:
$T(n) = n + (n-1) + \cdots + 2 = O(n^2)$
since to partition a list of size n-i (i ≥ 0) into two lists of size 1 and n-i-1, we need to look at all n-i elements.
[Figure: a list of size n-i split into lists of size 1 and n-i-1]
As a recurrence: T(n) = T(n-1) + O(n)
38
Best-case Behavior
T(n) = 2T(n/2) + O(n)  ⇒  T(n) = O(n log n)
where the O(n) term is the time to partition a list of size n into two lists of size n/2 each.
39
Average-case Behavior
$T_{avg}(n) = O(n \log n)$
$T(n) = T(\alpha n) + T(\beta n) + O(n)$
where $\alpha + \beta = 1$
40
Sorting – in Linear Time??
Yes… but… only under certain assumptions on the input data.
Linear-time sorting techniques:
• Counting sort
• Radix sort
• Bucket sort
41
Counting Sort
Assumption:
Input elements are integers in the range 1 to k, where k = O(n)
Example:
Sort the list L = 3, 6, 4, 1, 3, 4, 1, 4 using counting sort.
42
Example
Index:  1 2 3 4 5 6 7 8
A:      3 6 4 1 3 4 1 4

Index:  1 2 3 4 5 6
C:      _ _ _ _ _ _

Index:  1 2 3 4 5 6 7 8
B:      _ _ _ _ _ _ _ _

Input is in array A.
At first, C[i] counts the # of times i occurs in the input.
The sorted array is stored in B.
43
Example - Continued
Count # of times i occurs in A:
Index:  1 2 3 4 5 6
C:      2 0 2 3 0 1

Cumulative counters:
Index:  1 2 3 4 5 6
C:      2 2 4 7 7 8

Now C[i] contains the count of the number of elements in the input that are ≤ i
44
Example - Continued
• Go through array A
• First element is 3
• From C[3] we know that 4 elements are ≤ 3
• So B[4] = 3
• Decrease C[3] by 1

Index:  1 2 3 4 5 6 7 8
B:      _ _ _ 3 _ _ _ _

Index:  1 2 3 4 5 6
C:      2 2 3 7 7 8
45
Example - Continued
• Next element in A is 6
• C[6] = 8 ⇒ eight elements are ≤ 6
• B[8] = 6
• Decrease C[6] by 1

Index:  1 2 3 4 5 6 7 8
B:      _ _ _ 3 _ _ _ 6

Index:  1 2 3 4 5 6
C:      2 2 3 7 7 7

and so on…
46
Example - Continued
A[3] = 4, C[4] = 7, B[7] = 4

Index:  1 2 3 4 5 6 7 8
B:      _ _ _ 3 _ _ 4 6

Index:  1 2 3 4 5 6
C:      2 2 3 6 7 7
47
Example - Continued
A[4] = 1, C[1] = 2, B[2] = 1

Index:  1 2 3 4 5 6 7 8
B:      _ 1 _ 3 _ _ 4 6

Index:  1 2 3 4 5 6
C:      1 2 3 6 7 7
48
Example - Continued
A[5] = 3, C[3] = 3, B[3] = 3

Index:  1 2 3 4 5 6 7 8
B:      _ 1 3 3 _ _ 4 6

Index:  1 2 3 4 5 6
C:      1 2 2 6 7 7
49
Example - Continued
A[6] = 4, C[4] = 6, B[6] = 4

Index:  1 2 3 4 5 6 7 8
B:      _ 1 3 3 _ 4 4 6

Index:  1 2 3 4 5 6
C:      1 2 2 5 7 7
50
Example - Continued
A[7] = 1, C[1] = 1, B[1] = 1

Index:  1 2 3 4 5 6 7 8
B:      1 1 3 3 _ 4 4 6

Index:  1 2 3 4 5 6
C:      0 2 2 5 7 7
51
Example - Continued
A[8] = 4, C[4] = 5, B[5] = 4

Index:  1 2 3 4 5 6 7 8
B:      1 1 3 3 4 4 4 6

Index:  1 2 3 4 5 6
C:      0 2 2 4 7 7
52
Formal Algorithm
Procedure CountingSort(A, B, k, n)
for i ← 1 to k
    C[i] ← 0
for i ← 1 to n
    C[A[i]] ← C[A[i]] + 1
// C[i] now contains a counter of how often i occurs
for i ← 2 to k
    C[i] ← C[i] + C[i-1]
// C[i] now contains # of elements ≤ i
for i ← n downto 1
    B[C[A[i]]] ← A[i]
    C[A[i]] ← C[A[i]] - 1
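A Python sketch of the procedure is shown below. It adapts the pseudocode under a few assumptions: the function `counting_sort` allocates and returns B itself instead of taking it as a parameter, n is taken from `len(A)`, C keeps an unused slot 0 so that values 1..k index it directly, and B is 0-based, which is why the write position is `C[x] - 1`.

```python
def counting_sort(A, k):
    """Return a sorted copy of A, whose elements are integers in 1..k."""
    n = len(A)
    C = [0] * (k + 1)            # C[0] is unused
    for x in A:
        C[x] += 1                # C[i] = how often i occurs
    for i in range(2, k + 1):
        C[i] += C[i - 1]         # C[i] = number of elements <= i
    B = [None] * n
    for x in reversed(A):        # i = n downto 1
        B[C[x] - 1] = x          # 1-based position C[x] is index C[x] - 1
        C[x] -= 1
    return B


print(counting_sort([3, 6, 4, 1, 3, 4, 1, 4], 6))
```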
53
Without using a second array B…
Procedure Single_Array_CountingSort(A, k, n)
for i ← 1 to k
    C[i] ← 0
for i ← 1 to n
    C[A[i]] ← C[A[i]] + 1
// C[i] now contains a counter of how often i occurs
pos ← 1
for i ← 1 to k
    for j ← 1 to C[i]
        A[pos] ← i
        pos ← pos + 1
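The single-array variant can be sketched the same way. The function name `counting_sort_in_place` is an assumption, indices are 0-based, and n is implied by `len(A)`. Because the values are regenerated from the counters, this version only makes sense when the elements are bare integers with no attached data.

```python
def counting_sort_in_place(A, k):
    """Sort A in place; elements must be integers in 1..k. Returns A."""
    C = [0] * (k + 1)            # C[0] is unused
    for x in A:
        C[x] += 1                # C[i] = how often i occurs
    pos = 0
    for i in range(1, k + 1):
        for _ in range(C[i]):    # write the value i exactly C[i] times
            A[pos] = i
            pos += 1
    return A


data = [3, 6, 4, 1, 3, 4, 1, 4]
counting_sort_in_place(data, 6)
print(data)
```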
54
Analysis of Single-Array Counting Sort
The first and second for loops take O(n) steps (since k = O(n)).
Then, two nested for loops… $O(n^2)$??? A more accurate upper bound? Yes.
For each i, the inner for loop executes C[i] times.
So the inner loop body of the two nested for loops executes
$\sum_{i=1}^{k} C[i]$
times in total.
Theorem: $\sum_{i=1}^{k} C[i] = n$
Proof (sketch):
The second for loop is executed n times.
In each step, one element of C is increased by 1. q.e.d.
55
Discussion
Complexity:
If the list is of size n and k = O(n), then T(n) = O(n)
Stability:
A sorting method is stable if equal elements are output in the same order they had in the input.
Theorem:
Counting Sort is stable.
56
Lower Bounds on Sorting
Theorem:
For any comparison sort of n elements,
$T(n) = \Omega(n \log n)$
Remark:
$T(n) = \Omega(g(n))$ means that T(n) grows at least as fast as g(n)
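The slides state the theorem without proof. A sketch of the standard decision-tree argument is given here, under the assumption that the comparison sort is modelled as a binary tree whose leaves are the $n!$ possible orderings and whose height $h$ is the worst-case number of comparisons.

```latex
\begin{align*}
2^h &\ge n! && \text{(a binary tree of height $h$ has at most $2^h$ leaves)} \\
h &\ge \log_2 (n!) \\
  &\ge \log_2 \left( (n/2)^{n/2} \right) && \text{(the largest $n/2$ factors of $n!$ are each $\ge n/2$)} \\
  &= \frac{n}{2} \log_2 \frac{n}{2} = \Omega(n \log n)
\end{align*}
```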