Sorting: Data Structures and Algorithms (60-254)

Sorting

• Sorting is one of the most well-studied problems in Computer Science

• The ultimate reference on the subject is:

“The Art of Computer Programming, Vol. 3: Sorting and Searching”, by D. E. Knuth

Formal Statement

Given a sequence of n numbers:

a1, a2, …, an

find a permutation $\pi$ of the numbers 1, 2, …, n such that

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(n)}$

Permutations:

3, 2, 1: $\pi(1) = 3$, $\pi(2) = 2$, $\pi(3) = 1$
2, 1, 3: $\pi(1) = 2$, $\pi(2) = 1$, $\pi(3) = 3$
1, 3, 2: $\pi(1) = 1$, $\pi(2) = 3$, $\pi(3) = 2$
…

are all permutations of 1, 2, 3
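
The formal statement asks for a permutation rather than a rearranged list. A minimal Python sketch of that view follows; the function name `sorting_permutation` is illustrative and not part of the slides, and it leans on Python's built-in `sorted` instead of any algorithm from this deck.

```python
def sorting_permutation(a):
    """Return pi as a 1-based list so that a[pi(1)] <= a[pi(2)] <= ... <= a[pi(n)]."""
    # Sort the index positions by the value stored at each position.
    order = sorted(range(len(a)), key=lambda i: a[i])
    # Shift to the 1-based indexing used on the slides.
    return [i + 1 for i in order]


a = [30, 10, 20]
pi = sorting_permutation(a)
print(pi)                      # [2, 3, 1]
print([a[p - 1] for p in pi])  # [10, 20, 30]
```

Reading the result: pi(1) = 2 means the smallest element is a2, and so on.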

Comparison Sorts

• A comparison sort sorts by comparing elements pairwise.

• We study these comparison sorts:
  • Insertion Sort
  • Shellsort
  • Mergesort
  • Quicksort

Insertion Sort

Sort the sequence 3, 1, 5, 4, 2

Sort 3 → 3
Sort 3, 1 → 1, 3
Sort 1, 3, 5 → 1, 3, 5
Sort 1, 3, 5, 4 → 1, 3, 4, 5
Sort 1, 3, 4, 5, 2 → 1, 2, 3, 4, 5

Incremental sorting

In general, at the ith step,

a1, a2, a3, …, ai-1, ai

are already sorted

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(i)}$

for some permutation $\pi$ of 1, 2, …, i.

In the next step, ai+1 has to be inserted in the correct position
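
The incremental step can be sketched in Python as follows. This is one common way to write insertion sort, not code from the slides; it uses 0-based indices and returns a sorted copy.

```python
def insertion_sort(seq):
    a = list(seq)                  # work on a copy of the input
    for i in range(1, len(a)):
        x = a[i]                   # next element to insert
        j = i - 1
        # a[0..i-1] is already sorted: shift larger elements one slot right.
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x               # drop x into the gap that opened up
    return a


print(insertion_sort([3, 1, 5, 4, 2]))  # [1, 2, 3, 4, 5]
```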

Analysis of Insertion Sort

What is the worst-case input? Elements in decreasing order! Example:

5, 4, 3, 2, 1

Before insertion → after insertion : # of comparisons

5 → 5 : 0
5, 4 → 4, 5 : 1
4, 5, 3 → 3, 4, 5 : 2
3, 4, 5, 2 → 2, 3, 4, 5 : 3
2, 3, 4, 5, 1 → 1, 2, 3, 4, 5 : 4

Worst case

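In general, to insert ai+1 in its proper place, w.r.t. the sorted preceding i numbers

a1, a2, …, ai, we can make i comparisons in the worst case.

Thus, the total number of comparisons in the worst case is

$T(n) = \sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} = O(n^2)$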

Shellsort

Due to Donald Shell

For example:

Shellsort the sequence: 81, 94, 11, 96, 12, 35, 17, 95 (1)

Step 1: Sort all sequences that are four positions apart.

81, 12 → 12, 81
94, 35 → 35, 94
11, 17 → 11, 17
96, 95 → 95, 96

Results in: 12, 35, 11, 95, 81, 94, 17, 96 (2)

Shellsort

Step 2: 12, 35, 11, 95, 81, 94, 17, 96 (2)

Sort all sequences of (2) that are two positions apart.

12, 11, 81, 17 → 11, 12, 17, 81
35, 95, 94, 96 → 35, 94, 95, 96

Results in: 11, 35, 12, 94, 17, 95, 81, 96 (3)

Step 3:

Sort all sequences of (3) that are one position apart.

11, 35, 12, 94, 17, 95, 81, 96 → 11, 12, 17, 35, 81, 94, 95, 96 (4)

Sequence (4) is sorted!
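
The three steps above can be reproduced with a short Python sketch. It assumes gaps that start at n // 2 and are halved each pass (4, 2, 1 for this input); the generator name `shellsort_passes` is illustrative, and it yields a snapshot after every pass so the intermediate sequences (2), (3) and (4) can be compared with the slides.

```python
def shellsort_passes(seq):
    """Yield (gap, snapshot) after each gapped insertion-sort pass."""
    a = list(seq)
    gap = len(a) // 2
    while gap >= 1:
        # Insertion sort applied to elements that are `gap` positions apart.
        for i in range(gap, len(a)):
            x = a[i]
            j = i - gap
            while j >= 0 and a[j] > x:
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = x
        yield gap, list(a)
        gap //= 2


for gap, snapshot in shellsort_passes([81, 94, 11, 96, 12, 35, 17, 95]):
    print(gap, snapshot)
# 4 [12, 35, 11, 95, 81, 94, 17, 96]
# 2 [11, 35, 12, 94, 17, 95, 81, 96]
# 1 [11, 12, 17, 35, 81, 94, 95, 96]
```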

Observations

h1, h2, h3 = 4, 2, 1 is called a gap sequence.
Different gap sequences are possible.
Every one of them must end with 1.
Shell's gap sequence:

$h_1 = \lfloor n/2 \rfloor$

$h_i = \lfloor h_{i-1}/2 \rfloor$ (down to $h_k = 1$)

All sequences were sorted using insertion sort.
In Step 3, we sorted the entire sequence, using insertion sort!

Advantage over straightforward insertion sort?

Example

Insertion sort on: 81, 94, 11, 96, 12, 35, 17, 95

Sorted prefix : # of comparisons

81 : 0
81, 94 : 1
11, 81, 94 : 2
11, 81, 94, 96 : 1
11, 12, 81, 94, 96 : 4
11, 12, 35, 81, 94, 96 : 4
11, 12, 17, 35, 81, 94, 96 : 5
11, 12, 17, 35, 81, 94, 95, 96 : 2

Total # of comparisons: 19

Example

Insertion sort on: 11, 35, 12, 94, 17, 95, 81, 96

Sorted prefix : # of comparisons

11 : 0
11, 35 : 1
11, 12, 35 : 2
11, 12, 35, 94 : 1
11, 12, 17, 35, 94 : 3
11, 12, 17, 35, 94, 95 : 1
11, 12, 17, 35, 81, 94, 95 : 3
11, 12, 17, 35, 81, 94, 95, 96 : 1

Total # of comparisons: 12
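
The two tallies can be checked with a small Python sketch. It assumes the counting convention the tables appear to use: the new element is compared with the sorted prefix from right to left, and every comparison counts, including the last one that stops the scan. The function name is illustrative.

```python
def insertion_sort_comparisons(seq):
    """Return (sorted copy, number of element comparisons made)."""
    a = list(seq)
    comparisons = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1       # compare x with a[j]
            if a[j] <= x:
                break              # x belongs right after a[j]
            a[j + 1] = a[j]        # shift the larger element right
            j -= 1
        a[j + 1] = x
    return a, comparisons


print(insertion_sort_comparisons([81, 94, 11, 96, 12, 35, 17, 95])[1])  # 19
print(insertion_sort_comparisons([11, 35, 12, 94, 17, 95, 81, 96])[1])  # 12
```

The second input is sequence (3) from the Shellsort example: the earlier gapped passes leave it nearly sorted, so the final insertion sort needs fewer comparisons.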

Analysis of Shellsort

A clever choice of gap sequence leads to a subquadratic algorithm.
That is, for an n-element sequence, the # of comparisons is

$T(n) = O(n^{3/2})$

when using the Hibbard sequence: 1, 3, 7, …, $2^k - 1$
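
As a rough illustration, the Hibbard gaps can be generated and plugged into a gapped insertion sort. The cut-off used here (keep every gap smaller than n) is an assumption, since the slide does not say where the sequence stops; both function names are illustrative.

```python
def hibbard_gaps(n):
    """Gaps of the form 2^k - 1 that are smaller than n, largest first."""
    gaps = []
    k = 1
    while 2 ** k - 1 < n:          # assumption: only gaps below n are useful
        gaps.append(2 ** k - 1)
        k += 1
    return gaps[::-1]


def shellsort_with_gaps(seq, gaps):
    a = list(seq)
    for gap in gaps:               # the last gap must be 1
        for i in range(gap, len(a)):
            x = a[i]
            j = i - gap
            while j >= 0 and a[j] > x:
                a[j + gap] = a[j]
                j -= gap
            a[j + gap] = x
    return a


data = [81, 94, 11, 96, 12, 35, 17, 95]
print(hibbard_gaps(len(data)))                             # [7, 3, 1]
print(shellsort_with_gaps(data, hibbard_gaps(len(data))))  # [11, 12, 17, 35, 81, 94, 95, 96]
```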

Mergesort

Sort: 81, 94, 11, 96, | 12, 35, 17, 95

Mergesort(81, 94, 11, 96)
Mergesort(12, 35, 17, 95)
Merge the two sorted lists from the above two lines.

This is a Divide-and-conquer algorithm.

Divide

MS (81, 94, 11, 96, 12, 35, 17, 95)
    MS (81, 94, 11, 96)
        MS (81, 94)
        MS (11, 96)
    MS (12, 35, 17, 95)
        MS (12, 35)
        MS (17, 95)

Conquer

Merge two sorted lists.

MS (81, 94, 11, 96) = 11, 81, 94, 96 (1)
MS (12, 35, 17, 95) = 12, 17, 35, 95 (2)

Compare 11 and 12. Output 11. Move index in list (1).
Compare 12 and 81. Output 12. Move index in list (2).
Compare 17 and 81. Output 17. Move index in list (2).

Number of Comparisons

A total of seven comparisons to generate the sorted list

11, 12, 17, 35, 81, 94, 95, 96

This is the maximum! For if the lists were

81, 94, 95, 96 and 11, 12, 17, 35

we would need only four comparisons.

The algorithm follows…

Procedure Mergesort(A)

    n ← size of A
    if (n > 1)
        Set A1 ← A[1 ... ⌊n/2⌋]      // Create a new array A1
        Set A2 ← A[⌊n/2⌋+1 ... n]    // Create a new array A2
        Mergesort(A1)
        Mergesort(A2)
        Merge(A, A1, A2)
    else
        // A has only one element: do nothing!

Procedure Merge(A, A1, A2)

    n1 ← size of A1
    n2 ← size of A2
    i ← 1; j ← 1; k ← 1
    while (i <= n1 and j <= n2)
        if (A1[i] < A2[j])
            A[k] ← A1[i]; i ← i + 1
        else
            A[k] ← A2[j]; j ← j + 1
        k ← k + 1
    for m ← i to n1
        A[k] ← A1[m]; k ← k + 1
    for m ← j to n2
        A[k] ← A2[m]; k ← k + 1
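
A Python sketch of the two procedures follows. It departs from the pseudocode in two labelled ways: it uses 0-based lists and returns new lists instead of writing back into A, and `merge` also returns the number of comparisons so the counts from the worked example can be checked.

```python
def merge(a1, a2):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    out = []
    i = j = comparisons = 0
    while i < len(a1) and j < len(a2):
        comparisons += 1
        if a1[i] < a2[j]:          # same test as in Procedure Merge
            out.append(a1[i])
            i += 1
        else:
            out.append(a2[j])
            j += 1
    out.extend(a1[i:])             # at most one of these two is non-empty
    out.extend(a2[j:])
    return out, comparisons


def mergesort(a):
    if len(a) <= 1:
        return list(a)             # nothing to do for 0 or 1 elements
    mid = len(a) // 2
    merged, _ = merge(mergesort(a[:mid]), mergesort(a[mid:]))
    return merged


print(merge([11, 81, 94, 96], [12, 17, 35, 95]))  # ([11, 12, 17, 35, 81, 94, 95, 96], 7)
print(merge([81, 94, 95, 96], [11, 12, 17, 35]))  # ([11, 12, 17, 35, 81, 94, 95, 96], 4)
print(mergesort([81, 94, 11, 96, 12, 35, 17, 95]))
```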

Theorem

To merge two sorted lists, each of length n, we need at most 2n – 1 comparisons

Complexity of Mergesort

$T(n) = \begin{cases} 2\,T(n/2) + O(n) & n \ge 2 \\ 1 & n = 1 \end{cases}$

Solution: T(n) = O(n log n)
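
One way to see the solution is to unroll the recurrence. The sketch below assumes that n is a power of two, $n = 2^k$, and that the O(n) term is at most $cn$ for some constant $c$; both are simplifying assumptions, not statements from the slides.

```latex
\begin{aligned}
T(n) &\le 2\,T(n/2) + cn \\
     &\le 4\,T(n/4) + 2cn \\
     &\;\;\vdots \\
     &\le 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \\
     &= n + cn\log_2 n = O(n \log n).
\end{aligned}
```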

A Partitioning Game

Given L = 5, 3, 2, 6, 4, 1, 3, 7

Partition L into L1 and L2 such that every element in list L1 is less than or equal to every element in list L2.

How?

Split

a = first element of L

Make

every element of list L1
less than or equal to
a
less than or equal to
every element of list L2

How? Using two indices:

lx = left index
rx = right index

Initial configuration

5, 3, 2, 6, 4, 1, 3, 7

lx starts at the left end of the list, rx at the right end.

Rules:
lx moves right until it meets an element ≥ 5
rx moves left until it meets an element ≤ 5
exchange elements and continue until indices meet or cross.

Intermediate configurations

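5, 3, 2, 6, 4, 1, 3, 7

lx stops at the first element (5), rx stops at the seventh element (3).
Exchange and continue: 3, 3, 2, 6, 4, 1, 5, 7

lx stops at the fourth element (6), rx stops at the sixth element (1).
Exchange and continue: 3, 3, 2, 1, 4, 6, 5, 7

lx and rx have crossed!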

Intermediate configurations

L1 = 3, 3, 2, 1, 4

L2 = 6, 5, 7

Now, do the same with the lists L1 and L2.

Initial configuration for L1

3, 3, 2, 1, 4

lx starts at the left end, rx at the right end.

a = 3 = first element.

Intermediate configuration for L1

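3, 3, 2, 1, 4

lx stops at the first element (3), rx stops at the fourth element (1).
Exchange and continue:

1, 3, 2, 3, 4

lx stops at the second element (3), rx stops at the third element (2).
Exchange and continue:

1, 2, 3, 3, 4

rx is now at the second element, lx at the third element.

Left and right indices have crossed!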

Quicksort

We have new lists

L11 = 1, 2

L12 = 3, 3, 4

with which we continue to do the same.
Partitioning stops once we have a list with only one element.
All this, done in place, gives us the following sorted list

Lsorted = 1, 2, 3, 3, 4, 5, 6, 7

This is Quicksort!

Partition – Formal Description

Procedure Partition(L, p, q)
    a ← L[p]
    lx ← p - 1
    rx ← q + 1
    while true
        repeat rx ← rx - 1    // Move right index
        until L[rx] ≤ a
        repeat lx ← lx + 1    // Move left index
        until L[lx] ≥ a
        if (lx < rx)
            exchange(L[lx], L[rx])
        else
            return rx         // Indices have crossed

Quicksort

Procedure Quicksort(L, p, q)
    if (p < q)
        r ← Partition(L, p, q)
        Quicksort(L, p, r)
        Quicksort(L, r+1, q)

To sort the entire array, the initial call is:

Quicksort(L, 1, n)

where n is the length of L
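
A Python sketch of the partitioning scheme and the recursion follows. It uses 0-based indices, so the initial call becomes `quicksort(L, 0, len(L) - 1)`, and each repeat/until loop is written as "step once, then keep stepping while the stop condition fails".

```python
def partition(L, p, q):
    """Split L[p..q] in place around a = L[p]; return the last index of the left part."""
    a = L[p]
    lx = p - 1
    rx = q + 1
    while True:
        rx -= 1
        while L[rx] > a:           # move right index until L[rx] <= a
            rx -= 1
        lx += 1
        while L[lx] < a:           # move left index until L[lx] >= a
            lx += 1
        if lx < rx:
            L[lx], L[rx] = L[rx], L[lx]
        else:
            return rx              # indices have met or crossed


def quicksort(L, p, q):
    if p < q:
        r = partition(L, p, q)
        quicksort(L, p, r)
        quicksort(L, r + 1, q)


L = [5, 3, 2, 6, 4, 1, 3, 7]
print(partition(L, 0, len(L) - 1), L)  # 4 [3, 3, 2, 1, 4, 6, 5, 7]
quicksort(L, 0, len(L) - 1)
print(L)                               # [1, 2, 3, 3, 4, 5, 6, 7]
```

The first call reproduces the split from the partitioning game: the left part is L[0..4] = 3, 3, 2, 1, 4 and the right part is 6, 5, 7.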

Observations

• Choice of the partitioning element a is important

• Determines how the lists are split

• Desirable: To split the list evenly

• How?...

Undesirable Partitioning

[Figure: chain of unbalanced splits, with levels labelled "List of size n", "List of size n-1", …, "List of size 2".]

Example

Such an undesirable partitioning is possible if we take the following sorted sequence

3, 4, 5, 6, 7, 8, 9, 10

and we partition as described above.

Desirable Partitioning

[Figure: each list is split into two lists of roughly equal size.]

Choosing the pivot

Can we choose the partitioning element to steer between the two extremes?

Heuristics:

Median-of-three: find the median of the first, middle and last elements

or

find the median of three randomly chosen elements.
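
The first heuristic might be sketched as follows. The helper name `median_of_three_index` and the choice of `(p + q) // 2` as the middle position are assumptions for illustration; indices are 0-based.

```python
def median_of_three_index(L, p, q):
    """Return the index (p, middle or q) that holds the median of the three values."""
    mid = (p + q) // 2
    # Order the three candidate positions by their values and take the middle one.
    return sorted([p, mid, q], key=lambda i: L[i])[1]


L = [3, 4, 5, 6, 7, 8, 9, 10]          # the sorted worst-case input
m = median_of_three_index(L, 0, len(L) - 1)
print(m, L[m])                         # 3 6
```

To combine this with a partition that always takes the first element as a, one option is to exchange `L[p]` and `L[m]` before partitioning. On the sorted sequence above this selects 6 instead of 3, so the split is no longer 1 versus n-1.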

Analysis

Worst-case behavior

$T(n) = n + (n-1) + \dots + 2 = O(n^2)$

since to partition a list of size n-i (i ≥ 0) into two lists of size 1 and n-i-1 we need to look at all n-i elements.

[Figure: a list of size n-i split into lists of size 1 and n-i-1.]

$T(n) = T(n-1) + O(n)$

Best-case Behavior

$T(n) = 2\,T(n/2) + O(n) \;\Rightarrow\; T(n) = O(n \log n)$

where the O(n) term is the time to partition a list of size n into two lists of size n/2 each.

Average-case Behavior

Tavg(n) = O(n log n)

$T(n) = T(\alpha n) + T(\beta n) + O(n)$

where $\alpha + \beta = 1$

Sorting – in Linear Time??

Yes… but… only under certain assumptions on the input data

Linear-time sorting techniques:
• Counting sort
• Radix sort
• Bucket sort

Counting Sort

Assumption: Input elements are integers in the range 1 to k, where k = O(n)

Example:

Sort the list L = 3, 6, 4, 1, 3, 4, 1, 4 using counting sort.

Example

A:      3 6 4 1 3 4 1 4
index:  1 2 3 4 5 6 7 8

C:      _ _ _ _ _ _
index:  1 2 3 4 5 6

B:      _ _ _ _ _ _ _ _
index:  1 2 3 4 5 6 7 8

Input is in array A.
At first, C[i] counts the # of times i occurs in the input.
Sorted array is stored in B.

Example - Continued

Count # of times i occurs in A:

C:      2 0 2 3 0 1
index:  1 2 3 4 5 6

Cumulative counters:

C:      2 2 4 7 7 8
index:  1 2 3 4 5 6

Now C[i] contains the count of the number of elements in the input that are ≤ i

Example - Continued

• Go through array A
• First element is 3
• From C[3] we know that 4 elements are ≤ 3
• So B[4] = 3
• Decrease C[3] by 1

B:      _ _ _ 3 _ _ _ _
index:  1 2 3 4 5 6 7 8

C:      2 2 3 7 7 8
index:  1 2 3 4 5 6

Example - Continued

• Next element in A is 6
• C[6] = 8 ⇒ eight elements are ≤ 6
• B[8] = 6
• Decrease C[6] by 1

B:      _ _ _ 3 _ _ _ 6
index:  1 2 3 4 5 6 7 8

C:      2 2 3 7 7 7
index:  1 2 3 4 5 6

and so on…

Example - Continued

A[3] = 4, C[4] = 7, B[7] = 4

B:      _ _ _ 3 _ _ 4 6
index:  1 2 3 4 5 6 7 8

C:      2 2 3 6 7 7
index:  1 2 3 4 5 6

Example - Continued

A[4] = 1, C[1] = 2, B[2] = 1

B:      _ 1 _ 3 _ _ 4 6
index:  1 2 3 4 5 6 7 8

C:      1 2 3 6 7 7
index:  1 2 3 4 5 6

Example - Continued

A[5] = 3, C[3] = 3, B[3] = 3

B:      _ 1 3 3 _ _ 4 6
index:  1 2 3 4 5 6 7 8

C:      1 2 2 6 7 7
index:  1 2 3 4 5 6

Example - Continued

A[6] = 4, C[4] = 6, B[6] = 4

B:      _ 1 3 3 _ 4 4 6
index:  1 2 3 4 5 6 7 8

C:      1 2 2 5 7 7
index:  1 2 3 4 5 6

Example - Continued

A[7] = 1, C[1] = 1, B[1] = 1

B:      1 1 3 3 _ 4 4 6
index:  1 2 3 4 5 6 7 8

C:      0 2 2 5 7 7
index:  1 2 3 4 5 6

Example - Continued

A[8] = 4, C[4] = 5, B[5] = 4

B:      1 1 3 3 4 4 4 6
index:  1 2 3 4 5 6 7 8

C:      0 2 2 4 7 7
index:  1 2 3 4 5 6

Formal Algorithm

Procedure CountingSort(A, B, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    for i ← 2 to k
        C[i] ← C[i] + C[i-1]
    // C[i] now contains # of elements ≤ i
    for i ← n downto 1
        B[C[A[i]]] ← A[i]
        C[A[i]] ← C[A[i]] - 1
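
A Python sketch of this procedure follows. It keeps the 1-based counter array by leaving `C[0]` unused and stores the output in a 0-based list, hence the `- 1` when writing into `B`. Like the formal procedure it scans the input from the end; the walkthrough on the earlier slides scans from the front, which yields the same sorted values.

```python
def counting_sort(A, k):
    """Sort a list of integers in the range 1..k and return the result as a new list."""
    C = [0] * (k + 1)              # C[0] is unused so that C[i] matches the slides
    for x in A:
        C[x] += 1                  # C[i] = how often i occurs
    for i in range(2, k + 1):
        C[i] += C[i - 1]           # C[i] = number of elements <= i
    B = [None] * len(A)
    for x in reversed(A):          # corresponds to "for i <- n downto 1"
        B[C[x] - 1] = x            # slide position C[x] is list index C[x] - 1
        C[x] -= 1
    return B


print(counting_sort([3, 6, 4, 1, 3, 4, 1, 4], 6))  # [1, 1, 3, 3, 4, 4, 4, 6]
```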

Without using a second array B…

Procedure Single_Array_CountingSort(A, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    pos ← 1
    for i ← 1 to k
        for j ← 1 to C[i]
            A[pos] ← i
            pos ← pos + 1
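
The single-array variant can be sketched in Python as below. As an addition that is not in the pseudocode, the function returns how many times the inner loop body ran, so the step count can be inspected.

```python
def single_array_counting_sort(A, k):
    """Sort A (integers in 1..k) in place; return the number of inner-loop steps."""
    C = [0] * (k + 1)              # C[0] is unused
    for x in A:
        C[x] += 1                  # C[i] = how often i occurs
    pos = 0                        # 0-based write position
    for i in range(1, k + 1):
        for _ in range(C[i]):      # write the value i exactly C[i] times
            A[pos] = i
            pos += 1
    return pos


A = [3, 6, 4, 1, 3, 4, 1, 4]
steps = single_array_counting_sort(A, 6)
print(A, steps)                    # [1, 1, 3, 3, 4, 4, 4, 6] 8
```

The returned step count equals the length of the list, even though the loops are nested.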

Analysis of Single-Array Counting Sort

The first and second for loops take k and n steps, respectively, i.e. O(n) steps since k = O(n).
Then, two nested for loops… O(n^2)???
A more accurate upper bound?... Yes…
For each i, the inner for loop executes C[i] times.
So the body of the two nested for loops executes

$\sum_{i=1}^{k} C[i]$

times.

Theorem: $\sum_{i=1}^{k} C[i] = n$

Proof (sketch):

The second for loop is executed n times.

In each step an element of C is increased by 1. q.e.d.

Discussion

Complexity: If the list is of size n and k = O(n), then T(n) = O(n)

Stability:

A sorting method is stable if equal elements are output in the same order they had in the input.

Theorem:

Counting Sort is stable.
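
Stability only becomes visible when elements carry more than their key. The sketch below sorts hypothetical (key, label) records with a keyed variant of counting sort; the function name, the `key` parameter and the sample records are all illustrative and not from the slides. It relies on scanning the input from the end, as the formal CountingSort procedure does.

```python
def counting_sort_by_key(items, key, k):
    """Stable counting sort of items whose key(item) is an integer in 1..k."""
    C = [0] * (k + 1)
    for item in items:
        C[key(item)] += 1
    for i in range(2, k + 1):
        C[i] += C[i - 1]           # C[i] = number of items with key <= i
    B = [None] * len(items)
    for item in reversed(items):   # backward scan keeps equal keys in input order
        C[key(item)] -= 1
        B[C[key(item)]] = item
    return B


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort_by_key(records, key=lambda r: r[0], k=3))
# [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```

Records with equal keys ("b" before "d", "a" before "c") appear in the output in the same order as in the input.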

Lower Bounds on Sorting

Theorem: For any comparison sort of n elements

$T(n) = \Omega(n \log n)$

Remark:

$T(n) = \Omega(g(n))$ means that T(n) grows at least as fast as g(n)
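
The slides state the theorem without proof. A sketch of the standard decision-tree argument, which is an addition here and not taken from the slides, runs as follows: a comparison sort must be able to produce each of the $n!$ possible orderings, and a binary tree of comparisons with height $h$ has at most $2^h$ leaves.

```latex
\begin{aligned}
2^{h} \ge n!
\;\Longrightarrow\;
h &\ge \log_2 (n!) \\
  &\ge \log_2\!\left( (n/2)^{n/2} \right)
   && \text{(the largest } n/2 \text{ factors of } n! \text{ are each at least } n/2\text{)} \\
  &= \tfrac{n}{2} \log_2 \tfrac{n}{2} = \Omega(n \log n).
\end{aligned}
```

Here $h$ is the number of comparisons made in the worst case, so $T(n) \ge h = \Omega(n \log n)$.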