Sorting Data Structures and Algorithms (60-254). Sorting Sorting is one of the most well-studied...

Post on 17-Jan-2016


Sorting: Data Structures and Algorithms (60-254)

2

Sorting

• Sorting is one of the most well-studied problems in Computer Science

• The ultimate reference on the subject is:

“The Art of Computer Programming, Vol. 3: Sorting and Searching”, by D. E. Knuth

3

Formal Statement

Given a sequence of n numbers:

a1, a2, …, an

find a permutation π of the numbers 1, 2, …, n such that

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(n)}$

Permutation:
3, 2, 1: π(1) = 3, π(2) = 2, π(3) = 1
2, 1, 3: π(1) = 2, π(2) = 1, π(3) = 3
1, 3, 2: π(1) = 1, π(2) = 3, π(3) = 2
…

are all permutations of 1, 2, 3

4

Comparison Sorts

• A comparison sort sorts by comparing elements pairwise.

• We study these comparison sorts:
  • Insertion Sort
  • Shellsort
  • Mergesort
  • Quicksort

5

Insertion Sort

Sort the sequence 3, 1, 5, 4, 2

Sort 3 → 3
Sort 3, 1 → 1, 3
Sort 1, 3, 5 → 1, 3, 5
Sort 1, 3, 5, 4 → 1, 3, 4, 5
Sort 1, 3, 4, 5, 2 → 1, 2, 3, 4, 5

6

Incremental sorting

In general, at the ith step,

a1, a2, a3, …, ai-1, ai

are already sorted:

$a_{\pi(1)} \le a_{\pi(2)} \le \dots \le a_{\pi(i)}$

for some permutation π of 1, 2, …, i.

In the next step, ai+1 has to be inserted in the correct position
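The incremental scheme above can be sketched in Python as follows. This is a minimal illustration rather than the course's reference code: the function name `insertion_sort` and the comparison counter are assumptions added here, and the scan of the sorted prefix runs from right to left.

```python
def insertion_sort(seq):
    """Return (sorted copy of seq, number of element comparisons)."""
    a = list(seq)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]              # element to insert into the sorted prefix a[0..i-1]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:     # found the insertion point
                break
            a[j + 1] = a[j]     # shift the larger element one slot to the right
            j -= 1
        a[j + 1] = key
    return a, comparisons


print(insertion_sort([3, 1, 5, 4, 2]))   # ([1, 2, 3, 4, 5], 8)
```

Returning the comparison count alongside the result makes it easy to check hand-worked examples against the code.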

7

Analysis of Insertion Sort

What is the worst-case input? Elements in decreasing order! Example: 5, 4, 3, 2, 1

Before insertion      After insertion       # of comparisons
5                     5                     0
5, 4                  4, 5                  1
4, 5, 3               3, 4, 5               2
3, 4, 5, 2            2, 3, 4, 5            3
2, 3, 4, 5, 1         1, 2, 3, 4, 5         4

8

Worst case

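In general, to insert ai+1 in its proper place w.r.t. the sorted preceding i numbers

a1, a2, …, ai, we may need i comparisons in the worst case.

Thus, the total number of comparisons in the worst case is

$T(n) = 1 + 2 + \dots + (n-1) = \frac{n(n-1)}{2} = O(n^2)$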

9

Shellsort

Due to Donald Shell. For example:

Shellsort the sequence:
81, 94, 11, 96, 12, 35, 17, 95 (1)

Step 1: Sort all sequences that are four positions apart.

81, 12 → 12, 81
94, 35 → 35, 94
11, 17 → 11, 17
96, 95 → 95, 96

Results in:
12, 35, 11, 95, 81, 94, 17, 96 (2)

10

Shellsort

Step 2: 12, 35, 11, 95, 81, 94, 17, 96 (2)

Sort all sequences of (2) that are two positions apart.
12, 11, 81, 17 → 11, 12, 17, 81
35, 95, 94, 96 → 35, 94, 95, 96

Results in:
11, 35, 12, 94, 17, 95, 81, 96 (3)

Step 3:

Sort all sequences of (3) that are one position apart.
11, 35, 12, 94, 17, 95, 81, 96 → 11, 12, 17, 35, 81, 94, 95, 96 (4)

Sequence (4) is sorted !!
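The three steps can be reproduced with a short Python sketch. It assumes the gaps are passed in explicitly as a list, and the name `shellsort` is chosen here for illustration. Each pass is an insertion sort over elements that are h positions apart.

```python
def shellsort(seq, gaps):
    """Return a copy of seq after one gapped insertion-sort pass per gap.

    The result is fully sorted when the last gap is 1.
    """
    a = list(seq)
    for h in gaps:
        # Insertion sort on every subsequence of elements h positions apart.
        for i in range(h, len(a)):
            key = a[i]
            j = i - h
            while j >= 0 and a[j] > key:
                a[j + h] = a[j]
                j -= h
            a[j + h] = key
    return a


data = [81, 94, 11, 96, 12, 35, 17, 95]
print(shellsort(data, [4]))        # sequence (2)
print(shellsort(data, [4, 2]))     # sequence (3)
print(shellsort(data, [4, 2, 1]))  # sequence (4), sorted
```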

11

Observations

h1, h2, h3 = 4, 2, 1 is called a gap sequence.
Different gap sequences are possible.
Every one of them must end with 1.
Shell’s gap sequence:

h1 = ⌊n/2⌋

hi = ⌊hi-1 / 2⌋ (down to hk = 1)

All sequences were sorted using insertion sort.
In Step 3, we sorted the entire sequence using insertion sort!

Advantage over straightforward insertion sort?

12

Example

Insertion sort on:
81, 94, 11, 96, 12, 35, 17, 95

Sorted prefix                      # of comparisons
81                                 0
81, 94                             1
11, 81, 94                         2
11, 81, 94, 96                     1
11, 12, 81, 94, 96                 4
11, 12, 35, 81, 94, 96             4
11, 12, 17, 35, 81, 94, 96         5
11, 12, 17, 35, 81, 94, 95, 96     2
                                   __
Total # of comparisons             19

13

Example

Insertion sort on:
11, 35, 12, 94, 17, 95, 81, 96

Sorted prefix                      # of comparisons
11                                 0
11, 35                             1
11, 12, 35                         2
11, 12, 35, 94                     1
11, 12, 17, 35, 94                 3
11, 12, 17, 35, 94, 95             1
11, 12, 17, 35, 81, 94, 95         3
11, 12, 17, 35, 81, 94, 95, 96     1
                                   __
Total # of comparisons             12

14

Analysis of Shellsort

Clever choice of a gap sequence leads to a subquadratic algorithm. That is, for an n-element sequence, the # of comparisons is

$T(n) = O(n^{3/2})$

when using the Hibbard sequence: 1, 3, 7, …, 2^k − 1
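As a small illustration, the Hibbard gaps for a given input size could be generated as below. The helper name `hibbard_gaps` and the choice to keep only gaps smaller than n are assumptions made for this sketch. The gaps are returned largest first, so the list ends with 1 as every gap sequence must.

```python
def hibbard_gaps(n):
    """Return the Hibbard gaps 2^k - 1 that are smaller than n, largest first."""
    gaps = []
    k = 1
    while 2 ** k - 1 < n:
        gaps.append(2 ** k - 1)
        k += 1
    return gaps[::-1]


print(hibbard_gaps(8))     # [7, 3, 1]
print(hibbard_gaps(100))   # [63, 31, 15, 7, 3, 1]
```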

15

Mergesort

Sort:
81, 94, 11, 96 | 12, 35, 17, 95

Mergesort(81, 94, 11, 96)
Mergesort(12, 35, 17, 95)
Merge the two sorted lists from the above two lines.

This is a Divide-and-conquer algorithm.

16

Divide

MS (81, 94, 11, 96, 12, 35, 17, 95)
    MS (81, 94, 11, 96)
        MS (81, 94)
        MS (11, 96)
    MS (12, 35, 17, 95)
        MS (12, 35)
        MS (17, 95)

17

Conquer

Merge two sorted lists.
MS (81, 94, 11, 96) = 11, 81, 94, 96 (1)
MS (12, 35, 17, 95) = 12, 17, 35, 95 (2)

Compare 11 and 12; output 11; move index in list (1)
Compare 12 and 81; output 12; move index in list (2)
Compare 17 and 81; output 17; move index in list (2)

18

Number of Comparisons

A total of seven comparisons to generate the sorted list
11, 12, 17, 35, 81, 94, 95, 96

This is the maximum! For if the lists were

81, 94, 95, 96 and 11, 12, 17, 35

we would need only four comparisons.
The algorithm follows…

19

Procedure Mergesort(A)
    n ← size of A
    if (n > 1)
        Set A1 ← A[1 ... ⌊n/2⌋]      // Create a new array A1
        Set A2 ← A[⌊n/2⌋+1 ... n]    // Create a new array A2
        Mergesort(A1)
        Mergesort(A2)
        Merge(A, A1, A2)
    else
        // A has only one element: do nothing!

20

Procedure Merge(A, A1, A2)
    n1 ← size of A1
    n2 ← size of A2
    i ← 1; j ← 1; k ← 1
    while (i <= n1 and j <= n2)
        if (A1[i] < A2[j])
            A[k] ← A1[i]; i ← i + 1
        else
            A[k] ← A2[j]; j ← j + 1
        k ← k + 1
    for m ← i to n1
        A[k] ← A1[m]; k ← k + 1
    for m ← j to n2
        A[k] ← A2[m]; k ← k + 1
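The two procedures translate fairly directly into Python. The sketch below departs from the pseudocode in a few labeled ways: indices are 0-based, `merge` returns a new list instead of writing into A, and it also returns a comparison count, which is an addition made here so the merge examples can be checked.

```python
def merge(a1, a2):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    while i < len(a1) and j < len(a2):
        comparisons += 1
        if a1[i] < a2[j]:
            merged.append(a1[i])
            i += 1
        else:
            merged.append(a2[j])
            j += 1
    # At most one of the two lists still has elements left.
    merged.extend(a1[i:])
    merged.extend(a2[j:])
    return merged, comparisons


def mergesort(a):
    """Return a sorted copy of a."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left = mergesort(a[:mid])
    right = mergesort(a[mid:])
    merged, _ = merge(left, right)
    return merged


print(merge([11, 81, 94, 96], [12, 17, 35, 95])[1])   # 7
print(merge([81, 94, 95, 96], [11, 12, 17, 35])[1])   # 4
print(mergesort([81, 94, 11, 96, 12, 35, 17, 95]))
```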

21

Theorem

To merge two sorted lists, each of length n, we need at most 2n – 1 comparisons

22

Complexity of Mergesort

T(n) = 2 T(n/2) + O(n)   for n > 1
T(n) = 1                 for n = 1

Solution: O(n log n)
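One way to see where the solution comes from is to unroll the recurrence. The derivation below is a sketch under two simplifying assumptions that are not stated on the slide: n is a power of two, and the O(n) term is written as cn for some constant c.

```latex
\begin{aligned}
T(n) &= 2\,T(n/2) + cn \\
     &= 4\,T(n/4) + 2cn \\
     &= 8\,T(n/8) + 3cn \\
     &\;\;\vdots \\
     &= 2^{k}\,T(n/2^{k}) + k\,cn \\
     &= n\,T(1) + cn\log_2 n \qquad (k = \log_2 n) \\
     &= O(n \log n)
\end{aligned}
```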

23

A Partitioning Game

Given L = 5, 3, 2, 6, 4, 1, 3, 7

Partition L into L1 and L2 such that every element in list L1 is less than or equal to every element in list L2

How?

24

Split

a = first element of L

Make every element of list L1 less than or equal to a, and a less than or equal to every element of list L2.

How? Using two indices:

lx = left index
rx = right index

25

Initial configuration

5, 3, 2, 6, 4, 1, 3, 7
(lx starts at the left end, rx at the right end)

Rules:
• lx moves right until it meets an element ≥ 5
• rx moves left until it meets an element ≤ 5
• exchange elements and continue until indices meet or cross.

26

Intermediate configurations

5, 3, 2, 6, 4, 1, 3, 7
lx stops at 5 (first position), rx stops at 3 (seventh position)

Exchange and continue: 3, 3, 2, 6, 4, 1, 5, 7
lx stops at 6, rx stops at 1

Exchange and continue: 3, 3, 2, 1, 4, 6, 5, 7

lx and rx have crossed !!

27

Intermediate configurations

L1 = 3, 3, 2, 1, 4

L2 = 6, 5, 7

Now, do the same with the lists L1 and L2

Initial configuration for L1

3, 3, 2, 1, 4
(lx starts at the left end, rx at the right end)

3 = first element.

28

Intermediate configuration for L1

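3, 3, 2, 1, 4
lx stops at 3 (first position), rx stops at 1 (fourth position)

Exchange and continue:

1, 3, 2, 3, 4
lx stops at 3 (second position), rx stops at 2 (third position)

Exchange and continue:

1, 2, 3, 3, 4
rx is now at 2 (second position), lx at 3 (third position)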

Left and right indices have crossed!

29

Quicksort

We have new lists

L11 = 1, 2

L12 = 3, 3, 4

with which we continue to do the same.
Partitioning stops once we have a list with only one element.
All this, done in place, gives us the following sorted list

Lsorted = 1, 2, 3, 3, 4, 5, 6, 7

This is Quicksort!!!

30

Partition – Formal Description

Procedure Partition(L, p, q)
    a ← L[p]
    lx ← p – 1
    rx ← q + 1
    while true
        repeat rx ← rx – 1      // Move right index
        until L[rx] ≤ a
        repeat lx ← lx + 1      // Move left index
        until L[lx] ≥ a
        if (lx < rx)
            exchange(L[lx], L[rx])
        else
            return rx           // Indices have crossed

31

Quicksort

Procedure Quicksort(L, p, q)
    if (p < q)
        r ← Partition(L, p, q)
        Quicksort(L, p, r)
        Quicksort(L, r+1, q)

To sort the entire array, the initial call is:

Quicksort(L, 1, n)

where n is the length of L
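A Python rendering of Partition and Quicksort might look like the sketch below. It keeps the two-index scheme and the first element as the partitioning element, but uses 0-based indices, so the initial call covers positions 0 to n-1. The default arguments on `quicksort` are a convenience added here.

```python
def partition(lst, p, q):
    """Partition lst[p..q] (inclusive) around a = lst[p]; return the split index."""
    a = lst[p]
    lx = p - 1
    rx = q + 1
    while True:
        rx -= 1
        while lst[rx] > a:      # move right index left to an element <= a
            rx -= 1
        lx += 1
        while lst[lx] < a:      # move left index right to an element >= a
            lx += 1
        if lx < rx:
            lst[lx], lst[rx] = lst[rx], lst[lx]
        else:
            return rx           # indices have met or crossed


def quicksort(lst, p=0, q=None):
    """Sort lst[p..q] in place."""
    if q is None:
        q = len(lst) - 1
    if p < q:
        r = partition(lst, p, q)
        quicksort(lst, p, r)
        quicksort(lst, r + 1, q)


L = [5, 3, 2, 6, 4, 1, 3, 7]
quicksort(L)
print(L)   # [1, 2, 3, 3, 4, 5, 6, 7]
```

Because the recursion is written as plain function calls, very long inputs that partition badly can reach Python's recursion limit.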

32

Observations

• Choice of the partitioning element a is important

• Determines how the lists are split

• Desirable: To split the list evenly

• How?...

33

Undesirable Partitioning

[Figure: List of size n, List of size n-1, …, List of size 2]

34

Example

Such an undesirable partitioning is possible if we take the following sorted sequence

3, 4, 5, 6, 7, 8, 9, 10

and we partition as described above

35

Desirable Partitioning


36

Choosing the pivot

Can we choose the partitioning element to steer between the two extremes?

Heuristics:

Median-of-three: find the median of the first, middle and last elements

or

find the median of three randomly chosen elements.
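The first heuristic could be sketched as a small helper that returns the position of the median among the first, middle and last elements. The name `median_of_three_index` and the use of the lower middle position `(p + q) // 2` are assumptions made for this example.

```python
def median_of_three_index(lst, p, q):
    """Return the index (p, middle or q) that holds the median of the three values."""
    mid = (p + q) // 2
    candidates = sorted([(lst[p], p), (lst[mid], mid), (lst[q], q)])
    return candidates[1][1]     # index of the middle value


seq = [3, 4, 5, 6, 7, 8, 9, 10]
print(median_of_three_index(seq, 0, len(seq) - 1))   # 3, the position of 6
```

One way to use it with a partition routine that takes the first element as the partitioning element is to exchange the chosen element with the first one before partitioning.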

37

Analysis

Worst-case behavior

T(n) = T(n-1) + O(n)

T(n) = n + (n-1) + … + 2 = O(n²)

since to partition a list of size n-i (i ≥ 0) into two lists of size 1 and n-i-1 we need to look at all n-i elements.

[Figure: a list of size n-i split into lists of size 1 and n-i-1]

38

Best-case Behavior

T(n) = 2T(n/2) + O(n) ⇒ T(n) = O(n log n)

where O(n) is the time to partition a list of size n into two lists of size n/2 each.

39

Average-case Behavior

Tavg(n) = O(n log n)

T(n) = T(αn) + T(βn) + O(n)

where α + β = 1

40

Sorting – in Linear Time??

Yes… but… only under certain assumptions on the input data

Linear-time sorting techniques:
• Counting sort
• Radix sort
• Bucket sort

41

Counting Sort

Assumption: Input elements are integers in the range 1 to k, where k = O(n)

Example: Sort the list

L = 3, 6, 4, 1, 3, 4, 1, 4

using counting sort.

42

Example

A      3  6  4  1  3  4  1  4
index  1  2  3  4  5  6  7  8

C      -  -  -  -  -  -
index  1  2  3  4  5  6

B      -  -  -  -  -  -  -  -
index  1  2  3  4  5  6  7  8

• Input is in array A
• At first, C[i] counts the # of times i occurs in the input
• Sorted array is stored in B

43

Example - Continued

Count # of times i occurs in A:

C      2  0  2  3  0  1
index  1  2  3  4  5  6

Cumulative counters:

C      2  2  4  7  7  8
index  1  2  3  4  5  6

Now C[i] contains the count of the number of elements in the input that are ≤ i

44

Example - Continued

• Go through array A
• First element is 3
• From C[3] we know that 4 elements are ≤ 3
• So B[4] = 3
• Decrease C[3] by 1

B      -  -  -  3  -  -  -  -
index  1  2  3  4  5  6  7  8

C      2  2  3  7  7  8
index  1  2  3  4  5  6

45

Example - Continued

• Next element in A is 6
• C[6] = 8, so eight elements are ≤ 6
• B[8] = 6
• Decrease C[6] by 1

B      -  -  -  3  -  -  -  6
index  1  2  3  4  5  6  7  8

C      2  2  3  7  7  7
index  1  2  3  4  5  6

and so on…

46

Example - Continued

A[3] = 4, C[4] = 7, B[7] = 4

B      -  -  -  3  -  -  4  6
index  1  2  3  4  5  6  7  8

C      2  2  3  6  7  7
index  1  2  3  4  5  6

47

Example - Continued

A[4] = 1, C[1] = 2, B[2] = 1

B      -  1  -  3  -  -  4  6
index  1  2  3  4  5  6  7  8

C      1  2  3  6  7  7
index  1  2  3  4  5  6

48

Example - Continued

A[5] = 3, C[3] = 3, B[3] = 3

B      -  1  3  3  -  -  4  6
index  1  2  3  4  5  6  7  8

C      1  2  2  6  7  7
index  1  2  3  4  5  6

49

Example - Continued

A[6] = 4, C[4] = 6, B[6] = 4

B      -  1  3  3  -  4  4  6
index  1  2  3  4  5  6  7  8

C      1  2  2  5  7  7
index  1  2  3  4  5  6

50

Example - Continued

A[7] = 1, C[1] = 1, B[1] = 1

B      1  1  3  3  -  4  4  6
index  1  2  3  4  5  6  7  8

C      0  2  2  5  7  7
index  1  2  3  4  5  6

51

Example - Continued

A[8] = 4, C[4] = 5, B[5] = 4

B      1  1  3  3  4  4  4  6
index  1  2  3  4  5  6  7  8

C      0  2  2  4  7  7
index  1  2  3  4  5  6

52

Formal Algorithm

Procedure CountingSort(A, B, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    for i ← 2 to k
        C[i] ← C[i] + C[i-1]
    // C[i] now contains # of elements ≤ i
    for i ← n downto 1
        B[C[A[i]]] ← A[i]
        C[A[i]] ← C[A[i]] - 1
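The procedure can be sketched in Python as follows. Python lists are 0-based, so this version allocates one extra, unused slot in the counter array so that `c[i]` still refers to the value i. It also returns the output list instead of filling a caller-supplied B. Both are choices made for this illustration.

```python
def counting_sort(a, k):
    """Return a sorted copy of a, whose elements are integers in 1..k."""
    c = [0] * (k + 1)           # c[0] is unused so that indices match the values
    for x in a:
        c[x] += 1               # c[i] = number of occurrences of i
    for i in range(2, k + 1):
        c[i] += c[i - 1]        # c[i] = number of elements <= i
    b = [None] * len(a)
    for x in reversed(a):       # right to left, as in "for i <- n downto 1"
        b[c[x] - 1] = x         # -1 turns the 1-based rank into a 0-based slot
        c[x] -= 1
    return b


print(counting_sort([3, 6, 4, 1, 3, 4, 1, 4], 6))   # [1, 1, 3, 3, 4, 4, 4, 6]
```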

53

Without using a second array B…

Procedure Single_Array_CountingSort(A, k, n)
    for i ← 1 to k
        C[i] ← 0
    for i ← 1 to n
        C[A[i]] ← C[A[i]] + 1
    // C[i] now contains a counter of how often i occurs
    pos ← 1
    for i ← 1 to k
        for j ← 1 to C[i]
            A[pos] ← i
            pos ← pos + 1
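A Python sketch of the single-array variant is given below. It overwrites the input list in place and, as an addition made here for illustration, returns the number of writes performed by the nested loops, which should equal the length of the input.

```python
def single_array_counting_sort(a, k):
    """Sort a in place (integers in 1..k); return the number of writes into a."""
    c = [0] * (k + 1)           # c[0] is unused
    for x in a:
        c[x] += 1               # c[i] = number of occurrences of i
    pos = 0
    for i in range(1, k + 1):
        for _ in range(c[i]):   # write the value i exactly c[i] times
            a[pos] = i
            pos += 1
    return pos


data = [3, 6, 4, 1, 3, 4, 1, 4]
writes = single_array_counting_sort(data, 6)
print(data, writes)   # [1, 1, 3, 3, 4, 4, 4, 6] 8
```

This variant only works when the elements are bare integers, since it regenerates the values from the counters rather than moving the original items.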

54

Analysis of Single-Array Counting Sort

The first for loop takes k = O(n) steps and the second takes n steps.
Then, two nested for loops… O(n²)? A more accurate upper bound? Yes:
for each i, the inner for loop executes C[i] times.
So the body of the two nested for loops executes

$\sum_{i=1}^{k} C[i]$

times.

Theorem: $\sum_{i=1}^{k} C[i] = n$

Proof (sketch):

The second for loop is executed n times.

At each step, one element of C is increased by 1. q.e.d.

55

Discussion

Complexity: If the list is of size n and k = O(n), then T(n) = O(n)

Stability:

A sorting method is stable if equal elements are output in the same order they had in the input.

Theorem:

Counting Sort is stable.
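Stability only becomes visible when elements carry extra data besides the key. The sketch below is an illustrative variant, not taken from the slides: it sorts records by an integer key in 1..k, with the `key` parameter and the sample records invented for the example. Scanning the input from right to left is what keeps equal keys in their input order.

```python
def counting_sort_by_key(records, k, key):
    """Return records sorted stably by key(record), an integer in 1..k."""
    c = [0] * (k + 1)
    for r in records:
        c[key(r)] += 1
    for i in range(2, k + 1):
        c[i] += c[i - 1]            # c[i] = number of records with key <= i
    out = [None] * len(records)
    for r in reversed(records):     # right-to-left scan keeps equal keys in order
        v = key(r)
        out[c[v] - 1] = r
        c[v] -= 1
    return out


records = [(3, "a"), (1, "b"), (3, "c"), (1, "d")]
print(counting_sort_by_key(records, 3, key=lambda r: r[0]))
# [(1, 'b'), (1, 'd'), (3, 'a'), (3, 'c')]
```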

56

Lower Bounds on Sorting

Theorem: For any comparison sort of n elements,

T(n) = Ω(n log n)

Remark:

T(n) = Ω(g(n)) means that T(n) grows at least as fast as g(n)
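The slides state the bound without proof. The usual argument, sketched here as a supplement, models a comparison sort as a binary decision tree of height h. The tree needs at least n! leaves, one for each possible ordering of the input, and a binary tree of height h has at most 2^h leaves.

```latex
2^{h} \ge n!
\;\Longrightarrow\;
h \ge \log_2 (n!)
  \ge \log_2\!\left( (n/2)^{n/2} \right)
  = \frac{n}{2}\,\log_2 \frac{n}{2}
  = \Omega(n \log n)
```

The middle step uses the fact that the largest n/2 factors of n! are each at least n/2.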