
1

More Sorting; Searching

Dan Barrish-Flood

2

Bucket Sort

•Put keys into n buckets, then sort each bucket, then concatenate.

•If keys are uniformly distributed (quite an assumption), then each bucket will be small, so each bucket can be sorted very quickly (typically with insertion sort).

•The analysis is complex because it involves probability and random distributions (see CLRS p. 175 – 176!!)

•Under that assumption, bucket sort runs in Θ(n) expected time, i.e. linear!

3

Bucket Sort, cont’d

Assume the keys are in the range [0,1)

Divide the range [0,1) into n equal-sized buckets, so that

the 1st bucket is [0, 1/n)

the 2nd bucket is [1/n , 2/n)

....

the nth bucket is [(n-1)/n , 1)

If the buckets are B[0], B[1], ..., B[n-1], then the appropriate bucket for the key A[i] is B[FLOOR(n · A[i])].

4

Bucket Sort Code
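A minimal Python sketch of the bucket sort described on the previous slides, assuming keys in [0,1) and one bucket per key. The names bucket_sort and insertion_sort are illustrative choices, not necessarily those of the original listing, and the min(...) guard against floating-point rounding is an added assumption.

```python
import math


def insertion_sort(bucket):
    """Sort a small list in place; fast when the list is short."""
    for j in range(1, len(bucket)):
        key = bucket[j]
        i = j - 1
        # Shift larger elements one slot to the right.
        while i >= 0 and bucket[i] > key:
            bucket[i + 1] = bucket[i]
            i -= 1
        bucket[i + 1] = key


def bucket_sort(keys):
    """Return a sorted copy of keys, each assumed to lie in [0, 1)."""
    n = len(keys)
    buckets = [[] for _ in range(n)]
    for key in keys:
        if not 0 <= key < 1:
            raise ValueError("keys must lie in [0, 1)")
        # Bucket index is FLOOR(n * key); min() guards against rounding up to n.
        index = min(n - 1, math.floor(n * key))
        buckets[index].append(key)
    result = []
    for bucket in buckets:
        insertion_sort(bucket)   # each bucket is expected to be small
        result.extend(bucket)    # concatenate the buckets in order
    return result
```

With uniformly distributed keys each bucket holds about one key on average, which is why the per-bucket insertion sort stays cheap.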

5

Bucket Sort in Action

6

When Linear Sorts can be used

                          Integer   Real   String
Counting Sort             Yes       No     No
Radix Sort                Yes       No     Yes*
Binsort (= Bucket Sort)   Yes*      Yes    Yes*

*with modifications

7

The Selection Problem

Input: a set A of n (distinct) numbers, and a number i where 1 ≤ i ≤ n

Output: the element x ∈ A that is larger than exactly i-1 other elements of A (in practice, we allow ties).

The ith smallest value is called the ith order statistic.

The 1st order statistic is the minimum.

The nth order statistic is the maximum.

The floor((n+1)/2)th order statistic is the median (OK, when n is even there are two medians, but we usually go with the lower one).

Q. Solve the selection problem in Θ(?) time...

8

Min and Max

We can find the min (or max) in n-1 comparisons in the way you learned in your first programming class:

MINIMUM(A)
    min ← A[1]
    for i ← 2 to length[A]
        do if min > A[i]
            then min ← A[i]
    return min

n-1 comparisons is also a lower bound....

9

Min and Max (Cont’d)

...thus we can find both min and max in 2n – 2 comparisons. But we can do a bit better (though not asymptotically better). We can find both min and max in at most 3·floor(n/2) comparisons:

•keep track of both min and max

•for each pair of elements:

first compare them to each other;
compare the larger to max;
compare the smaller to min.

This is optimal: the lower bound is ceiling(3n/2) – 2 comparisons, which the pairing scheme achieves.
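The pairing scheme can be sketched in Python as follows. The function name min_and_max and the explicit comparison counter are illustrative additions, and seeding from one element (odd n) or one compared pair (even n) is one common way to start, assumed here.

```python
def min_and_max(values):
    """Return (minimum, maximum, comparisons) using pairwise comparisons."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element seeds both min and max at no cost.
        lo = hi = values[0]
        start = 1
    else:
        # Even n: one comparison seeds min and max from the first pair.
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for j in range(start, n, 2):
        a, b = values[j], values[j + 1]
        comparisons += 3  # one within the pair, one against min, one against max
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

For 8 elements this sketch uses 1 + 3·3 = 10 comparisons, against 2n – 2 = 14 for two separate scans.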

10

Selection in General

RANDOMIZED-PARTITION(A,p,r)
    i ← RANDOM(p,r)
    exchange A[r] with A[i]
    return PARTITION(A,p,r)

An average-case Θ(n) algorithm for finding the ith smallest element: like QuickSort, but recurse on only one part.

The average-case analysis runs along the same lines as quicksort's. Worst-case running time?
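A minimal Python sketch of this idea, written iteratively. It assumes a Lomuto-style PARTITION around the last element (the slides do not show PARTITION itself), 0-based array positions, and a 1-based rank i; the name randomized_select is illustrative.

```python
import random


def randomized_select(items, i):
    """Return the i-th smallest element of items (i is 1-based)."""
    a = list(items)  # work on a copy so the caller's data is untouched
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    p, r = 0, len(a) - 1
    while True:
        if p == r:
            return a[p]
        # RANDOMIZED-PARTITION: move a random element to the end, then partition.
        s = random.randint(p, r)
        a[s], a[r] = a[r], a[s]
        pivot = a[r]
        q = p
        for j in range(p, r):
            if a[j] <= pivot:
                a[q], a[j] = a[j], a[q]
                q += 1
        a[q], a[r] = a[r], a[q]  # pivot lands at its final position q
        k = q - p + 1            # rank of the pivot within a[p..r]
        if i == k:
            return a[q]
        if i < k:
            r = q - 1            # continue on the low side only
        else:
            p = q + 1            # continue on the high side only
            i -= k
```

Unlike quicksort, only one side of each partition is examined further, which is where the average-case saving comes from.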

11

Solution in Linear Time (Worst-case)

A clever algorithm of largely theoretical interest.

SELECT(A, i ) finds the i th smallest element of A in Θ(n) time, worst-case.

Idea: partition, but do some extra work to find a split that is guaranteed to be good.

12

the SELECT algorithm

13

More on SELECT

n = 5: o o o o o
find the median directly, or by sorting

n = 25:
1. make 5 groups of 5 each
2. find the median of each group (that’s 5 numbers)
3. find the median-of-medians, x.

note: x > about half the elements of about half the groups, and x < about half the elements of about half the groups. So x is in the middle half of the set (not necessarily the exact middle element!), so x is a great pivot for PARTITION. To put it more precisely...

14

SELECT high-level description

SELECT(A, i )

1. Divide A into floor(n/5) groups of 5 elements (and possibly one smaller group of left-overs, as in the figure on an earlier slide).

2. Find the median of each 5-element group.

3. Use SELECT recursively to find the median x of the ceiling(n/5) medians found in step 2.

4. PARTITION A around x. Let k be one more than the number of elements on the low side, so that x is the kth smallest element.

5. If i = k then return x; otherwise call SELECT recursively to find the ith smallest element on the low side if i < k, or the (i-k)th smallest element on the high side otherwise.
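The five steps can be sketched in Python as below. This is an illustration under stated assumptions rather than the slides' own code: it builds new lists instead of partitioning in place, sorts each group of at most 5 to find its median, and uses a three-way split so that ties are handled. The name select is illustrative.

```python
def select(items, i):
    """Return the i-th smallest element of items (i is 1-based)."""
    a = list(items)
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(a) <= 5:
        # Small input: find the answer directly by sorting.
        return sorted(a)[i - 1]
    # Steps 1-2: split into groups of 5 (the last may be smaller), take each lower median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Step 3: recursively find the (lower) median of the medians.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x; elements equal to x sit between low and high.
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]
    equal = len(a) - len(low) - len(high)
    # Step 5: return x or recurse on the one side that contains the answer.
    if i <= len(low):
        return select(low, i)
    if i <= len(low) + equal:
        return x
    return select(high, i - len(low) - equal)
```

For example, select(data, (len(data) + 1) // 2) returns the lower median of data.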

15

Analysis of SELECT

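How many elements are greater than x?

There are $\lceil n/5 \rceil$ groups.

At least 1/2 of them contribute 3 elements > x, except for the group containing x and the left-over group.

So the number of elements > x is at least

$$3\left(\left\lceil \frac{1}{2}\left\lceil \frac{n}{5} \right\rceil \right\rceil - 2\right) \ge \frac{3n}{10} - 6$$

The same argument shows that at least $\frac{3n}{10} - 6$ elements are < x.

So the largest possible size of a partition is

$$n - \left(\frac{3n}{10} - 6\right) = \frac{7n}{10} + 6$$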

16

Recurrence for SELECT

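$$T(n) \le \begin{cases} \Theta(1) & \text{if } n \le 80 \\ T(\lceil n/5 \rceil) + T(7n/10 + 6) + O(n) & \text{if } n > 80 \end{cases}$$

Proof by substitution (i.e. induction)

Assume $T(n) \le cn$ for some c and all $n \le 80$.

$$\begin{aligned} T(n) &\le c\lceil n/5 \rceil + c(7n/10 + 6) + O(n) \\ &\le cn/5 + c + 7cn/10 + 6c + O(n) \\ &= 9cn/10 + 7c + O(n) \end{aligned}$$

We need $9cn/10 + 7c + O(n) \le cn$, or $c(n/10 - 7) \ge O(n)$.

We can pick c large enough when n > 80, thus SELECT is Θ(n), worst-case.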
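To make "c large enough" concrete, here is a short worked step. It assumes the O(n) term in the recurrence is bounded by $an$ for some constant $a > 0$; the symbol $a$ is introduced here for illustration and does not appear in the slides.

$$n > 80 \;\Longrightarrow\; 7n > 560 \;\Longrightarrow\; 8n - 560 > n \;\Longrightarrow\; \frac{n}{10} - 7 > \frac{n}{80}$$

$$\text{so for } c \ge 80a: \quad c\left(\frac{n}{10} - 7\right) > \frac{cn}{80} \ge an$$

Under that assumption, any $c \ge 80a$ (also chosen large enough to cover the base case $n \le 80$) gives $T(n) \le cn$ for all n.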