Algorithms and Complexity · Search Complexity • Linear search: proportional to n • Binary...
Mark Dunlop, Computer and Information Sciences, Strathclyde University
http://www.cis.strath.ac.uk/~mdd/

Algorithms and Complexity
Notes part 2: Searching & Sorting
Mark D Dunlop

Algorithms and Complexity
Searching

Split the room
• Sort yourself by birthday (e.g. 1/1 .... 31/12)
• Sort yourself by first name (e.g. Aaron ... Zakia)

Searching - examples of accesses
• How many people in your half are called David?
• How many people in your half have the same birthday?
• Does anyone share my birthday?
• Whose birthday is next in your half?
• How many people have a unique first name in your half?

SimpleMap.java

package uk.ac.strath.cis.mdd.ac.simplemap;

import java.util.Iterator;

public interface SimpleMap {
    public Iterator find(Comparable key);
    public void insert(SimpleMapMember member) throws SimpleMapException;
    public void delete(Comparable key) throws SimpleMapException;
}

SimpleMapMember.java

package uk.ac.strath.cis.mdd.ac.simplemap;

public interface SimpleMapMember {
    public Comparable getKey();
}

SimplePhoneDirectory implementation

• Also on web are:
  – SimplePhoneDirectory.java
  – SimplePhoneKey.java
  – SimplePhoneMember.java
  – Tester.java
• You are encouraged to examine....

Linear Search

• Work through the items one at a time, comparing each key with k
• Clearly worst case O(n)
• On average about n/2 comparisons when the key is present, i.e. O(½n) ≡ O(n)

int LinearSearch(S, k, low, high) //simplified
    i = low; foundAt = -1
    while i <= high && foundAt == -1
        if S.keyAtPos(i) == k then foundAt = i
        i++
    if foundAt == -1 throw DirectoryException("not found")
    return foundAt

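The linear search idea can be sketched in Java roughly as follows. This is only an illustration over a plain int array: the class name LinearSearchSketch and the choice to return -1 instead of throwing DirectoryException are assumptions made for the example, not part of the SimpleMap code on the web.

```java
// Minimal sketch of linear search over an int array.
// Class and method names are illustrative assumptions, not the course code.
class LinearSearchSketch {

    /** Returns the index of key within a[low..high], or -1 if it is absent. */
    static int linearSearch(int[] a, int key, int low, int high) {
        for (int i = low; i <= high; i++) {
            if (a[i] == key) {
                return i; // stop at the first match
            }
        }
        return -1; // worst case: all of the items were inspected
    }
}
```

A call such as LinearSearchSketch.linearSearch(data, 9, 0, data.length - 1) inspects the items in order, so the number of comparisons grows in proportion to n.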
Background: Divide and Conquer

• An algorithmic design pattern
  – Make the problem smaller by working on a smaller amount of data
  – Then have a quick way of combining results, if need be
• If the data is halved each time and only one half needs further work, Divide and Conquer often ends up with a complexity of O(log n)

Background: Divide and Conquer

• How many halvings before you have an individual case?

splits   number covered   =
1        2                2^1
2        4                2^2
3        8                2^3
4        16               2^4
5        32               2^5
i        n                2^i

Search Complexity

• Linear search: proportional to n
• Binary search: proportional to log2(n)
  – How many times you need to halve it
    • log2 2 = 1
    • log2 4 = 2
    • log2 8 = 3
    • log2 1,024 = 10
    • log2 1,048,576 = 20
  – BUT you need sorted data to start

Binary Search
• Narrow down the search range in stages using a divide and conquer approach
• Pick the middle point and decide whether the key lies to its left or right, then search that half recursively
• Must be very careful with boundaries

Binary Search

Algorithm BinarySearch(S, k, low, high)
    if low > high then
        throw DirectoryException("Not Found")
    else
        mid = (low+high) / 2
        if k == key(mid) then
            return key(mid)
        else if k < key(mid) then
            return BinarySearch(S, k, low, mid-1)
        else
            return BinarySearch(S, k, mid+1, high)

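As a rough illustration of the boundary handling, here is a minimal Java sketch of binary search over a sorted int array. It is an iterative variant of the recursive algorithm, it returns -1 rather than throwing, and the name BinarySearchSketch is an assumption made for the example.

```java
// Minimal sketch of binary search on a sorted int array (iterative variant).
// Names are illustrative assumptions, not the course code.
class BinarySearchSketch {

    /** Returns the index of key in the sorted range a[low..high], or -1 if absent. */
    static int binarySearch(int[] a, int key, int low, int high) {
        while (low <= high) {
            // Equivalent to (low + high) / 2 but cannot overflow for large indices
            int mid = low + (high - low) / 2;
            if (a[mid] == key) {
                return mid;
            } else if (key < a[mid]) {
                high = mid - 1; // key can only be in the left half
            } else {
                low = mid + 1;  // key can only be in the right half
            }
        }
        return -1; // range is empty: not found
    }
}
```

Each pass through the loop halves the remaining range, which is where the log2(n) bound comes from.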
logs - a reminder
• Standard properties, e.g. $x^{\log_x y} = y$
• Changing the base only changes a constant factor: $\log_a n = \log_b n / \log_b a$
• So in O() terms log2(n) ≡ log4(n) ≡ logx(n) ---> log(n)

Exponents - a reminder
• Common properties
Directory ADT complexity

public Iterator find(DirectoryKey key);
    - unsorted
    - sorted
public void insert(DirectoryMember member) throws....
    - unsorted
    - sorted
public void delete(DirectoryKey key) throws....
    - unsorted
    - sorted

Using an array for storage and binary search if sorted

Summary
• Divide and conquer
• Linear (O(n)) v Binary search (O(log n))
• Directory ADT
• Careful O() analysis of Linear Search
• Next time:
  – Sorting introduction

Algorithms and Complexity
Sorting

Quick Aside

• What is the total of all the numbers from 1 to 100?
• 5,050: an O(1) algorithm that showed just how clever 10-year-old Karl Gauss was at school in 1787

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

Summing up to n
• 1+2+...+(n-2)+(n-1)+n

$$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$$

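One common way to see why the sum 1+2+...+n has this closed form is to write it twice, once forwards and once backwards, and add the two rows term by term. The symbol S below is introduced only for this sketch.

```latex
\begin{align*}
S  &= 1 + 2 + \dots + (n-1) + n \\
S  &= n + (n-1) + \dots + 2 + 1 \\
2S &= \underbrace{(n+1) + (n+1) + \dots + (n+1)}_{n \text{ terms}} = n(n+1) \\
S  &= \frac{n(n+1)}{2}
\end{align*}
```

For n = 100 this gives 100 × 101 / 2 = 5,050.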
Link to complexity

int sum = 0;
for (int i=1; i<=N; i++)
    for (int j=1; j<=i; j++)
        sum++;

• Complexity O(n^2), based on $O\left(\frac{n^2+n}{2}\right)$

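To check the link between the nested loop and the summation formula, the following small Java sketch counts the inner loop steps and compares them with n(n+1)/2. The class and method names are assumptions made for the example.

```java
// Minimal sketch: count how often the inner statement of the nested loop runs.
// Names are illustrative assumptions.
class TriangularLoopSketch {

    /** Runs the nested loop and returns the number of inner steps. */
    static long countInnerSteps(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= i; j++) {
                sum++; // executed i times for each value of i
            }
        }
        return sum;
    }

    /** The closed form n(n+1)/2 for 1 + 2 + ... + n. */
    static long closedForm(int n) {
        return (long) n * (n + 1) / 2;
    }
}
```

The two methods agree for every n, and since (n^2 + n)/2 is dominated by its n^2 term, the loop is O(n^2).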
Related...
• Geometric sequence

$$\sum_{i=0}^{N} a^i = 1 + a + a^2 + \dots + a^N = \frac{a^{N+1} - 1}{a - 1}$$

• e.g. $1 + 2 + 4 + 8 + \dots + 2^{n-1} = 2^n - 1$

Motivation for looking at sorting
• Many common operations are much faster if data is kept sorted
  – find(key)
  – find min, find max, find median
• Most data is accessed much more often than it is created
• Good algorithms for complexity analysis

Sorting Algorithms
• Bubble sort
• Selection sort
• Insertion sort
• Shell sort
• Merge sort
• Quick sort
• Bucket sort and Radix sort

Bubble Sort
• Generally considered silly, but simple

for i = 0 to A.length-1
    for j = 1 to A.length-1-i
        if A[j-1] > A[j] swap(A[j-1], A[j])

Selection Sort

for i = 0 to A.length-1
    minpos = i
    for j = i+1 to A.length-1
        if A[j] < A[minpos] minpos = j
    swap(A[i], A[minpos])

Insertion Sort

for i = 1 to A.length-1
    basevalue = A[i]
    j = i
    while j > 0 && basevalue < A[j-1]
        A[j] = A[j-1]
        j = j-1
    A[j] = basevalue

• Involves lots of data shuffling
• Works nicely with a user inputting the data at the screen, since each new item can be inserted as it arrives

Shell Sort
• Named after Donald Shell (invented 1959)
• Big idea is to avoid a large amount of data movement by first comparing elements far apart, then slowly reducing the gap until it becomes an insertion sort
• Uses an increment sequence h1, h2, h3, ..., ht with h1 = 1 (any sequence will do as long as it ends with 1, but some are faster)

Shell Sort example 1
• Use increment sequence 1, 3, 5 (applied largest first)

unsorted       81 94 11 96 12 35 17 95 28 58 41 75 15
after 5-sort   35 17 11 28 12 41 75 15 96 58 81 94 95
after 3-sort   28 12 11 35 15 41 58 17 94 75 81 96 95
after 1-sort   11 12 15 17 28 35 41 58 75 81 94 95 96

• Shell proposed an increment sequence of 1, ..., N/16, N/8, N/4, N/2

ShellSort: Pseudo Code

//shellsort using Shell's increment sequence
for (int gap = A.length/2; gap > 0; gap = gap/2)
    for (int i = gap; i < A.length; i++)
        basevalue = A[i]
        j = i
        while j >= gap && basevalue < A[j-gap]
            A[j] = A[j-gap]
            j = j - gap
        A[j] = basevalue

ShellSort: Complexity

• Shell proposed an increment sequence of 1, ..., N/16, N/8, N/4, N/2
• This still gives an O(n^2) algorithm but a much faster one, with a better "normal case"
• If we divide by 2 and get an even number, then adding 1 to make it odd gives a complexity of O(n^(3/2))
• Dividing by 2.2 appears to give O(n^(5/4)) but no proof exists

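The three increment sequences discussed above can be sketched in Java with a pluggable "next gap" rule. This is only a sketch under stated assumptions: the names ShellSortSketch, SHELL, ODD_GAPS and DIV_2_2 are invented for the example, and the special case that maps a gap of 2 to 1 in the divide-by-2.2 rule is an assumption added so that the final pass always uses gap 1.

```java
import java.util.function.IntUnaryOperator;

// Minimal sketch of Shell sort with three alternative gap rules.
// All names here are illustrative assumptions.
class ShellSortSketch {

    // Shell's original sequence: N/2, N/4, ..., 1
    static final IntUnaryOperator SHELL = gap -> gap / 2;

    // Halve the gap, then add 1 if the result is even so every gap is odd
    static final IntUnaryOperator ODD_GAPS = gap -> {
        int next = gap / 2;
        return (next > 0 && next % 2 == 0) ? next + 1 : next;
    };

    // Divide by 2.2; a gap of 2 is forced to 1 so the last pass is an insertion sort
    static final IntUnaryOperator DIV_2_2 = gap -> gap == 2 ? 1 : (int) (gap / 2.2);

    /** Sorts a in place; nextGap maps the current gap (initially a.length) to the next one. */
    static void sort(int[] a, IntUnaryOperator nextGap) {
        for (int gap = nextGap.applyAsInt(a.length); gap > 0; gap = nextGap.applyAsInt(gap)) {
            // A gap-insertion sort: elements gap apart are kept in order
            for (int i = gap; i < a.length; i++) {
                int base = a[i];
                int j = i;
                while (j >= gap && base < a[j - gap]) {
                    a[j] = a[j - gap];
                    j -= gap;
                }
                a[j] = base;
            }
        }
    }
}
```

For example, ShellSortSketch.sort(data, ShellSortSketch.ODD_GAPS) sorts data in place; only the gap rule changes between the variants, the inner insertion loop is identical.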
Performance of Shellsort

[Charts: two plots of the figures tabulated below against N (0 to 80,000) for Insertion Sort, Shell's Shellsort, Odd Gaps Shellsort and div 2.2 Shellsort; one with a vertical axis up to 600,000, the other zoomed in to 3,000]

N        Insertion Sort   Shell's Shellsort   Odd Gaps Shellsort   div 2.2 Shellsort
1,000    122              11                  11                   9
2,000    483              26                  21                   23
4,000    1,936            61                  59                   54
8,000    7,950            153                 141                  114
16,000   32,560           358                 322                  269
32,000   131,911          869                 752                  575
64,000   520,000          2,091               1,705                1,249

Average Performance of Shellsort

• Insertion sort
  – very approx 0.000122n^2 + 0.2n + ....
• Shell's shellsort
  – very approx 0.0000002n^2 + 0.02n + ...
• Both still have a worst case of O(n^2), but the revised gap sequences remove this worst case for Shell sort

Merge Sort
• Shell sort is the best we can do with a swap 'em around strategy
• Merge sort is a divide and conquer strategy (cf. binary search)

if number of items to sort <= 1
    return
else
    sort left half; sort right half
    merge halves

Sorting is the easy bit

void sort (int[] A, left, right)
    if (left < right)
        int centre = (left + right) / 2
        sort (A, left, centre)
        sort (A, centre+1, right)
        merge (A, left, centre, right)

void merge (int[] A, left1, centre, right2)
    int[] B = new int[right2-left1+1]
    int p1 = left1; int p2 = centre+1; int pB = 0
    while p1<=centre && p2<=right2
        if (A[p1]<=A[p2])
            B[pB]=A[p1]; pB++; p1++
        else
            B[pB]=A[p2]; pB++; p2++

    // one list is empty now
    while p1<=centre
        B[pB]=A[p1]; pB++; p1++
    while p2<=right2
        B[pB]=A[p2]; pB++; p2++

    // copy back from list B
    for (int i=0; i<=right2-left1; i++)
        A[i+left1]=B[i]

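Putting the sort and merge steps together, a runnable Java version might look like the sketch below. The class name MergeSortSketch and the convenience overload sort(int[]) are assumptions made for the example; the temporary array is sized to the range being merged.

```java
// Minimal sketch of merge sort on an int array.
// Names are illustrative assumptions, not the course code.
class MergeSortSketch {

    /** Convenience entry point: sorts the whole array in place. */
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int left, int right) {
        if (left < right) { // ranges of 0 or 1 items are already sorted
            int centre = left + (right - left) / 2;
            sort(a, left, centre);
            sort(a, centre + 1, right);
            merge(a, left, centre, right);
        }
    }

    /** Merges the sorted runs a[left..centre] and a[centre+1..right]. */
    static void merge(int[] a, int left, int centre, int right) {
        int[] b = new int[right - left + 1];
        int p1 = left, p2 = centre + 1, pb = 0;
        while (p1 <= centre && p2 <= right) {
            // Taking from the left run on ties keeps the sort stable
            b[pb++] = (a[p1] <= a[p2]) ? a[p1++] : a[p2++];
        }
        while (p1 <= centre) b[pb++] = a[p1++]; // right run is exhausted
        while (p2 <= right) b[pb++] = a[p2++];  // left run is exhausted
        System.arraycopy(b, 0, a, left, b.length); // copy back from b
    }
}
```

The merge touches each item in its range once, which is the O(n) cost per level used in the analysis that follows.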
Complexity Analysis
• Merge is clearly O(n)
• But how many times is it called?
The call tree
1 merge of size n (level 0)
2 merges of size n/2
4 merges of size n/4
8 merges of size n/8 (level 3)

• at level 3: 8 merges of size n/8
• at level i: 2^i merges of size n/2^i
• each level is O(n)

So how many levels?
• log2(n) levels each of O(n)
• so merge sort is O(n log n)

splits   number covered   =
1        2                2^1
2        4                2^2
3        8                2^3
4        16               2^4
5        32               2^5
i        n                2^i

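The argument can be written as a single sum. As a sketch, assume n is a power of 2 and that merging a range of size m costs c·m for some constant c (both c and T(n) are symbols introduced here for illustration). Level i has 2^i merges of size n/2^i, and the levels run from 0 to log2(n) − 1:

```latex
\begin{align*}
T(n) &= \sum_{i=0}^{\log_2 n - 1} 2^i \cdot c\,\frac{n}{2^i}
      = \sum_{i=0}^{\log_2 n - 1} c\,n
      = c\,n \log_2 n \\
     &\Rightarrow\; T(n) \text{ is } O(n \log n)
\end{align*}
```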
Common classes of Big-Oh
[Chart: time (0 to 1,000) plotted against number (0 to 120) for log n, n, n log n, n*n, 2*n*n and n!]

Quick Sort
• Can we do mergesort in line (in place)?
• Yes, if we can get rid of the merges
• We can, if the data is split so that all data left of the split is below all data right of the split
• So, quicksort shuffles data before recursing

Overview of Quicksort
• Worst case is O(n^2) but this can be avoided
• Very tightly written innermost loops make it very fast
• Very tricky to implement correctly: slightly off and it's much slower

Quick Sort
if number of elements <= 1
    return
else
    pick a pivot value v in A
    partition A into L and R such that ∀ i∈L: i≤v & ∀ j∈R: j≥v
    sort(L)
    sort(R)

Picking the pivot
• A[left]
  – wrong: if the data is already sorted this leads to O(n^2)
• Ideal is to split the data in the middle ( |L|=|R| ) to give O(n log n), equalling merge sort
• Picking the median is perfect, but the obvious way of finding it involves sorting!

Picking the pivot

• Safe choice is value at (low+high)/2
  – will do very well if already sorted
  – but not guaranteed to split data equally
• Could pick one randomly, which does well on average
• Pragmatic choice: pick the median of
  – A[low]
  – A[(low+high)/2]
  – A[high]

Very pseudo code
//assume all elements are unique for now
Pick the pivot
Get the pivot out the way by putting it at the end
repeat
    Search from left to right looking for a large element and
      also search from right to left looking for a small element
    Swap the large and the small
until large & small elements swap round
swap pivot with last large item
sort left and right parts (pivot now in right place)

void sort(a, low, high)
    //Nothing to do for 0 or 1 elements
    if (low >= high) return

    //Pick the pivot
    int mid = (low + high) / 2;
    if a[mid]<a[low] swap(a, low, mid)
    if a[high]<a[low] swap(a, low, high)
    if a[high]<a[mid] swap(a, mid, high)
    // low, mid, high are now sorted - use mid as pivot
    // note a[low] is small and a[high] is high so leave them
    // a list of 2 or 3 elements is now fully sorted, so stop here
    if (high - low <= 2) return

    //Get the pivot out the way by putting it at the end
    swap(a, mid, high-1)

    //begin partitioning
    pivot = a[high-1]
    int i = low; int j = high-1;
    loop
        //Search from left to right looking for a large element
        //Search from right to left looking for a small element
        do i++ while (a[i]<pivot)
        do j-- while (a[j]>pivot)
        if (i<j) swap(a, i, j)   //Swap the large and the small
        else break               //i and j have swapped over

    //swap pivot with the large item where the scans crossed
    swap(a, i, high-1)

    //sort left and right parts - NB a[i] is in right place
    sort(a, low, i-1)
    sort(a, i+1, high)

Average case complexity
• On average we have O(n log n), and the O(n) partitioning pass at each level is very fast
• Insertion sort is faster for small lists, so good quicksort implementations actually call insertion sort when the size of the list is below a cutoff of somewhere between 5 and 20

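A runnable Java sketch that combines median-of-three pivot selection with an insertion sort cutoff is shown below. It is one possible arrangement, not a definitive implementation: the class name QuickSortSketch and the cutoff value of 10 are assumptions chosen for the example, and the scanning loops advance before comparing so that equal keys cannot stall the partition.

```java
// Minimal sketch of quicksort with median-of-three and an insertion sort cutoff.
// Names and the CUTOFF value are illustrative assumptions.
class QuickSortSketch {

    static final int CUTOFF = 10; // assumed; the notes suggest somewhere in 5..20

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int low, int high) {
        if (high - low + 1 < CUTOFF) {
            insertionSort(a, low, high); // small ranges: insertion sort is faster
            return;
        }
        // Median of three: order a[low], a[mid], a[high]
        int mid = low + (high - low) / 2;
        if (a[mid] < a[low]) swap(a, low, mid);
        if (a[high] < a[low]) swap(a, low, high);
        if (a[high] < a[mid]) swap(a, mid, high);
        swap(a, mid, high - 1); // park the pivot next to the end
        int pivot = a[high - 1];

        // Partition: a[low] and the pivot act as sentinels for the two scans
        int i = low, j = high - 1;
        while (true) {
            while (a[++i] < pivot) { } // find a large element from the left
            while (a[--j] > pivot) { } // find a small element from the right
            if (i < j) swap(a, i, j);
            else break;
        }
        swap(a, i, high - 1); // pivot is now in its final place
        sort(a, low, i - 1);
        sort(a, i + 1, high);
    }

    static void insertionSort(int[] a, int low, int high) {
        for (int i = low + 1; i <= high; i++) {
            int base = a[i];
            int j = i;
            while (j > low && base < a[j - 1]) {
                a[j] = a[j - 1];
                j--;
            }
            a[j] = base;
        }
    }

    static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}
```

Because ranges shorter than the cutoff go straight to insertion sort, the partitioning code never has to deal with the awkward 2 and 3 element cases.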
Bucket Sort
• Now for something completely different
• How about sorting based on the contents of the key rather than just the order of the keys?
• Similar to how we sort forms by name
  – put the As in a pile, a*, the Bs into b*...

Pseudo Code

void bucketSort (queue q)
    //N = size of alphabet
    queue[N] bucket

    while q.isnotempty()
        x = q.removefromfront();
        bucket[x.key].add(x)

    for i=0 to N-1
        while bucket[i].isnotempty()
            q.insertatend(bucket[i].removefromfront())

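As a concrete illustration, here is a minimal Java sketch of the same idea for the "forms by name" example. It assumes the items are non-empty lower-case words and uses the first letter as the key, so there are 26 buckets; the class and method names are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Minimal sketch of bucket sort keyed on the first letter of each word.
// Assumes non-empty lower-case words; names are illustrative assumptions.
class BucketSortSketch {

    /** Reorders q so that words are grouped by first letter, a before b and so on. */
    static void bucketSortByFirstLetter(Queue<String> q) {
        final int n = 26; // size of the alphabet
        List<Queue<String>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayDeque<>());
        }
        // Deal every item into the bucket chosen by its key
        while (!q.isEmpty()) {
            String x = q.remove();
            buckets.get(x.charAt(0) - 'a').add(x);
        }
        // Collect the buckets in alphabet order; order within a bucket is preserved
        for (Queue<String> bucket : buckets) {
            while (!bucket.isEmpty()) {
                q.add(bucket.remove());
            }
        }
    }
}
```

No keys are ever compared with each other: each item is handled once on the way in and once on the way out, plus one visit to each of the buckets.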
Radix Sort
• Not the full story, how do we sort...
  – put the As in a pile, a*, the Bs into b*...
  – split a* into aa* ab* ac*....
  – split the aa* into aaa*, aab*....
• That's radix sorting
• But first a new concept...

Stable Sort
• A stable sort is a sort that preserves the order of elements that have equal keys
• Not all sorts honour this (e.g. quicksort and Shell sort do not), but it's a nice feature, and one that radix sort needs to be most efficient

Working backwards
• If we can do a stable sort of key_i, then we should sort key_(i+1) first, then stably sort key_i
• e.g. GH KI FE BH IU KS DE HJ KT HU
• sort second letter gives
  – FE DE GH BH KI HJ KS KT IU HU
• sort first letter stably gives
  – BH DE FE GH HJ HU IU KI KS KT

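The working-backwards idea can be sketched in Java as a sequence of stable bucket passes, last letter first. This is a sketch under stated assumptions: keys are fixed-length strings of upper-case letters A to Z, and the names RadixSortSketch, stablePass and radixSort are invented for the example. The usage note runs it on the two-letter keys above, taking KS and KT as the keys that appear in the sorted outputs.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of radix sort for fixed-length upper-case keys,
// built from stable bucket passes. Names are illustrative assumptions.
class RadixSortSketch {

    /** One stable pass: orders items by the letter at position pos only. */
    static List<String> stablePass(List<String> items, int pos) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < 26; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String s : items) {
            buckets.get(s.charAt(pos) - 'A').add(s); // arrival order is kept within a bucket
        }
        List<String> result = new ArrayList<>();
        for (List<String> bucket : buckets) {
            result.addAll(bucket);
        }
        return result;
    }

    /** Sorts by the last letter first, then works backwards to the first letter. */
    static List<String> radixSort(List<String> items, int keyLength) {
        List<String> current = new ArrayList<>(items);
        for (int pos = keyLength - 1; pos >= 0; pos--) {
            current = stablePass(current, pos);
        }
        return current;
    }
}
```

Calling stablePass on GH KI FE BH IU KS DE HJ KT HU with pos 1 gives FE DE GH BH KI HJ KS KT IU HU, and radixSort with key length 2 gives BH DE FE GH HJ HU IU KI KS KT. There is one pass per letter of the key, and each pass visits every item once and every bucket once.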
Radix Sort Complexity
• Given an alphabet of A letters, key length of L letters and N items
• Radix sort works in O(L(N+A))
• Given that L & A are fixed for a given dataset, this gives an O(N) sorter!
• Pseudo code left as exercise 8-)

Sorting Algorithms
• Selection sort, Bubble sort, Insertion sort
• Shell sort
• Merge sort, Quick sort(~)
• Bucket sort, Radix sort
• Next: sorting-like things
  – Hash tables, priority queues & heaps