![Page 1: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/1.jpg)
Sorting: Implementation
15-211
Fundamental Data Structures and Algorithms
Klaus Sutner
February 24, 2004
![Page 2: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/2.jpg)
Announcements
Homework #5
Midterm
March 4
Review: March 2
![Page 3: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/3.jpg)
Today
- Recall Sorting
- Implementation Issues
- Average-case running time of Quick Sort
- Timing Results
![Page 4: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/4.jpg)
Total Recall: Sorting Algorithms
![Page 5: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/5.jpg)
The Bible
Robert Sedgewick
Algorithms in C, Parts 1-4: Fundamentals, Data Structures, Sorting, Searching
Addison-Wesley 1998
![Page 6: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/6.jpg)
Multiple Keys
We could use a special comparator function (this would require a special function for each combination of keys).
It is often easier to
- first sort by name,
- then stable sort by year.
Done!
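The two-pass trick can be sketched in a few lines. This is only an illustration: the records and the helper name `sort_by_year_then_name` are made up, and it relies on Python's built-in `sorted`, which is documented to be stable.

```python
from operator import itemgetter

# Hypothetical records of the form (name, year); the data is illustrative only.
records = [
    ("Knuth", 1998),
    ("Aho", 2004),
    ("Sedgewick", 1998),
    ("Hoare", 2004),
    ("Dijkstra", 1998),
]

def sort_by_year_then_name(rows):
    """Order by year, ties broken by name, using two single-key passes."""
    by_name = sorted(rows, key=itemgetter(0))   # pass 1: secondary key (name)
    return sorted(by_name, key=itemgetter(1))   # pass 2: primary key (year), stable

result = sort_by_year_then_name(records)
```

Because the second pass is stable, records with the same year keep the name order established by the first pass, so no combined comparator is needed.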
![Page 7: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/7.jpg)
Sorting Review
Several simple, quadratic algorithms (worst case and average).
- Bubble Sort
- Selection Sort
- Insertion Sort
Only Insertion Sort is of practical interest: its running time is linear in the input length plus the number of inversions of the input sequence.
Constants small. Also stable.
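To make the link between running time and inversions concrete, here is a minimal sketch of Insertion Sort that counts how many element shifts it performs. The function names are chosen for this example; the brute-force inversion counter is there only as a check.

```python
def insertion_sort(a):
    """Sort list a in place; return the number of element shifts performed."""
    shifts = 0
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:   # strict '>' keeps equal keys in order (stable)
            a[j + 1] = a[j]          # each shift removes exactly one inversion
            j -= 1
            shifts += 1
        a[j + 1] = x
    return shifts

def count_inversions(a):
    """Brute-force count of pairs i < j with a[i] > a[j] (quadratic, checking only)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])
```

The number of shifts equals the number of inversions, so an almost-sorted input costs little more than one scan.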
![Page 8: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/8.jpg)
Sorting Review
Asymptotically optimal O(n log n) algorithms (worst case and average).
- Merge Sort
- Heap Sort
Merge Sort purely sequential and stable.
But requires extra memory: 2n + O(log n).
![Page 9: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/9.jpg)
Quick Sort
Overall fastest. In place.
BUT:
Worst case quadratic.
Not stable.
Implementation details messy.
![Page 10: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/10.jpg)
Picking An Algorithm
First Question: Is the input short?
Short means something like n < 500.
In this case Insertion Sort is probably the best choice.
Don't bother with asymptotically faster methods.
![Page 11: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/11.jpg)
Picking An Algorithm
Second Question: Does the input have special properties?
E.g., if the number of inversions is small, Insertion Sort may be the best choice.
Or linear sorting methods may be appropriate.
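The slide does not name a particular linear method. As one example, counting sort handles keys that are small non-negative integers; the sketch below assumes the caller knows the largest key (`max_key` is a parameter introduced here).

```python
def counting_sort(keys, max_key):
    """Sort integers in the range 0..max_key in O(n + max_key) time."""
    counts = [0] * (max_key + 1)
    for k in keys:
        counts[k] += 1               # tally how often each key occurs
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)      # emit each key as often as it occurred
    return out
```

No comparisons between keys are made, which is why the n log n lower bound for comparison-based sorting does not apply.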
![Page 12: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/12.jpg)
Otherwise: Quick Sort
Large inputs, comparison based method, stability not required (recall our stabilizer trick, though).
Quick Sort is worst-case quadratic, so why should it be the default candidate?
On average, Quick Sort is O(n log n), and the constants are quite small.
![Page 13: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/13.jpg)
Average ???
Average case analysis requires a probability distribution on the inputs: we have to average the running times.
$$t(n) = \sum_{x} p_x \, t(x)$$
where the sum is over all instances $x$ of size $n$ and $p_x$ is the probability of getting instance $x$.
Often simply assume uniform distribution: every instance (of a certain size) is equally likely.
![Page 14: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/14.jpg)
A Computation
Can we write down a recurrence equation?
Can we solve the equation?
At least approximately?
Is the solution (if any) practically relevant?
(see handout from last time)
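The handout is not part of this transcript. As a hedged illustration, the sketch below uses the standard textbook model, which is an assumption here: distinct keys, the pivot equally likely to have any rank, and n - 1 comparisons per partitioning step. Under that model the average number of comparisons satisfies C(n) = (n - 1) + (2/n)(C(0) + ... + C(n-1)) with C(0) = 0, and the known closed form is 2(n+1)H_n - 4n, where H_n is the n-th harmonic number.

```python
from fractions import Fraction

def average_comparisons(n_max):
    """Return [C(0), ..., C(n_max)] computed exactly from the recurrence."""
    c = [Fraction(0)]
    running_sum = Fraction(0)                 # C(0) + ... + C(n-1)
    for n in range(1, n_max + 1):
        c_n = (n - 1) + Fraction(2, n) * running_sum
        c.append(c_n)
        running_sum += c_n
    return c

def closed_form(n):
    """2(n+1)H_n - 4n, with H_n the n-th harmonic number."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n
```

Since H_n grows like ln n, the solution grows like 2n ln n, which is about 1.39 n log2 n: only a modest constant above the best possible n log2 n.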
![Page 15: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/15.jpg)
Implementing Quick Sort
![Page 16: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/16.jpg)
Pivot Selection
Ideally, the pivot should be the median.
But computing the exact median is much too slow to be of practical value.
Instead either
- pick the pivot at random, or
- take the median of a small sample.
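Both options can be sketched briefly. The function names are made up for this example, and a sample of three (first, middle, last element) is assumed as the "small sample"; the slide does not fix the sample size.

```python
import random

def random_pivot_index(a, lo, hi):
    """Return a uniformly random index in the inclusive range lo..hi."""
    return random.randint(lo, hi)

def median_of_three_index(a, lo, hi):
    """Return the index of the median of a[lo], a[mid], a[hi] (inclusive range)."""
    mid = (lo + hi) // 2
    candidates = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return candidates[1][1]          # index of the middle value
```

On the array 85 24 63 50 17 31 96 45, the median-of-three candidates are 85, 50 and 45, so the chosen pivot is 50.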
![Page 17: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/17.jpg)
Partitioning
Partitioning is easy if we use extra scratch space. But we would like to partition in place.
Need to move elements within the same given block of the big array.
Basic idea: use two pointers that sweep across the block from the left and from the right until each encounters an out-of-place element. Swap the two elements and continue.
![Page 18: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/18.jpg)
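1. Doing quicksort in place
85 24 63 50 17 31 96 45   (input; the pivot is 50)
85 24 63 45 17 31 96 50   (pivot swapped to the right end; L at the left end, R just left of the pivot)
85 24 63 45 17 31 96 50   (L stops at 85, R stops at 31)
31 24 63 45 17 85 96 50   (85 and 31 swapped; L and R move on)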
![Page 19: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/19.jpg)
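1. Doing quicksort in place
31 24 63 45 17 85 96 50   (L stops at 63, R stops at 17)
31 24 17 45 63 85 96 50   (63 and 17 swapped)
31 24 17 45 63 85 96 50   (L stops at 63, R stops at 45: the pointers have crossed)
31 24 17 45 50 85 96 63   (pivot 50 swapped with the element at L; done)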
![Page 20: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/20.jpg)
Pseudo Code
p = A[hi];   // pivot: the element at the right end of the block
i = lo - 1; j = hi;
while( true ) {
    while( A[++i] < p );
    while( p < A[--j] ) if( j == lo ) break;
    if( i >= j ) break;
    swap( A, i, j );
}
swap( A, i, hi );
return i;
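A runnable rendering of the pseudo code, as a sketch. It assumes the pivot has already been moved to `a[hi]` and that `lo..hi` is an inclusive range; the `quicksort` wrapper is added here only to exercise the partition step.

```python
def partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    p = a[hi]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < p:              # left pointer stops at an element >= p
            i += 1                   # (the pivot at hi stops it at the latest)
        j -= 1
        while j > lo and p < a[j]:   # right pointer stops at an element <= p, or at lo
            j -= 1
        if i >= j:                   # pointers have crossed
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]        # move the pivot into its final slot
    return i

def quicksort(a, lo=0, hi=None):
    """Plain recursive Quick Sort on a[lo..hi], pivot taken from the right end."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return
    i = partition(a, lo, hi)
    quicksort(a, lo, i - 1)
    quicksort(a, i + 1, hi)
```

Applied to 85 24 63 45 17 31 96 50 (pivot 50 at the right end), `partition` performs the two swaps 85/31 and 63/17 and then places the pivot at index 4.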
![Page 21: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/21.jpg)
Getting Out
Using Quick Sort on very short arrays is a bad idea: the overhead becomes too large.
So, when the block becomes short we should exit Quick Sort and switch to Insertion Sort.
But not locally:
quicksort( A, lo, hi ) {
    if( hi - lo < magic_number )
        insertionsort( A, lo, hi );
    else …
![Page 22: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/22.jpg)
Getting Out
Just do nothing when the block is short. Then do one global cleanup with insertion sort.
quicksort( A, 0, n );
insertionsort( A, 0, n );
This cleanup is linear: every element already lies within its final short block, so the number of inversions is at most n times the magic number.
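The scheme can be sketched as follows. The cutoff value 10 is an assumption, as are all function names; the partition step uses the same two-pointer idea as the pseudo code, with the pivot taken from the right end.

```python
MAGIC_NUMBER = 10   # assumed cutoff; the right value should come from measurements

def _partition(a, lo, hi):
    """Two-pointer partition of a[lo..hi] around a[hi]; returns the pivot's index."""
    p = a[hi]
    i, j = lo - 1, hi
    while True:
        i += 1
        while a[i] < p:
            i += 1
        j -= 1
        while j > lo and p < a[j]:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
    a[i], a[hi] = a[hi], a[i]
    return i

def _quicksort_coarse(a, lo, hi):
    """Quick Sort that simply leaves short blocks unsorted."""
    if hi - lo < MAGIC_NUMBER:
        return                        # do nothing when the block is short
    i = _partition(a, lo, hi)
    _quicksort_coarse(a, lo, i - 1)
    _quicksort_coarse(a, i + 1, hi)

def _insertion_sort(a):
    for i in range(1, len(a)):
        x, j = a[i], i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_sort(a):
    """Coarse Quick Sort followed by one global Insertion Sort pass."""
    _quicksort_coarse(a, 0, len(a) - 1)
    _insertion_sort(a)
```

After the coarse phase no element has to travel farther than the length of its block, so the final pass does only a bounded amount of work per element.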
![Page 23: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/23.jpg)
Magic Number
The best way to determine the magic number is to run real-world tests.
It seems that for current architectures, some value in the range 5 to 20 will work best.
![Page 24: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/24.jpg)
Equal Elements
Note that ideally pivoting should produce three sub-blocks:
- left: < p
- middle: == p
- right: > p
Then the recursion could ignore the middle part, possibly omitting many elements.
![Page 25: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/25.jpg)
Equal Elements
Three natural strategies:
- Both pointers stop.
- Only one pointer stops.
- Neither pointer stops.
Fact: The first strategy works best overall.
![Page 26: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/26.jpg)
Equal Elements
There are clever implementations that partition into three sub-blocks.
This is amazingly hard to get both right and fast.
Try it!
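As a starting point, here is a simple three-way partition in the style of Dijkstra's Dutch national flag scheme. It is meant to be correct rather than fast, and it is not one of the clever implementations alluded to above; the names and the choice of `a[lo]` as pivot are assumptions of this sketch.

```python
def partition3(a, lo, hi):
    """Rearrange a[lo..hi] (inclusive) into  < p | == p | > p  with p = a[lo].

    Returns (lt, gt) such that a[lt..gt] holds exactly the elements equal to p.
    """
    p = a[lo]
    lt, i, gt = lo, lo + 1, hi
    while i <= gt:
        if a[i] < p:
            a[lt], a[i] = a[i], a[lt]   # grow the left block
            lt += 1
            i += 1
        elif a[i] > p:
            a[i], a[gt] = a[gt], a[i]   # grow the right block; a[i] is still unexamined
            gt -= 1
        else:
            i += 1                      # equal to p: stays in the middle block
    return lt, gt

def quicksort3(a, lo=0, hi=None):
    """Quick Sort that never recurses into the block of pivot-equal elements."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return
    lt, gt = partition3(a, lo, hi)
    quicksort3(a, lo, lt - 1)
    quicksort3(a, gt + 1, hi)
```

This version performs more swaps than the two-pointer partition when there are few duplicates, which is part of what makes a version that is both right and fast hard to write.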
![Page 27: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/27.jpg)
Application: Quick Select
![Page 28: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/28.jpg)
Selection (Order Statistics)
A classical problem: given a list, find the k-th element in the ordered list.
The brute-force approach sorts the whole list first, and thus produces more information than required.
Can we get away with less than n log n work (in a comparison based world)?
![Page 29: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/29.jpg)
Easy Cases
Needless to say, when k is small there are easy answers.
- Scan the array and keep track of the k smallest.
- Use a Selection Sort approach.
But how about general k?
![Page 30: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/30.jpg)
Selection and Partitioning
qselect( A, lo, hi, k ) {
    if( hi <= lo ) return;
    i = partition( A, lo, hi );
    if( i > k ) qselect( A, lo, i-1, k );
    if( i < k ) qselect( A, i+1, hi, k );
}
This looks like a typo.
What’s really going on here?
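One way to read the routine: it returns nothing because its result is the rearranged array, with the element of rank k left in position k, and it recurses into only one of the two blocks. The sketch below follows that reading. The partition step is an assumption, since the slide leaves `partition` unspecified: a simpler one-directional sweep with a random pivot is substituted for the two-pointer scheme, and `kth_smallest` is a wrapper added for convenience.

```python
import random

def partition(a, lo, hi):
    """Partition a[lo..hi] (inclusive) around a random pivot; return its final index."""
    r = random.randint(lo, hi)
    a[r], a[hi] = a[hi], a[r]        # move the pivot to the right end
    p, i = a[hi], lo
    for j in range(lo, hi):          # invariant: a[lo..i-1] < p <= a[i..j-1]
        if a[j] < p:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i

def qselect(a, lo, hi, k):
    """Rearrange a[lo..hi] so that a[k] ends up holding the element of rank k."""
    if hi <= lo:
        return
    i = partition(a, lo, hi)
    if i > k:
        qselect(a, lo, i - 1, k)     # rank k lies in the left block
    if i < k:
        qselect(a, i + 1, hi, k)     # rank k lies in the right block
    # if i == k the pivot is the answer and nothing is left to do

def kth_smallest(values, k):
    """Return the element of 0-based rank k without modifying the input."""
    a = list(values)
    qselect(a, 0, len(a) - 1, k)
    return a[k]
```

For example, `kth_smallest([85, 24, 63, 50, 17, 31, 96, 45], 0)` returns the minimum, 17.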
![Page 31: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/31.jpg)
Quick Select
What should we expect as running time?
As usual, if there is a ghost in the machine, it could force quadratic behavior.
But on average this algorithm is linear.
Don’t get any ideas about using this to find the median in the pivoting step of Quick Sort!
![Page 32: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/32.jpg)
Some Timing Results
![Page 33: Sorting: Implementation 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 24, 2004.](https://reader035.fdocuments.net/reader035/viewer/2022070412/5697bf7a1a28abf838c83156/html5/thumbnails/33.jpg)
The Real World
Beyond asymptotic analysis, it is always a good idea to do some real world testing.
Construct a small test-bed:
- automate testing
- flexible but simple
- organize the data in a useful way
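A minimal sketch of such a test-bed, under assumptions: all names are made up, inputs are uniformly random integers, and each sort function is expected to sort a list in place. It checks every result against Python's built-in `sorted` and collects the timings as rows.

```python
import random
import time

def time_sort(sort_fn, data, repeats=3):
    """Return the best wall-clock time (seconds) of sort_fn over fresh copies of data."""
    best = float("inf")
    for _ in range(repeats):
        copy = list(data)                      # every run sees the same unsorted input
        start = time.perf_counter()
        sort_fn(copy)
        best = min(best, time.perf_counter() - start)
        assert copy == sorted(data), "sort_fn produced a wrong result"
    return best

def run_testbed(sort_fns, sizes, seed=0):
    """Time each named in-place sort on random inputs; return rows (name, n, seconds)."""
    rng = random.Random(seed)                  # fixed seed makes runs repeatable
    rows = []
    for n in sizes:
        data = [rng.randrange(n) for _ in range(n)]
        for name, fn in sort_fns.items():
            rows.append((name, n, time_sort(fn, data)))
    return rows
```

A call such as `run_testbed({"builtin": list.sort}, [1000, 10000])` returns one row per algorithm and input size, which can then be printed as a table or plotted.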