Chapter 10 Sorting and Searching - Lakehead...

CS 2412 Data Structures Chapter 10 Sorting and Searching


Page 1: Chapter 10 Sorting and Searching - Lakehead University, ccc.cs.lakeheadu.ca/cs2412/slides/cs2412-s10.pdf

CS 2412 Data Structures

Chapter 10

Sorting and Searching

Some concepts

• Sorting is one of the most common data-processing applications.

• Sorting algorithms are classed as either internal or external.

• Sorting order can be either ascending or descending.

• Sort stability is an attribute of a sort, indicating that data with equal keys maintain their relative input order in the output.

• Sort efficiency is usually measured by the number of comparisons and moves the sort requires. The best comparison-based sorting algorithms are O(n log n).

• During the sorting process, each traversal of the data is referred to as a sort pass.

Data Structure 2016 R. Wei 2

Selection sorts

• Heap sort: discussed earlier. First build a heap. Then repeatedly remove the root of the heap, move the last element to the root, and reheap down.

• Straight selection sort: in each pass, the smallest element is selected from the unsorted sublist and exchanged with the element at the beginning of the unsorted sublist.
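
The heap sort steps described above might be sketched in C as follows. This is only an illustrative sketch: the names heapSort and reheapDown, the int element type, and the choice to build the heap bottom-up by reheaping down from the last parent are assumptions, not necessarily the version discussed earlier in the course. The removed root is stored in the slot freed at the end of the heap, so the array ends up in ascending order.

```c
// Restore the max-heap property for the subtree rooted at root,
// looking only at heap[0..last].
static void reheapDown(int heap[], int root, int last)
{
    int hold = heap[root];
    int child = 2 * root + 1;               // left child
    while (child <= last) {
        if (child < last && heap[child + 1] > heap[child])
            child++;                         // use the larger child
        if (hold >= heap[child])
            break;                           // hold belongs at root
        heap[root] = heap[child];            // move the child up
        root = child;
        child = 2 * root + 1;
    }
    heap[root] = hold;
}

// Heap sort of list[0..last] in ascending order.
void heapSort(int list[], int last)
{
    // Build a max-heap, starting from the last parent node.
    for (int i = (last - 1) / 2; i >= 0; i--)
        reheapDown(list, i, last);

    // Remove the root (largest), put the last element at the root,
    // and reheap down over the shrunken heap.
    for (int end = last; end > 0; end--) {
        int temp = list[0];
        list[0] = list[end];
        list[end] = temp;
        reheapDown(list, 0, end - 1);
    }
}
```

For a six-element array a, the call would be heapSort(a, 5), since last is the index of the last element.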

Algorithm selectionSort (list, last)
  set current to 0
  loop (until last element sorted)
    set smallest to current
    set walker to current + 1
    loop (walker <= last)
      if (walker key < smallest key)
        set smallest to walker
      end if
      increment walker
    end loop
    exchange (current, smallest)
    increment current
  end loop
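
A minimal C sketch of the straight selection sort, assuming an array of int keys and the same list/last convention as the pseudocode (last is the index of the last element). The inner loop scans the whole unsorted sublist and remembers the position of the smallest key it has seen.

```c
// Straight selection sort of list[0..last] in ascending order.
void selectionSort(int list[], int last)
{
    for (int current = 0; current < last; current++) {
        int smallest = current;
        // Find the smallest element in the unsorted sublist.
        for (int walker = current + 1; walker <= last; walker++) {
            if (list[walker] < list[smallest])
                smallest = walker;
        }
        // Exchange it with the first element of the unsorted sublist.
        int temp = list[current];
        list[current] = list[smallest];
        list[smallest] = temp;
    }
}
```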

The efficiency of selection sort

• Straight selection sort: O(n^2). The algorithm has two nested loops, each of which executes about n times.

• Heap sort: O(n log n). Building the heap takes about n log n steps, and sorting from the heap takes another n log n steps. In big-O notation, the total is still O(n log n).

Insertion sorts

• Straight insertion sort: the list is divided into sorted and unsorted sublists. In each pass, the first element of the unsorted sublist is inserted into the sorted sublist at its correct position.

• Shell sort: the list is divided into K segments, and each segment is sorted (the elements of a segment are dispersed through the list). After each pass, the number of segments is reduced according to an increment. When the number of segments has been reduced to 1 and that single segment has been sorted, the list is sorted.

Algorithm insertionSort (list, last)
  set current to 1
  loop (until last element sorted)
    move current element to hold
    set walker to current - 1
    loop (walker >= 0 AND hold key < walker key)
      move walker element right one element
      decrement walker
    end loop
    move hold to walker + 1 element
    increment current
  end loop
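
The insertion sort pseudocode translates almost line for line into C. The sketch below assumes int keys; the variable names follow the pseudocode.

```c
// Straight insertion sort of list[0..last] in ascending order.
void insertionSort(int list[], int last)
{
    for (int current = 1; current <= last; current++) {
        int hold = list[current];        // first unsorted element
        int walker = current - 1;
        // Shift larger sorted elements one position to the right.
        while (walker >= 0 && hold < list[walker]) {
            list[walker + 1] = list[walker];
            walker--;
        }
        list[walker + 1] = hold;         // insert at its correct position
    }
}
```

Because an element is shifted only when hold is strictly smaller than it, elements with equal keys keep their relative order, so this version is stable.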

The main idea of the Shell sort is to divide the list into segments and use insertion sort to sort each segment.

The elements of a segment are separated from one another by a distance equal to the increment. In the following example, the list is of size 10. The 5 segments for increment K = 5 are as follows:

Segment 1. A[0], A[5]

Segment 2. A[1], A[6]

Segment 3. A[2], A[7]

Segment 4. A[3], A[8]

Segment 5. A[4], A[9]

Then for increment K = 2

Segment 1. A[0], A[2], A[4], A[6], A[8]

Segment 2. A[1], A[3], A[5], A[7], A[9]

Algorithm shellSort (list, last)
  set incre to (last + 1) / 2
  loop (incre not 0)
    set current to incre
    loop (until last element sorted)
      move current element to hold
      set walker to current - incre
      loop (walker >= 0 AND hold key < walker key)
        move walker element one increment right
        set walker to walker - incre
      end loop
      move hold to walker + incre element
      increment current
    end loop
    set incre to incre / 2
  end loop

// Shell sort of list[0..last] in ascending order.
void shellSort (int list [], int last)
{
    int hold;
    int incre;
    int walker;

    // Start from n / 2, where n = last + 1 is the number of elements.
    // (Starting from last / 2 would leave a two-element list unsorted.)
    incre = (last + 1) / 2;
    while (incre != 0)
    {
        // Insertion sort over the segments whose elements are incre apart.
        for (int curr = incre; curr <= last; curr++)
        {
            hold = list [curr];
            walker = curr - incre;
            while (walker >= 0 && hold < list [walker])
            {
                // Move the larger element one increment to the right.
                list [walker + incre] = list [walker];
                walker = ( walker - incre );
            } // while
            list [walker + incre] = hold;
        } // for curr
        // End of pass: halve the increment.
        incre = incre / 2;
    } // while
    return;
} // shellSort

Note

In the above algorithm, the increment starts at n/2, and each pass halves it. This is simple but not the most efficient choice. Ideally, the increments should be chosen so that no two elements appear in the same segment more than once, but this is not easy to achieve in general.
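
As one illustration of a different increment choice, the sketch below replaces the halving rule with Knuth's 3h + 1 sequence (1, 4, 13, 40, ...). This sequence is not taken from the slides and does not guarantee the ideal property mentioned in the note; it is shown only as a commonly used alternative. The function name shellSortKnuth is hypothetical.

```c
// Shell sort of list[0..last] using the increments ..., 40, 13, 4, 1.
// The increment sequence is an assumption, not the slides' method.
void shellSortKnuth(int list[], int last)
{
    int n = last + 1;
    int incre = 1;

    // Find the largest increment of the form 3h + 1 below about n / 3.
    while (incre < n / 3)
        incre = 3 * incre + 1;

    while (incre >= 1) {
        for (int curr = incre; curr <= last; curr++) {
            int hold = list[curr];
            int walker = curr - incre;
            while (walker >= 0 && hold < list[walker]) {
                list[walker + incre] = list[walker];
                walker -= incre;
            }
            list[walker + incre] = hold;
        }
        incre /= 3;      // 13 -> 4 -> 1 -> 0 with integer division
    }
}
```

The final pass always uses increment 1, which is an ordinary insertion sort, so the result is sorted whatever increments precede it.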

Insertion sort efficiency:

• Straight insertion sort: O(n^2). The algorithm has two nested loops, and the inner loop executes at most about n(n + 1)/2 times in total.

• Shell sort: the complexity is difficult to analyze. Empirical studies show that the average sort complexity is about O(n^1.25).

Exchange sorts

• Bubble sort: the list is divided into two sublists, sorted and unsorted. In each pass, the smallest element is bubbled from the unsorted sublist to the sorted sublist.

• Quick sort: in each pass a pivot is selected. The elements less than the pivot and the elements greater than or equal to the pivot are then separated into two sublists, and the pivot is placed at its ultimately correct location in the list.

Example (bubble sort; ∥ marks the boundary between the sorted and unsorted sublists):

23 78 45 8 56 32

8 ∥23 78 45 32 56

8 23 ∥32 78 45 56

8 23 32 ∥45 78 56

8 23 32 45 ∥56 78

Algorithm bubbleSort (list, last)
  set current to 0
  set sorted to false
  loop (current <= last AND sorted false)
    set walker to last
    set sorted to true
    loop (walker > current)
      if (walker data < walker - 1 data)
        set sorted to false
        exchange (list, walker, walker - 1)
      end if
      decrement walker
    end loop
    increment current
  end loop

Note for quick sort

• There are different methods for selecting the pivot.

  – Select the first element.

  – Select the middle element.

  – Select the median value of three elements: the left, the right, and the element in the middle of the list. This text uses this method.

• When the partition becomes small, a straight insertion sort can be used, which may be more efficient.

Example for one pass of a quick sort:

Algorithm medianLeft (sortData, left, right)
  set mid to (left + right) / 2
  if (left key > mid key)
    exchange (sortData, left, mid)
  end if
  if (left key > right key)
    exchange (sortData, left, right)
  end if
  if (mid key > right key)
    exchange (sortData, mid, right)
  end if
  exchange (sortData, left, mid) // put the pivot (the median) in left
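
To show where medianLeft fits, here is a hedged C sketch of a complete quick sort built around it. Only medianLeft follows the pseudocode above; the partition loop, the helper exchange, and the names sortLeft and sortRight are assumptions made for this sketch and are not necessarily the text's algorithm. The switch to insertion sort for small partitions is omitted for brevity.

```c
static void exchange(int data[], int i, int j)
{
    int temp = data[i];
    data[i] = data[j];
    data[j] = temp;
}

// Order the left, middle and right keys, then put the median at left.
static void medianLeft(int data[], int left, int right)
{
    int mid = left + (right - left) / 2;
    if (data[left] > data[mid])   exchange(data, left, mid);
    if (data[left] > data[right]) exchange(data, left, right);
    if (data[mid] > data[right])  exchange(data, mid, right);
    exchange(data, left, mid);    // pivot now in data[left]
}

// Quick sort of data[left..right] in ascending order.
void quickSort(int data[], int left, int right)
{
    if (left >= right)
        return;                   // zero or one element: already sorted

    medianLeft(data, left, right);
    int pivot = data[left];
    int sortLeft = left + 1;
    int sortRight = right;

    // Keys < pivot go to the left part, keys >= pivot to the right part.
    while (sortLeft <= sortRight) {
        while (sortLeft <= sortRight && data[sortLeft] < pivot)
            sortLeft++;
        while (sortLeft <= sortRight && data[sortRight] >= pivot)
            sortRight--;
        if (sortLeft < sortRight) {
            exchange(data, sortLeft, sortRight);
            sortLeft++;
            sortRight--;
        }
    }

    // sortRight is the last key < pivot; the pivot belongs there.
    exchange(data, left, sortRight);
    quickSort(data, left, sortRight - 1);
    quickSort(data, sortRight + 1, right);
}
```

A whole array a of n elements would be sorted with quickSort(a, 0, n - 1).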

The list in Figure 12-15 is sorted as follows:

The exchange sort efficiency:

• Bubble sort: O(n^2). There are two nested loops in the algorithm, and the number of comparisons is about n(n + 1)/2.

• Quick sort: O(n log n) on average. The algorithm in the text has 5 loops. However, in each pass the partitions are generally about half the size of those in the previous pass, so roughly log2 n levels of partitioning are needed, each of which processes about n elements.

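// Bubble sort of list[0..last] in ascending order.
void bubbleSort (int list [], int last)
{
    int temp;
    int walker;
    int sorted = 0;

    // Stop as soon as a pass makes no exchanges.
    for (int current = 0; current <= last && !sorted; current++)
        // Assign (do not redeclare) sorted here, so the outer test sees it.
        for (walker = last, sorted = 1; walker > current; walker--)
            if (list[ walker ] < list[ walker - 1 ])
            {
                // Any exchange means the list is not yet sorted.
                sorted = 0;
                temp = list[walker];
                list[walker] = list[walker - 1];
                list[walker - 1] = temp;
            } // if
    return;
} // bubbleSort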

External sorts

In external sorting, portions of the data may be stored in secondary memory during the sorting process.

One important method for external sorting is to merge sorted files into one sorted file.

Merge sorts

A simple merge combines two sorted files into one sorted file. For example, suppose we have two sorted lists:

• 1, 3, 5

• 2, 4, 6, 8, 10

After we merge these two lists, we obtain the following list:

1, 2, 3, 4, 5, 6, 8, 10.

The following algorithm merges two sorted files, file1 and file2. The combined data are written to file3.

Algorithm mergeFiles
  open files
  read (file1 into record1)
  read (file2 into record2)
  loop (not end file1 OR not end file2)
    if (record1.key <= record2.key)
      write (record1 to file3)
      read (file1 into record1)
      if (end of file1)
        set record1.key to infinity
      end if
    else
      write (record2 to file3)
      read (file2 into record2)
      if (end of file2)
        set record2.key to infinity
      end if
    end if
  end loop
  close files
end mergeFiles
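
The same merge logic can be tried out in memory. The sketch below is an assumption-laden stand-in for mergeFiles: it merges two sorted int arrays instead of files, and it uses the array lengths to detect the end of each input rather than an infinity sentinel. The name mergeSorted is hypothetical.

```c
#include <stddef.h>

// Merge sorted arrays a[0..na-1] and b[0..nb-1] into out, which must
// have room for na + nb elements. Returns the number of elements written.
size_t mergeSorted(const int a[], size_t na,
                   const int b[], size_t nb, int out[])
{
    size_t i = 0, j = 0, k = 0;

    // While both inputs have data, copy the smaller current key.
    while (i < na && j < nb) {
        if (a[i] <= b[j])
            out[k++] = a[i++];
        else
            out[k++] = b[j++];
    }
    // One input is exhausted: copy the rest of the other.
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
    return k;
}
```

Merging the lists 1, 3, 5 and 2, 4, 6, 8, 10 this way writes the eight values 1, 2, 3, 4, 5, 6, 8, 10. Taking from a when the keys are equal keeps the merge stable.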

Merge unsorted files:

• Form merge runs for the files. Each run is ordered.

• The end of each run is identified by a stepdown (a key that is smaller than the key before it).

• Merge the corresponding runs of the two files.

• When one run reaches its stepdown, the rest of the other run is rolled out (copied to the merged file).
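
As a small illustration of how stepdowns delimit merge runs, the following sketch counts the runs in an array of keys. It assumes the data are held in memory as int keys; the function name countRuns is hypothetical.

```c
#include <stddef.h>

// Count the merge runs in data[0..n-1]. A stepdown occurs wherever a
// key is smaller than the key before it, and each stepdown ends a run.
size_t countRuns(const int data[], size_t n)
{
    if (n == 0)
        return 0;

    size_t runs = 1;                    // the first key starts a run
    for (size_t i = 1; i < n; i++) {
        if (data[i] < data[i - 1])      // stepdown: a new run begins
            runs++;
    }
    return runs;
}
```

For the keys 1, 3, 5, 2, 4, 6, 1 there are stepdowns before 2 and before the final 1, giving three runs.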

The sorting process:

• Sort phase: divide the file into merge runs according to the size of memory. For example, suppose we have 2300 records but memory can handle only 500 records. We first read in 500 records, sort them, and write them out as the first merge run. Then we read and sort records 501-1000 as the first run of merge file 2, and so on.

• Merge phase: merge the sorted runs.

There are different merge concepts. We discuss three of them as examples:

• Natural merge: after each merge, all data are written to one file, and a distribution phase is needed to redistribute the data to two files.

• Balanced merge: uses a constant number of input merge files and the same number of output merge files.

• Polyphase merge: a constant number of input merge files are merged to one output merge file, and an input merge file is immediately reused once its input has been completely merged.

Searching

• Binary search: for a sorted list.

• Sequential search:

  – Straight sequential search: at each step, check whether the key equals the target AND whether it is the last key.

  – Sentinel sequential search: add the target at the end of the list so that each step only has to check whether the key equals the target.

  – Probability search: when a target is found, move the element containing the target up one location. In this way, the most frequently searched targets become easier to find.
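
The searches listed above might look like this in C. This is a sketch under several assumptions: keys are ints, each function returns the index of the target or -1 when it is absent, and sentinelSearch requires the caller to leave one spare slot at list[last + 1] for the sentinel. The function names are hypothetical.

```c
// Binary search of the sorted array list[0..last].
int binarySearch(const int list[], int last, int target)
{
    int first = 0;
    while (first <= last) {
        int mid = first + (last - first) / 2;
        if (target > list[mid])
            first = mid + 1;          // look in the upper half
        else if (target < list[mid])
            last = mid - 1;           // look in the lower half
        else
            return mid;
    }
    return -1;
}

// Sentinel sequential search; list[last + 1] must be a writable slot.
int sentinelSearch(int list[], int last, int target)
{
    int looker = 0;
    list[last + 1] = target;          // sentinel guarantees a match
    while (list[looker] != target)    // only one test per step
        looker++;
    return (looker <= last) ? looker : -1;
}

// Probability search: a found element moves up one location.
// Returns the element's new index.
int probabilitySearch(int list[], int last, int target)
{
    int looker = 0;
    while (looker <= last && list[looker] != target)
        looker++;
    if (looker > last)
        return -1;
    if (looker > 0) {
        int temp = list[looker - 1];
        list[looker - 1] = list[looker];
        list[looker] = temp;
        looker--;
    }
    return looker;
}
```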

Hashed list searches

• Hashing is a method that uses key-to-address mapping to find data quickly.

• The basic idea is to use a hash function to map a key (which comes from a large range) to an index (which lies in a small range) into the data.

• Some keys may be mapped to the same index (synonyms). We then need some method to resolve the collision.

• The main part of hashing is finding good hashing methods.

Hashing methods:

• Direct method: the range of keys and the range of indexes are the same.

• Subtraction method: subtract a fixed number from the key. This also requires both ranges to be the same.

• Modulo-division method: index = key MODULO listSize.

• Digit-extraction method: select digits at certain positions of the key as the index.

• Midsquare method: the key is squared and the middle digits are used as the index.

• Folding method: in fold shift, the key is divided into parts whose size matches the size of the index, and then the left and right parts are shifted and added to the middle part. In fold boundary, the left and right parts are folded on the fixed boundaries between them and the center part, so the digits of the two outside parts are reversed before the parts are added.

• Rotation method: rotate the last character to the front of the key. Usually used in combination with other methods.

• Pseudorandom method: the key is used as the seed of a pseudorandom number generator, and the resulting random number is then scaled into the possible index range.
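
Three of the hashing methods above are sketched in C below: modulo-division, midsquare, and fold shift. The function names, the unsigned key type, and the digits parameter (the number of decimal digits in the index) are assumptions made for illustration. The fold shift sketch splits the key into parts starting from the right, which matches the description when the key length is a multiple of the part size.

```c
// Modulo-division: index = key MODULO listSize.
unsigned hashModulo(unsigned key, unsigned listSize)
{
    return key % listSize;
}

// Midsquare: square the key and return its middle `digits` digits.
// Intended for small values of digits (fewer than 10).
unsigned midSquare(unsigned key, unsigned digits)
{
    unsigned long long square = (unsigned long long)key * key;

    unsigned len = 0;                         // decimal digits in square
    for (unsigned long long t = square; t > 0; t /= 10)
        len++;

    unsigned long long scale = 1;             // 10^digits
    for (unsigned i = 0; i < digits; i++)
        scale *= 10;

    // Drop the digits to the right of the middle group.
    unsigned drop = (len > digits) ? (len - digits) / 2 : 0;
    for (unsigned i = 0; i < drop; i++)
        square /= 10;

    return (unsigned)(square % scale);
}

// Fold shift: add the `digits`-sized parts of the key and discard
// any carry beyond `digits` digits.
unsigned foldShift(unsigned key, unsigned digits)
{
    unsigned long scale = 1;                  // 10^digits
    for (unsigned i = 0; i < digits; i++)
        scale *= 10;

    unsigned long sum = 0;
    while (key > 0) {
        sum += key % scale;                   // rightmost part
        key /= scale;
    }
    return (unsigned)(sum % scale);
}
```

For example, 121267 MODULO 307 is 2; 9452 squared is 89340304, whose middle four digits are 3403; and folding 123456789 into three-digit parts gives 123 + 456 + 789 = 1368, which becomes 368 once the carry is discarded.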

Some concepts used in collision resolution methods:

• Load factor: the number of elements in the list divided by the number of physical elements allocated for the list, expressed as a percentage (preferably less than 75):

  $\alpha = \frac{k}{n} \times 100$

  where k is the number of filled elements and n is the number of elements allocated.

• Clustering: as data are added to a list and collisions are resolved, some hashing algorithms tend to cause data to group within the list.

Open addressing to resolve collisions (disadvantage: each collision resolution increases the probability of future collisions).

• Linear probe: when data cannot be stored at the home address, we resolve the collision by adding 1 to the current address.

• Quadratic probe: the increment is the collision probe number squared.

• Pseudorandom collision resolution (double hashing): use a pseudorandom number to resolve the collision. The collision address is used as the key of the pseudorandom number generator.

• Key offset (double hashing): calculate the new address as a function of the old address and the key. For example:

  offSet = key / listSize
  address = (offSet + old address) modulo listSize
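
The open addressing probes can be sketched as small C functions. Everything here is illustrative: the function names, the EMPTY marker, the assumption of non-negative int keys, the use of modulo-division for the home address, and the wrap-around with MODULO listSize are choices made for this sketch. Only keyOffsetProbe is taken directly from the formula above.

```c
#define EMPTY (-1)   // marks an unused slot; keys are assumed non-negative

// Linear probe: add 1 to the current address, wrapping at the end.
int linearProbe(int address, int listSize)
{
    return (address + 1) % listSize;
}

// Quadratic probe: the increment is the probe number squared.
// probe counts the collisions so far for this key: 1, 2, 3, ...
int quadraticProbe(int address, int probe, int listSize)
{
    return (address + probe * probe) % listSize;
}

// Key offset: offSet = key / listSize,
// address = (offSet + old address) modulo listSize.
int keyOffsetProbe(int key, int address, int listSize)
{
    int offSet = key / listSize;
    return (offSet + address) % listSize;
}

// Insert key using a modulo-division home address and linear probing.
// Returns the address used, or -1 if the table is full.
int insertLinear(int table[], int listSize, int key)
{
    int address = key % listSize;
    for (int tries = 0; tries < listSize; tries++) {
        if (table[address] == EMPTY) {
            table[address] = key;
            return address;
        }
        address = linearProbe(address, listSize);
    }
    return -1;
}
```

With a table of 7 slots, the keys 10, 17 and 24 all have home address 3, so linear probing places them at addresses 3, 4 and 5. This is the clustering effect that open addressing tends to produce.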

Linked list collision resolution: use a separate area to store collisions, and chain all synonyms together in a linked list (usually in LIFO sequence). Two storage areas are used: the prime area and the overflow area.
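
A simplified C sketch of linked list collision resolution follows. As an assumption for brevity, the prime area here holds only the head pointer of each chain, and every node (including the first one at an address) is allocated with malloc, which plays the role of the overflow area. The names NODE, chainInsert and chainSearch are hypothetical, and keys are assumed to be non-negative ints.

```c
#include <stdlib.h>

typedef struct node {
    int key;
    struct node *next;
} NODE;

// Insert key at the front of its chain (LIFO sequence).
// Returns 1 on success, 0 if memory could not be allocated.
int chainInsert(NODE *table[], int listSize, int key)
{
    NODE *newNode = malloc(sizeof *newNode);
    if (newNode == NULL)
        return 0;

    int home = key % listSize;
    newNode->key = key;
    newNode->next = table[home];   // existing synonyms follow the new node
    table[home] = newNode;
    return 1;
}

// Return the node holding key, or NULL if the key is not in the table.
NODE *chainSearch(NODE *table[], int listSize, int key)
{
    for (NODE *p = table[key % listSize]; p != NULL; p = p->next) {
        if (p->key == key)
            return p;
    }
    return NULL;
}
```

The table must start with every head pointer set to NULL. Inserting at the front makes insertion constant time, at the cost of leaving older synonyms deeper in the chain.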

Bucket hashing: keys are hashed to buckets, which are nodes that accommodate multiple data occurrences. (Disadvantages: it uses more space, much of which may be empty, and a collision still occurs when a bucket is full.)

Combination approaches may be used: for example, bucket hashing first, then a linear probe if the bucket is full.
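
One way such a combination might look in C is sketched below. The bucket size of 3, the BUCKET type, the function name bucketInsert, and the modulo-division home address are all assumptions for this sketch; keys are assumed to be non-negative ints.

```c
#define BUCKET_SIZE 3    // slots per bucket; an arbitrary choice here

typedef struct {
    int count;                   // number of slots in use
    int keys[BUCKET_SIZE];
} BUCKET;

// Insert key into its home bucket; if that bucket is full, use a
// linear probe to try the following buckets in turn.
// Returns the bucket address used, or -1 if every bucket is full.
int bucketInsert(BUCKET table[], int listSize, int key)
{
    int home = key % listSize;
    for (int tries = 0; tries < listSize; tries++) {
        int address = (home + tries) % listSize;
        if (table[address].count < BUCKET_SIZE) {
            table[address].keys[table[address].count++] = key;
            return address;
        }
    }
    return -1;
}
```

With 5 buckets, the keys 1, 6 and 11 all fit in bucket 1. A fourth synonym such as 16 finds that bucket full and is placed in bucket 2 by the linear probe.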
