Hashing, Sets, Dictionaries, Code Cleaning, Expandable Array Stacks and Amortized Analysis

Posted on 20-Dec-2015



Hashing so far

To store 250 IP addresses in a table:

• Pick a prime n just bigger than 250 that is also larger than every possible byte value (0 to 255), so n = 257

• Pick $a_1, \ldots, a_4$ mod 257 (once and for all)

• To hash $x = (x_1, \ldots, x_4)$:

– Compute $u = a_1x_1 + \cdots + a_4x_4 \bmod 257$

– Store x in a bucket at myArray[u]
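A minimal Python sketch of the IP-address scheme above, assuming dotted-quad strings as input and chaining (one Python list per bucket). The names `ip_hash`, `store`, and `contains` are hypothetical, and the seeded random multipliers are just one way to fix $a_1, \ldots, a_4$ once and for all.

```python
import random

N_BUCKETS = 257  # prime just bigger than 250 and larger than any byte value

# Multipliers a1..a4, chosen once and for all (seeded random choice, an assumption)
rng = random.Random(0)
A = [rng.randrange(N_BUCKETS) for _ in range(4)]


def ip_hash(ip: str) -> int:
    """u = a1*x1 + a2*x2 + a3*x3 + a4*x4 mod 257, where x1..x4 are the IP's bytes."""
    xs = [int(part) for part in ip.split(".")]
    if len(xs) != 4 or not all(0 <= x <= 255 for x in xs):
        raise ValueError(f"not an IPv4 address: {ip}")
    return sum(a * x for a, x in zip(A, xs)) % N_BUCKETS


# myArray: one bucket (a Python list) per hash index
my_array = [[] for _ in range(N_BUCKETS)]


def store(ip: str) -> None:
    my_array[ip_hash(ip)].append(ip)


def contains(ip: str) -> bool:
    # Only the one bucket at index ip_hash(ip) needs to be searched
    return ip in my_array[ip_hash(ip)]


store("192.168.1.7")
store("10.0.0.1")
print(contains("10.0.0.1"), contains("8.8.8.8"))  # True False
```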

Generalization

Old: To store 250 IP addresses in a table

New: store $n_1$ items, each between 0 and $N$

Generalization

To store $n_1$ items, each between 0 and $N$:

• Pick a prime $n$ just bigger than $n_1$

• Let $k = \lceil \log_n (N+1) \rceil$

– Each "item" can be written as a $k$-digit number, base $n$

• Pick $a_1, \ldots, a_k$ mod $n$ (once and for all)

• To hash $x = (x_1, \ldots, x_k)$:

– Compute $u = a_1x_1 + \cdots + a_kx_k \bmod n$

– Store $x$ in a bucket at myArray[u]
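The general recipe can be sketched in Python as below. This is an illustration under assumptions: `next_prime` and `make_hasher` are hypothetical helpers, trial division is only suitable for small table sizes, and $k$ is found by counting base-$n$ digits (the smallest $k$ with $n^k > N$) rather than with a floating-point logarithm.

```python
import random


def next_prime(m: int) -> int:
    """Smallest prime strictly greater than m (trial division; fine for small tables)."""
    c = m + 1
    while c < 2 or any(c % d == 0 for d in range(2, int(c ** 0.5) + 1)):
        c += 1
    return c


def make_hasher(n1: int, N: int, seed: int = 0):
    n = next_prime(n1)               # table size: prime just bigger than n1
    k = 1
    while n ** k <= N:               # k = number of base-n digits needed for 0..N
        k += 1
    rng = random.Random(seed)
    a = [rng.randrange(n) for _ in range(k)]  # a1..ak, fixed once and for all

    def h(x: int) -> int:
        if not 0 <= x <= N:
            raise ValueError("item out of range")
        digits = []
        for _ in range(k):           # repeated Mod/Div: least significant digit first
            x, r = divmod(x, n)
            digits.append(r)
        digits.reverse()             # x1 is the most significant digit
        return sum(ai * xi for ai, xi in zip(a, digits)) % n

    return n, k, h


n, k, h = make_hasher(8, 65535)
print(n, k)  # 11 5
```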

Example

• Store 8 items, each represented by 16 bits (i.e., between 0 and $2^{16} - 1 = 65535$)

• Solution: pick n = 11 (the prime just bigger than 8).

• $\log_{11} 65535 = 4.625\ldots$, so we pick $k = 5$

• Pick 5 numbers $a_1, \ldots, a_5$ mod 11: 3, 10, 0, 5, 2

Example (cont.)

• Multipliers: 3, 10, 0, 5, 2

• Typical “key”: 31905

• Convert to base 11 by repeated Mod and Div:

– Mod(31905, 11) = 5, Div(31905, 11) = 2900

– Mod(2900, 11) = 7, Div(2900, 11) = 263, …

– $31905_{10} = \mathrm{21A75}_{11}$ [“A” means “10”]

• Hash = $3 \cdot 2 + 10 \cdot 1 + 0 \cdot 10 + 5 \cdot 7 + 2 \cdot 5 = 61$, and $61 \bmod 11 = 6$
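To double-check the worked example, here is a small Python sketch that converts 31905 to base 11 with repeated Mod/Div (Python's built-in `divmod`) and then applies the multipliers 3, 10, 0, 5, 2 to the digits, most significant digit first. `to_base` is a hypothetical helper name.

```python
def to_base(x: int, base: int, k: int) -> list[int]:
    """k digits of x in the given base, most significant first (repeated Mod/Div)."""
    digits = []
    for _ in range(k):
        x, r = divmod(x, base)  # r = Mod(x, base), x = Div(x, base)
        digits.append(r)
    return digits[::-1]


multipliers = [3, 10, 0, 5, 2]
digits = to_base(31905, 11, 5)
print(digits)                 # [2, 1, 10, 7, 5], i.e. 21A75
total = sum(a * d for a, d in zip(multipliers, digits))
print(total, total % 11)      # 61 6
```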

In practice

• Usually items aren’t given as integers between 0 and some large number N

• Doing arithmetic (like “finding the digits”) on big numbers (larger than the language can represent natively) is a pain algorithmically

• Items frequently have an “identifier” that’s a few bytes long, often encoded as a string of characters

Practice, cont’d

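• Assume objects have k-byte identifiers $x = (x_1, \ldots, x_k)$

• Compute $u = a_1x_1 + \cdots + a_kx_k \bmod n$

• Put (x, object) into hash bucket u

• This works as long as $n > 256$, the number of possible byte values

• Otherwise two different byte values can be equal mod n, and the assumption of uniformly distributed hash indices is wrong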
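A minimal sketch of hashing byte identifiers in Python, assuming string identifiers encoded as UTF-8 and at most `K = 8` bytes long (both hypothetical choices); shorter identifiers are zero-padded, so each byte $x_i$ is used directly as a "digit" mod 257.

```python
import random

n = 257   # prime > 256, so every byte value 0..255 is a distinct residue mod n
K = 8     # assumed maximum identifier length in bytes (hypothetical)
rng = random.Random(42)
a = [rng.randrange(n) for _ in range(K)]  # a1..ak, fixed once and for all


def bucket_of(identifier: str) -> int:
    x = identifier.encode("utf-8")
    if len(x) > K:
        raise ValueError("identifier longer than K bytes")
    x = x.ljust(K, b"\0")  # pad short identifiers with zero bytes
    # Iterating over a bytes object yields ints in 0..255
    return sum(ai * xi for ai, xi in zip(a, x)) % n


buckets = [[] for _ in range(n)]


def put(identifier: str, obj) -> None:
    """Put (x, object) into hash bucket u."""
    buckets[bucket_of(identifier)].append((identifier, obj))
```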

The SET Abstract Data Type

• create(n): creates a new, empty set structure capable of holding up to n elements.

• empty(S): checks whether the set S is empty.

• size(S): returns the number of elements in S.

• element_of(x, S): checks whether the value x is in the set S.

• enumerate(S): yields the elements of S in some arbitrary order.

• add(S, x): adds the element x to S, if it is not there already.

• delete(S, x): removes the element x from S, if it is there.

Implementing sets

• Can use a hashtable:

– “create”, “empty”, and “size” are trivial

– “enumerate”: take all elements in all buckets

– “add” is just “insert”; “delete” is “delete”

– “element_of” is just “find”
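A sketch of the SET ADT on top of a chained hash table. As an assumption for brevity, Python's built-in `hash()` stands in for the multiplier hash from the earlier slides, and `n` is used directly as the number of buckets (ideally a prime a little larger than the expected size). `HashSet` is a hypothetical class name.

```python
class HashSet:
    """SET ADT on a chained hash table; built-in hash() stands in for the slides' hash."""

    def __init__(self, n: int = 101):          # create(n)
        self.buckets = [[] for _ in range(n)]
        self.count = 0

    def _bucket(self, x) -> list:
        return self.buckets[hash(x) % len(self.buckets)]

    def empty(self) -> bool:
        return self.count == 0

    def size(self) -> int:
        return self.count

    def element_of(self, x) -> bool:            # just "find"
        return x in self._bucket(x)

    def enumerate(self):                        # all elements in all buckets
        for b in self.buckets:
            yield from b

    def add(self, x) -> None:                   # "insert", skipping duplicates
        b = self._bucket(x)
        if x not in b:
            b.append(x)
            self.count += 1

    def delete(self, x) -> None:
        b = self._bucket(x)
        if x in b:
            b.remove(x)
            self.count -= 1
```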

DICTIONARY ADT

• Create, empty, size as in SET

• Still to do:

– Insert(key, value)

– Find(key)

• Sometimes called “store” and “fetch”

• A dictionary is sometimes called a “map”

– “key” is ‘mapped to’ “value”

• Closely related to a “database”

• May allow several values for one key

– Find(key) returns a list of values in this case

Implementing a dictionary

• Create(n)

– Build an array of prime size a little more than n, each entry an empty list

– Pick k numbers, mod n, to handle keys of length k

• Insert(key, value)

– Let $u = (a_1\,\mathit{key}_1 + \cdots + a_k\,\mathit{key}_k) \bmod n$

– Insert (key, value) into array[u]

• Find(key)

– Let $u = (a_1\,\mathit{key}_1 + \cdots + a_k\,\mathit{key}_k) \bmod n$

– Search for (key, *) in array[u]

– If you find (key, val), return val

– Else return None

• (Modify as appropriate to return a list of vals)
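The Create/Insert/Find steps above can be sketched in Python as follows, assuming keys are byte strings of at most `k = 16` bytes and a fixed table size of 257 (both hypothetical defaults). Bytes beyond the end of a short key contribute nothing to the sum, which is the same as zero-padding. Like the slide's version, `insert` does not replace an existing pair, so `find` returns the first value stored for a key.

```python
import random


class HashDict:
    """DICTIONARY on a chained hash table with u = (a1*key1 + ... + ak*keyk) mod n."""

    def __init__(self, n: int = 257, k: int = 16, seed: int = 0):
        # n should be a prime > 256 so each key byte is a distinct residue mod n
        self.n, self.k = n, k
        rng = random.Random(seed)
        self.a = [rng.randrange(n) for _ in range(k)]  # a1..ak
        self.array = [[] for _ in range(n)]            # each entry an empty list

    def _u(self, key: bytes) -> int:
        if len(key) > self.k:
            raise ValueError("key too long")
        # zip stops at the end of the key: missing bytes act like zeros
        return sum(ai * b for ai, b in zip(self.a, key)) % self.n

    def insert(self, key: bytes, value) -> None:
        self.array[self._u(key)].append((key, value))

    def find(self, key: bytes):
        for stored_key, val in self.array[self._u(key)]:
            if stored_key == key:
                return val
        return None
```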

Summary

• We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one

• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance

Example Application: JUMBLE!

JUMBLE

• Input: list of all 5-letter words in English

• Each word represented as an array of five characters

• Output: all words for which no other permutation is a word

Solution

• Start with an empty dictionary

• For each word w:

– Sort its letters alphabetically to get wnew

– D.insert(wnew, w)

• For each word w:

– Sort its letters alphabetically again to get wnew

– If D.find(wnew) contains anything except w, skip w

– Else output w
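The JUMBLE solution translates almost line for line into Python. This sketch assumes the word list arrives as Python strings and uses a built-in `defaultdict(list)` as the dictionary that allows several values per key; `unique_anagram_words` is a hypothetical name.

```python
from collections import defaultdict


def unique_anagram_words(words: list[str]) -> list[str]:
    """Words for which no other permutation of the letters is also in the list."""
    d = defaultdict(list)                 # sorted letters -> all words with those letters
    for w in words:
        d["".join(sorted(w))].append(w)   # D.insert(wnew, w)
    result = []
    for w in words:
        wnew = "".join(sorted(w))
        if any(other != w for other in d[wnew]):
            continue                      # another permutation is a word: skip w
        result.append(w)
    return result


print(unique_anagram_words(["stone", "notes", "onset", "apple", "tiger"]))
# ['apple', 'tiger']
```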

Clean Your Code

• Errors per line are roughly constant

– Fewer lines, fewer errors overall!

• Easier to grade

– More likely to get credit

• Cleaner code = cleaner thinking

– Better understanding of the material

LCA(u, v)

lca = null

udepth = T.depth(u)

vdepth = T.depth(v)

if (T.isroot(u) = true) or (T.isroot(v) = true) then

lca = T.root

while (lca = null) do

if (u = v) then

lca = u

else

if udepth > vdepth then

u = T.parent(u)

udepth = udepth – 1

else if vdepth > udepth

v = T.parent(v)

vdepth = vdepth – 1

else

u = T.parent(u)

v = T.parent(v)

return lca


LCA(u, v, T)

lca = null

udepth = T.depth(u)

vdepth = T.depth(v)

if (T.isroot(u) = true) or (T.isroot(v) = true) then

lca = T.root

while (lca = null) do

if (u = v) then

lca = u

else

if udepth > vdepth then

u = T.parent(u)

udepth = udepth – 1

else if vdepth > udepth

v = T.parent(v)

vdepth = vdepth – 1

else

u = T.parent(u)

v = T.parent(v)

return lca

Needlessly complex

LCA(u, v, T)

lca = null

udepth = T.depth(u)

vdepth = T.depth(v)

if (T.isroot(u) = true) or (T.isroot(v) = true) then

lca = T.root

while (lca = null) do

if (u = v) then

lca = u

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

return lca

Now irrelevant

LCA(u, v, T)

lca = null

if (T.isroot(u) = true) or (T.isroot(v) = true) then

lca = T.root

while (lca = null) do

if (u = v) then

lca = u

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

return lca


Redundant

LCA(u, v, T)

lca = null

if T.isroot(u) or T.isroot(v) then

lca = T.root

while (lca = null) do

if (u = v) then

lca = u

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

return lca


it’s the answer; return it!

LCA(u, v, T)

lca = null

if T.isroot(u) or T.isroot(v) then

lca = T.root

return lca

while (lca = null) do

if (u = v) then

lca = u

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

return lca

LCA(u, v, T)

lca = null

if T.isroot(u) or T.isroot(v) then

lca = T.root

return lca

while (lca = null) do

if (u = v) then

lca = u

return lca

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

return lca

Condition is irrelevant

LCA(u, v, T)

lca = null

if T.isroot(u) or T.isroot(v) then

lca = T.root

return lca

repeat

if (u = v) then

lca = u

return lca

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)

lca is no longer used!

LCA(u, v, T)

if T.isroot(u) or T.isroot(v) then

return T.root

repeat

if (u = v) then

return u

else

if T.depth(u) > T.depth(v) then

u = T.parent(u)

else if T.depth(v) > T.depth(u)

v = T.parent(v)

else

u = T.parent(u)

v = T.parent(v)


LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

if T.isroot(u) or T.isroot(v) then

return T.root

repeat

if (u = v) then

return u

else

u = T.parent(u)

v = T.parent(v)

LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

if T.isroot(u) or T.isroot(v) or (u = v) then

return u

repeat

[OOPS!]

else

u = T.parent(u)

v = T.parent(v)

LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

if T.isroot(u) or T.isroot(v) or (u = v) then

return u

else return LCA(T.parent(u), T.parent(v), T)

Not needed

LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

if T.isroot(u) or (u = v) then

return u

else return LCA(T.parent(u), T.parent(v), T)

LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

if (u = v) then

return u

else return LCA(T.parent(u), T.parent(v), T)

Called during recursion, but no effect

LCA(u, v, T)

while T.depth(u) > T.depth(v)

u = T.parent(u)

while T.depth(v) > T.depth(u)

v = T.parent(v)

return LCAsimple(u, v, T)

LCAsimple(u, v, T)

# LCA for the case where u and v have the same depth

if (u = v) then return u

else return LCAsimple(T.parent(u), T.parent(v), T)

DONE!
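A runnable Python sketch of the cleaned-up LCA, assuming a tree stored as parent pointers with the root's parent set to `None` (the `Tree` class is a hypothetical stand-in for T). `lca_simple` is written as a loop instead of recursion (same logic, no recursion-depth limit). Note that it is called on u and v themselves once their depths match, so the case where one node is an ancestor of the other is handled.

```python
class Tree:
    """Rooted tree stored as parent pointers; the root's parent is None."""

    def __init__(self, parent: dict):
        self._parent = parent

    def parent(self, u):
        return self._parent[u]

    def depth(self, u) -> int:
        d = 0
        while self._parent[u] is not None:
            u = self._parent[u]
            d += 1
        return d


def lca(u, v, T: Tree):
    # Move the deeper node up until both nodes have the same depth
    while T.depth(u) > T.depth(v):
        u = T.parent(u)
    while T.depth(v) > T.depth(u):
        v = T.parent(v)
    return lca_simple(u, v, T)


def lca_simple(u, v, T: Tree):
    # u and v have the same depth: climb together until they meet
    while u != v:
        u, v = T.parent(u), T.parent(v)
    return u


#        r
#       / \
#      a   b
#     / \
#    c   d
#    |
#    e
T = Tree({"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"})
print(lca("e", "d", T), lca("e", "b", T), lca("c", "e", T))  # a r c
```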

STACK

• Stack operations:

– Push, pop, size, isEmpty()

• (Partial) Implementation:

– Array-based stack

ArrayStack

INIT:
  data = array[20]
  count = 0 // index of next empty space

Push(obj o):
  if count < 20
    data[count] = o
    count++
  else
    ERROR(“Overfull Stack”)

ArrayStack

pop():
  if count == 0
    ERROR(“Can’t pop from empty Stack”)
  else
    count--
    return data[count] // after decrementing, the top item is at index count

ArrayStack

size():
  return count

isEmpty():
  return count == 0
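A Python sketch of the fixed-size ArrayStack, assuming exceptions (`OverflowError`, `IndexError`) as stand-ins for the pseudocode's ERROR(...). Note that `pop` returns `data[count]` after decrementing, since `count` always indexes the next empty slot.

```python
class ArrayStack:
    """Fixed-capacity stack backed by a 20-slot array."""

    CAPACITY = 20

    def __init__(self):
        self.data = [None] * self.CAPACITY
        self.count = 0                      # index of the next empty slot

    def push(self, o) -> None:
        if self.count < self.CAPACITY:
            self.data[self.count] = o
            self.count += 1
        else:
            raise OverflowError("Overfull Stack")

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]        # top item sits just below the old count

    def size(self) -> int:
        return self.count

    def is_empty(self) -> bool:
        return self.count == 0
```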

Analysis

ArrayStack

• Push: O(1)

• pop: O(1)

• size: O(1)

• isEmpty: O(1)

Summary

• Fast (every operation is O(1)) but not very useful: the stack can never hold more than 20 items

ExpandableArrayStack

INIT:
  data = array[20]
  count = 0 // index of next empty space
  capacity = 20

Push

Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[capacity + 1]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = capacity + 1
    data = d2
    push(o)

Expandable Array Stack

• All other operations remain the same

Analysis

• In the worst case, a single push takes O(n) time (it copies all n elements)

• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work $21 + 22 + \cdots + (20+k) = 20k + (1 + 2 + \cdots + k) = 20k + k(k+1)/2 = O(k^2)$

• So the average time per push is O(k) as well!
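To check the $20k + k(k+1)/2$ count, here is a Python sketch of the grow-by-one stack that charges one unit of work per element copied or written (our cost model, an assumption). It writes into the new array directly instead of calling push(o) again, which has the same effect.

```python
class ExpandableArrayStack:
    """Grow-by-one stack; counts 1 unit of work per element written or copied."""

    def __init__(self, capacity: int = 20):
        self.capacity = capacity
        self.data = [None] * capacity
        self.count = 0
        self.work = 0

    def push(self, o) -> None:
        if self.count == self.capacity:
            d2 = [None] * (self.capacity + 1)
            for j in range(self.capacity):      # j = 0 .. capacity - 1
                d2[j] = self.data[j]
                self.work += 1
            self.capacity += 1
            self.data = d2
        self.data[self.count] = o
        self.count += 1
        self.work += 1


s = ExpandableArrayStack()
for i in range(20):          # fill the initial 20 slots
    s.push(i)
before = s.work
k = 100
for i in range(k):           # items 21, 22, ..., 20 + k
    s.push(i)
print(s.work - before, 20 * k + k * (k + 1) // 2)  # 7050 7050
```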

Better: avoid frequent expansion

• Instead of adding a little space, add a lot!

• Double array size when it gets full

DoublingArrayStack: Push

Push(obj o):
  if count < capacity
    data[count] = o
    count++
  else
    d2 = new Array[2*capacity]
    for j = 0 to capacity - 1
      d2[j] = data[j]
    capacity = 2*capacity
    data = d2
    push(o)

Doubling Array Stack

• All other operations remain the same
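A Python sketch of the doubling stack, assuming the same starting capacity of 20 as the slides; only the expansion branch of push differs from the grow-by-one version. The slice assignment copies `data[0 .. capacity-1]` into the new array in one step.

```python
class DoublingArrayStack:
    """Array-backed stack that doubles its capacity when full."""

    def __init__(self, capacity: int = 20):
        self.capacity = capacity
        self.data = [None] * capacity
        self.count = 0                       # index of the next empty slot

    def push(self, o) -> None:
        if self.count == self.capacity:
            d2 = [None] * (2 * self.capacity)
            d2[: self.capacity] = self.data  # copy data[0 .. capacity-1]
            self.capacity *= 2
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]

    def size(self) -> int:
        return self.count
```

With a starting capacity of 20, the 21st push doubles the array to 40 slots and the 41st push doubles it to 80.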

Analysis

• If count < capacity (no expansion): O(1)

• Else (allocate a new array and copy all n elements): O(n)

Analysis

• In the worst case, the time taken is O(n)

• But over the course of many operations, average time per operation is O(1)

“Total Work Analysis”

• If we have an array with n elements

• …and do n operations

• …then total work is no more than 4n.

• Work per operation, on average, is 4.
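One way to see the 4n bound is to count work directly. This sketch assumes the worst starting point (an array that is exactly full with n elements) and charges 1 unit per element copied or written; optionally it also charges for initializing every slot of the new 2n-slot array. Under these assumptions the total is 2n without the allocation charge and exactly 4n with it, so the slide's bound holds either way. `doubling_work` is a hypothetical helper.

```python
def doubling_work(n: int, ops: int, charge_allocation: bool = False) -> int:
    """Work for `ops` pushes onto a doubling stack that starts full with n elements."""
    capacity = count = n
    work = 0
    for _ in range(ops):
        if count == capacity:
            if charge_allocation:
                work += 2 * capacity     # initialize every slot of the new array
            work += count                # copy every element across
            capacity *= 2
        work += 1                        # write the pushed element
        count += 1
    return work


for n in (1, 20, 1000):
    print(n, doubling_work(n, n), doubling_work(n, n, charge_allocation=True))
# 1 2 4
# 20 40 80
# 1000 2000 4000
```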

Alternative view

• “Amortized” analysis:

– For each operation that takes one unit of time

• Place extra units of time “in the bank” (for the doubling stack, two extra units per push suffice)

– By the time an expensive operation arrives

• Use your savings to pay for it

• Alternative view:

– When you do an expensive operation

• Pay one unit now

• Pay an extra unit for each of the next n operations
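The banking argument can be simulated. In this sketch (an assumed cost model: 1 unit per write, 1 unit per copied element, starting from capacity 1), every push deposits a fixed `charge` and the actual cost is withdrawn. A charge of 3 (1 for the write plus 2 banked) keeps the bank non-negative forever, because each doubling from capacity m to 2m is preceded by m cheap pushes that bank 2m units. A charge of 2 runs the bank dry. `bank_never_negative` is a hypothetical helper.

```python
def bank_never_negative(pushes: int, charge: int = 3, capacity: int = 1) -> bool:
    """Deposit `charge` units per push, pay actual costs; True if the bank stays >= 0."""
    count, bank = 0, 0
    for _ in range(pushes):
        bank += charge
        if count == capacity:
            bank -= count            # pay for copying into the doubled array
            capacity *= 2
        bank -= 1                    # pay for writing the new element
        count += 1
        if bank < 0:
            return False
    return True


print(bank_never_negative(10_000, charge=3))   # True
print(bank_never_negative(10_000, charge=2))   # False
```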

Language

• For hashing: “the ‘find’ operation runs in expected O(1) time”

• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”

Pixel boundaries (if time)