Hashing, Sets, Dictionaries Code Cleaning Expandable Array Stacks and Amortized Analysis.


Hashing so far

To store 250 IP addresses in table:

• Pick prime just bigger than 250 (n = 257)

• Pick a1, …, a4 mod 257 (once and for all)

• To hash x = (x1, …, x4):

– Compute u = a1x1 + … + a4x4 mod 257

– Store x in a bucket at myArray[u]

Generalization

Old: To store 250 IP addresses in table

New: store n1 items, each between 0 and N

Generalization

To store n1 items between 0 and N

• Pick prime n just bigger than n1

• Let k = round_up(log_n N)

– Each “item” can be written as a k-digit number, base n

• Pick a1, …, ak mod n (once and for all)

• To hash x = (x1, …, xk):

– Compute u = a1x1 + … + akxk mod n

– Store x in a bucket at myArray[u]
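The recipe above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference implementation: the names `is_prime`, `next_prime`, and `make_hash` are made up for this sketch, the multipliers are drawn with Python's `random` module, and the digits are paired with the multipliers starting from the low-order digit (the order does not matter when the multipliers are random).

```python
import random

def is_prime(m):
    """Trial-division primality test; fine for small table sizes."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def next_prime(m):
    """Smallest prime strictly greater than m."""
    m += 1
    while not is_prime(m):
        m += 1
    return m

def make_hash(n1, N, rng=random):
    """Return (n, k, h) for storing n1 items, each between 0 and N."""
    n = next_prime(n1)
    k = 1
    while n ** k <= N:        # smallest k such that N fits in k base-n digits
        k += 1
    a = [rng.randrange(n) for _ in range(k)]   # chosen once and for all

    def h(x):
        u = 0
        for ai in a:
            u += ai * (x % n)  # x % n is the next base-n digit of x
            x //= n
        return u % n

    return n, k, h

n, k, h = make_hash(8, 65535)
print(n, k)   # 11 5
```

An item x would then be stored in the bucket at index `h(x)`.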

Example

• Store 8 items, each represented by 16 bits (i.e., between 0 and 2^16 – 1 = 65535)

• Solution: pick p = 11.

• log_11 65535 = 4.625…, so we pick k = 5

• Pick 5 numbers a1, …, a5, mod 11: 3,10, 0, 5, 2

Example (cont.)

• Multipliers: 3, 10, 0, 5, 2

• Typical “key”: 31905.

• Convert to base 11:

– Mod(31905, 11) = 5

– Div(31905, 11) = 2900

– Mod(2900, 11) = 7

– Div(2900, 11) = 263 …

– 31905 = 21A75 in base 11 [“A” means “10”]

• Hash = 3*2 + 10*1 + 0*A + 5*7 + 2*5 mod 11 = 61 mod 11 = 6.
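The arithmetic in this example is easy to get wrong by hand, so here is a small Python check. The helper names `to_digits` and `hash_key` are invented for this sketch; the multipliers and the key are the ones from the example.

```python
def to_digits(x, base, k):
    """Return the k base-`base` digits of x, most significant digit first."""
    digits = []
    for _ in range(k):
        digits.append(x % base)   # Mod(x, base): next low-order digit
        x //= base                # Div(x, base): drop that digit
    return digits[::-1]

def hash_key(x, multipliers, n):
    """Compute a1*x1 + ... + ak*xk mod n, where x1..xk are the base-n digits of x."""
    digits = to_digits(x, n, len(multipliers))
    return sum(a * d for a, d in zip(multipliers, digits)) % n

multipliers = [3, 10, 0, 5, 2]
print(to_digits(31905, 11, 5))            # [2, 1, 10, 7, 5], i.e. 21A75
print(hash_key(31905, multipliers, 11))   # 6
```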

In practice

• Usually items aren’t given as integers between 0 and some large number N

• Doing arithmetic (like “finding the digits”) for big numbers (larger than language can represent) is a pain algorithmically

• Frequently have an “identifier” that’s a few bytes long, often encoded as a string of characters

Practice, cont’d

• Assume objects have k-byte identifiers x

• Compute u = a1x1 + … + akxk mod n

• Put (x, object) into hashbucket u

• This works as long as n > 256 (the number of possible byte values)

• Otherwise the assumption of uniformly distributed hash indexes is wrong
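For byte-string identifiers, the same hash might look like the sketch below. It assumes fixed-length identifiers of k bytes and a prime table size of 257; the function name `make_byte_hash` and the `seed` parameter are illustrative only.

```python
import random

def make_byte_hash(k, n=257, seed=None):
    """Build a hash function for k-byte identifiers, with table size n (assumed prime, > 256)."""
    rng = random.Random(seed)
    a = [rng.randrange(n) for _ in range(k)]   # multipliers, picked once

    def h(identifier):
        if len(identifier) != k:
            raise ValueError("identifier must be exactly k bytes long")
        # Iterating over a bytes object yields integers in 0..255.
        return sum(ai * xi for ai, xi in zip(a, identifier)) % n

    return h

h = make_byte_hash(4, seed=1)
bucket = h(bytes([192, 168, 0, 1]))   # an IP address as four bytes
```

Because each byte is already a "digit" smaller than n, no base conversion is needed.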

The SET Abstract Data Type

• create(n): creates a new set structure, initially empty but capable of holding up to n elements.

• empty(S): checks whether the set S is empty.

• size(S): returns the number of elements in S.

• element_of(x, S): checks whether the value x is in the set S.

• enumerate(S): yields the elements of S in some arbitrary order.

• add(S, x): adds the element x to S, if it is not there already.

• delete(S, x): removes the element x from S, if it is there.

Implementing sets

• Can use hashtable:

– “create”, “empty”, and “size” are trivial

– “enumerate”: take all elements in all buckets

– “add” is just “insert”; “delete” is “delete”

– “element_of” is just “find”
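A bucket-based set along these lines could be sketched as follows. To keep the example short it uses Python's built-in `hash()` to pick a bucket instead of the multiplier hash from the earlier slides; that substitution, and the class name `HashSet`, are assumptions of this sketch.

```python
class HashSet:
    """SET ADT backed by an array of buckets (Python lists)."""

    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]   # n is assumed to be >= 1
        self.count = 0

    def _bucket(self, x):
        # Built-in hash() stands in for the a1*x1 + ... + ak*xk mod n scheme.
        return self.buckets[hash(x) % len(self.buckets)]

    def empty(self):
        return self.count == 0

    def size(self):
        return self.count

    def element_of(self, x):
        return x in self._bucket(x)

    def enumerate(self):
        for bucket in self.buckets:   # take all elements in all buckets
            yield from bucket

    def add(self, x):
        bucket = self._bucket(x)
        if x not in bucket:           # only add if not already present
            bucket.append(x)
            self.count += 1

    def delete(self, x):
        bucket = self._bucket(x)
        if x in bucket:
            bucket.remove(x)
            self.count -= 1
```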

DICTIONARY ADT

• Create, empty, size as in SET

• Still to do:

– Insert(key, value)

– Find(key)

• Sometimes called “store” and “fetch”

• A dictionary is sometimes called a “map”

– “key” is ‘mapped to’ “value”

• Closely related to a “database”

• May allow several values for one key

– Find(key) returns a list of values in this case

Implementing a dictionary

• Create(n)

– Build an array of prime size a little more than n, each entry an empty list

– Pick k numbers a1, …, ak, mod n, to handle keys of length k (from here on, n is the prime array size)

• Insert(key, value)

– Let u = (a1key1 + … + ak keyk) mod n

– Insert (key, value) into array[u]

• Find(key)

– Let u = (a1key1 + … + ak keyk) mod n

– Search for (key, *) in array[u]

– If you find (key, val), return val

– Else return None

• (Modify as appropriate to return list of vals)
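The outline above might translate to Python as in this sketch. It assumes keys are byte strings of length k and that the caller passes a prime table size n greater than 256; the class name `HashDict` and the `seed` parameter are illustrative.

```python
import random

class HashDict:
    """DICTIONARY ADT using an array of buckets and a multiplier hash."""

    def __init__(self, n, k, seed=None):
        rng = random.Random(seed)
        self.n = n                                      # assumed prime, > 256
        self.a = [rng.randrange(n) for _ in range(k)]   # k multipliers mod n
        self.array = [[] for _ in range(n)]             # each entry an empty list

    def _index(self, key):
        # u = (a1*key1 + ... + ak*keyk) mod n
        return sum(ai * ki for ai, ki in zip(self.a, key)) % self.n

    def insert(self, key, value):
        self.array[self._index(key)].append((key, value))

    def find(self, key):
        for stored_key, value in self.array[self._index(key)]:
            if stored_key == key:
                return value
        return None

d = HashDict(257, 4, seed=0)
d.insert(bytes([127, 0, 0, 1]), "localhost")
print(d.find(bytes([127, 0, 0, 1])))   # localhost
```

To support several values per key, `find` could collect every matching value into a list rather than returning the first one.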

Summary

• We can now assume that we can create a SET or a DICT with expected O(1) insertion and lookup times whenever we need one

• After this week’s HW, you can further assume that we don’t need to know the size of the SET or the DICT in advance

Example Application: JUMBLE!

JUMBLE

• Input: list of all 5-letter words in English

• Each word represented as an array of five characters

• Output: all words for which no other permutation is a word

Solution

• Start with an empty dictionary

• Foreach word w

– Sort letters alphabetically to get wnew

– D.insert(wnew, w)

• Foreach word w

– Sort alphabetically again to get wnew

– If D.find(wnew) contains anything except w, skip w

– Else output w
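A short Python sketch of this solution follows. It uses Python's built-in dictionary (via `collections.defaultdict`) as the multi-valued dictionary D, and the function name `jumble` is made up for the example; the word list at the bottom is a toy input, not the full English list.

```python
from collections import defaultdict

def jumble(words):
    """Return the words for which no other permutation is also in the list."""
    d = defaultdict(list)
    for w in words:
        wnew = "".join(sorted(w))     # sort letters alphabetically
        d[wnew].append(w)             # D.insert(wnew, w)

    result = []
    for w in words:
        wnew = "".join(sorted(w))
        # Skip w if D(wnew) contains anything except w.
        if all(other == w for other in d[wnew]):
            result.append(w)
    return result

print(jumble(["stale", "least", "steal", "crwth", "house"]))   # ['crwth', 'house']
```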

Clean Your Code

• Errors per line ~ constant

– Fewer errors overall!

• Easier to grade

– More likely to get credit

• Cleaner code = cleaner thinking

– Better understanding of material

LCA(u, v)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        if udepth > vdepth then
            u = T.parent(u)
            udepth = udepth - 1
        else if vdepth > udepth then
            v = T.parent(v)
            vdepth = vdepth - 1
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca


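LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        if udepth > vdepth then
            u = T.parent(u)
            udepth = udepth - 1
        else if vdepth > udepth then
            v = T.parent(v)
            vdepth = vdepth - 1
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca

Needlessly complex: udepth and vdepth are updated by hand.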

LCA(u, v, T)
    lca = null
    udepth = T.depth(u)
    vdepth = T.depth(v)
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca

Now irrelevant: the udepth and vdepth assignments.

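LCA(u, v, T)
    lca = null
    if (T.isroot(u) = true) or (T.isroot(v) = true) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca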

Redundant: the “= true” comparisons in the root test.

LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
    while (lca = null) do
        if (u = v) then
            lca = u
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca

If u or v is the root, T.root is the answer; return it!

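LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca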

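LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    while (lca = null) do
        if (u = v) then
            lca = u
            return lca
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)
    return lca

Condition is irrelevant: every path that sets lca now returns at once, so the test (lca = null) is always true.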

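LCA(u, v, T)
    lca = null
    if T.isroot(u) or T.isroot(v) then
        lca = T.root
        return lca
    repeat
        if (u = v) then
            lca = u
            return lca
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)

lca is no longer needed: return the value directly.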

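LCA(u, v, T)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        if T.depth(u) > T.depth(v) then
            u = T.parent(u)
        else if T.depth(v) > T.depth(u) then
            v = T.parent(v)
        else
            u = T.parent(u)
            v = T.parent(v)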


LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) then
        return T.root
    repeat
        if (u = v) then
            return u
        u = T.parent(u)
        v = T.parent(v)

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    repeat    [OOPS!]
        u = T.parent(u)
        v = T.parent(v)

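LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or T.isroot(v) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)

Not needed: the T.isroot(v) test (u and v are at the same depth here).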

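LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if T.isroot(u) or (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)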

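LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    if (u = v) then
        return u
    else return LCA(T.parent(u), T.parent(v), T)

Called during recursion, but no effect: the two while loops.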

LCA(u, v, T)
    while T.depth(u) > T.depth(v)
        u = T.parent(u)
    while T.depth(v) > T.depth(u)
        v = T.parent(v)
    return LCAsimple(u, v, T)

LCAsimple(u, v, T)
    # LCA for case where u and v have same height
    if (u = v) return u
    else return LCAsimple(T.parent(u), T.parent(v), T)
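For reference, the cleaned-up LCA / LCAsimple pair can be written in runnable Python roughly as below. The `Tree` class here is a hypothetical stand-in that stores only parent pointers; the slides do not say how T is actually represented.

```python
class Tree:
    """Minimal tree: a dict mapping each node to its parent (the root maps to None)."""

    def __init__(self, parent):
        self._parent = parent

    def parent(self, u):
        return self._parent[u]

    def depth(self, u):
        d = 0
        while self._parent[u] is not None:   # walk up to the root, counting edges
            u = self._parent[u]
            d += 1
        return d

def lca_simple(u, v, T):
    """LCA for the case where u and v are at the same depth."""
    if u == v:
        return u
    return lca_simple(T.parent(u), T.parent(v), T)

def lca(u, v, T):
    # First bring the deeper node up to the depth of the shallower one.
    while T.depth(u) > T.depth(v):
        u = T.parent(u)
    while T.depth(v) > T.depth(u):
        v = T.parent(v)
    return lca_simple(u, v, T)

T = Tree({"r": None, "a": "r", "b": "r", "c": "a", "d": "a", "e": "c"})
print(lca("e", "d", T))   # a
```

Recomputing `depth` on every loop iteration is slow for deep trees; a version that cares about running time would compute each depth once.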

• Stack operations:

– Push, pop, size, isEmpty()

• (Partial) Implementation:

– Array-based stack

ArrayStack

INIT:
    data = array[20]
    count = 0    // next empty space

Push(obj o):
    if count < 20
        data[count] = o
        count++
    else
        ERROR(“Overfull Stack”)

ArrayStack

pop():
    if count == 0
        ERROR(“Can’t pop from empty Stack”)
    count--
    return data[count]    // after the decrement, count indexes the top element

ArrayStack

size():
    return count

isEmpty():
    return count == 0
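Put together, the fixed-capacity stack might look like this in Python. It is a sketch that mirrors the pseudocode: the capacity of 20 comes from the slides, while the choice of `OverflowError` and `IndexError` for the two error cases is an assumption. Note that `pop` reads `data[count]` after decrementing, because `count` always indexes the next empty slot.

```python
class ArrayStack:
    CAPACITY = 20

    def __init__(self):
        self.data = [None] * self.CAPACITY
        self.count = 0                      # index of the next empty space

    def push(self, o):
        if self.count >= self.CAPACITY:
            raise OverflowError("Overfull Stack")
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]        # top element lives at the new count

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0
```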

Analysis

ArrayStack

• Push: O(1)

• pop: O(1)

• size, isEmpty: O(1)

Summary

• Fast but not very useful

ExpandableArrayStack

data = array[20]
count = 0    // next empty space
capacity = 20

Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[capacity+1]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = capacity + 1
        data = d2
        push(o)

Expandable Array Stack

• All other operations remain the same

Analysis

• In the worst case, the time taken is O(n)

• If we insert items 21, 22, …, 20+k, we’ll have done k operations, with total work 21 + 22 + … + (20+k) = (20+1) + (20+2) + … + (20+k) = 20k + (1 + 2 + … + k) = 20k + k(k+1)/2 = O(k^2)

• So average time per operation is O(k) as well!

Better: avoid frequent expansion

• Instead of adding a little space, add a lot!

• Double array size when it gets full

DoublingArrayStack: Push

Push(obj o):
    if count < capacity
        data[count] = o
        count++
    else
        d2 = new Array[2*capacity]
        for j = 0 to capacity - 1
            d2[j] = data[j]
        capacity = 2*capacity
        data = d2
        push(o)

Doubling Array Stack

• All other operations remain the same
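A Python rendering of the doubling stack could look like the following sketch. The initial capacity of 20 follows the earlier slides; using a preallocated Python list as the "array" and raising `IndexError` on an empty pop are assumptions of this example.

```python
class DoublingArrayStack:
    def __init__(self, capacity=20):
        self.data = [None] * capacity
        self.count = 0                 # index of the next empty space
        self.capacity = capacity

    def push(self, o):
        if self.count == self.capacity:
            # Array is full: allocate twice the space and copy everything over.
            d2 = [None] * (2 * self.capacity)
            for j in range(self.capacity):
                d2[j] = self.data[j]
            self.capacity = 2 * self.capacity
            self.data = d2
        self.data[self.count] = o
        self.count += 1

    def pop(self):
        if self.count == 0:
            raise IndexError("Can't pop from empty Stack")
        self.count -= 1
        return self.data[self.count]

    def size(self):
        return self.count

    def is_empty(self):
        return self.count == 0
```

Pushing 100 items onto a fresh stack triggers three expansions (20 to 40, 40 to 80, 80 to 160).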

Analysis

• Push when the array is not full: O(1)

• Push when the array is full (allocate d2 and copy every element): O(n)

Analysis

• In the worst case, the time taken is O(n)

• But over the course of many operations, average time per operation is O(1)

“Total Work Analysis”

• If we have an array with n elements

• …and do n operations

• …then total work is no more than 4n.

• Work per operation, on average, is 4.
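One way to see the difference between the two growth strategies is to count the work directly. The sketch below charges one unit for storing a new element and one unit for each element copied during an expansion; this cost model and the function name `total_work` are assumptions chosen to match the counting used on the earlier slides.

```python
def total_work(num_pushes, grow, capacity=20):
    """Count units of work for num_pushes pushes, starting from an empty array."""
    count = 0
    work = 0
    for _ in range(num_pushes):
        if count == capacity:
            work += count              # copy every existing element
            capacity = grow(capacity)
        work += 1                      # store the new element
        count += 1
    return work

print(total_work(1000, lambda c: c + 1))   # 500310  (grow by one slot)
print(total_work(1000, lambda c: 2 * c))   # 2260    (double when full)
```

With doubling, the copies cost 20 + 40 + … + 640 = 1260 units across 1000 pushes, so the average stays a small constant; growing by one slot makes the total quadratic.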

Alternative view

• “Amortized” analysis:

– For each operation that takes one unit of time

    • Place an extra unit of time “in the bank”

– By the time an expensive operation arrives

    • Use your savings to pay for it

• Alternative view:

– When you do an expensive operation

    • Pay one unit now

    • Pay an extra unit for each of the next n operations

Language

• For hashing: “the ‘find’ operation runs in expected O(1) time”

• For doubling array stacks: “the ‘push’ operation runs in O(1) amortized time, with O(n) worst-case time.”

Pixel boundaries (if time)
