Page 1: Hashing Rehashed Paul M. Dorfman, Independent Consultant

Hashing Rehashed

Paul M. Dorfman, Independent Consultant

Gregg P. Snell, Data Savant Consulting

SUGI 27, Paper 12

Orlando, FL

Page 2: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

MYLIB.SUVBYZIP

ZIP    CITY             SUV
66216  Shawnee           67
66216  Shawnee-Mission   67
32258  Jacksonville      88
27513  Cary             214

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=.
SUV(66216)=.
SUV(99999)=.

data TESTLIB.CUSTOMER;
  ** load suv counts into the array;
  array suv(0:99999) _temporary_;
  do until(eof1); /* source not sorted */
    set MYLIB.SUVBYZIP(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then
      suv(zipcode) = suv_count;
  end;

Hashing Rehashed SUGI 27, Paper 12 Dorfman and Snell

Page 3: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

After the first record (66216 Shawnee 67) is read:

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=.
SUV(66216)=67
SUV(99999)=.

Page 6: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

The second record repeats ZIP 66216, so the IF test leaves SUV(66216) unchanged. After the third record (32258 Jacksonville 88) is read:

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=.
SUV(32258)=88
SUV(66216)=67
SUV(99999)=.

Page 8: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

After the fourth record (27513 Cary 214) is read, the load is complete:

ARRAY SUV(0:99999)

SUV(00000)=.
SUV(00001)=.
SUV(27513)=214
SUV(32258)=88
SUV(66216)=67
SUV(99999)=.

Page 10: Hashing Rehashed Paul M. Dorfman, Independent Consultant

KEY-INDEXING

data TESTLIB.CUSTOMER;
  ** load suv counts into the array;
  array suv(0:99999) _temporary_;
  do until(eof1); /* source not sorted */
    set MYLIB.SUVBYZIP(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then
      suv(zipcode) = suv_count;
  end;
  ** add suv_count to master data set;
  do until(eof2);
    set PRODLIB.CUSTOMER end=eof2;
    /* assign count by directly addressing the array */
    suv_count=suv(zip5);
    /* be sure to drop unwanted fields introduced during table load */
    drop zipcode;
    output;
  end;
run;
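
To try the load and lookup loops end to end, the sketch below runs them against the four sample rows. It is a minimal illustration rather than the authors' program: the WORK data set names, the CUST_ID variable, and the three customer rows are assumptions made up for the example.

```sas
/* Lookup table from the slide (WORK data set name is an assumption) */
data work.suvbyzip;
  input zipcode city :$20. suv_count;
  datalines;
66216 Shawnee 67
66216 Shawnee-Mission 67
32258 Jacksonville 88
27513 Cary 214
;
run;

/* Hypothetical master file: cust_id and these rows are invented */
data work.customer;
  input cust_id zip5;
  datalines;
1 27513
2 66216
3 10001
;
run;

data work.customer_suv;
  /* one array slot per possible 5-digit ZIP code */
  array suv(0:99999) _temporary_;
  /* load: the first value seen for a ZIP code wins */
  do until(eof1);
    set work.suvbyzip(keep=zipcode suv_count) end=eof1;
    if suv(zipcode) = . then suv(zipcode) = suv_count;
  end;
  /* lookup: the ZIP code itself is the array index */
  do until(eof2);
    set work.customer end=eof2;
    suv_count = suv(zip5);
    output;
  end;
  drop zipcode;
  stop; /* both inputs are exhausted, so end the step explicitly */
run;

proc print data=work.customer_suv noobs;
run;
```

Under these assumptions, customer 1 (ZIP 27513) should come out with SUV_COUNT 214, customer 2 (ZIP 66216) with 67, and customer 3 with a missing value, because ZIP 10001 never appears in the lookup table.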

Page 11: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 12: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

KEY  HASH_ADDR     HASH_TABLE
185  04            (00)=.
971  10            (01)=260
400  11            (02)=.
260  01            (03)=.
922  13            (04)=185
970  09            (05)=.
543  11            (06)=.
                   (07)=.
                   (08)=.
                   (09)=970
                   (10)=971
                   (11)=400
                   (12)=.
                   (13)=922

Page 13: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

COLLISION: key 543 hashes to address 11, which already holds key 400.

%let hash_size = 13; /* inferred from the 00-13 table; not shown on the slide */

data _null_;
  array hash_table(0:&hash_size) _temporary_;
  do until(eof1);
    set small end=eof1;
    /* probe downward from the home address until an empty
       slot or the key itself is found */
    do hash_addr=mod(key,&hash_size)+1 by -1
       until(hash_table(hash_addr)=. or hash_table(hash_addr)=key);
      /* after passing slot 0, wrap around to the top slot */
      if hash_addr < 0 then hash_addr=&hash_size;
    end;
    hash_table(hash_addr) = key;
  end;
  /* all done, write results to the log */
  put 'hash_table';
  do i=0 to &hash_size;
    put '(' i z2. ')=' hash_table(i) z3.;
  end;
run;
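
The home address computed at the start of the inner DO loop can be checked against the HASH_ADDR column. A small sketch, assuming hash_size is 13 (inferred from the 00-13 slot range rather than stated on the slide):

```sas
%let hash_size = 13; /* assumption: inferred from the 00-13 slot range */

data _null_;
  input key @@;
  /* remainder of key / 13 is 0..12; adding 1 gives addresses 1..13 */
  hash_addr = mod(key, &hash_size) + 1;
  put key 3. +1 hash_addr z2.;
  datalines;
185 971 400 260 922 970 543
;
run;
```

With these seven keys the log should list the addresses 04, 10, 11, 01, 13, 09, and 11. Keys 400 and 543 share address 11, which is the collision the probing loop has to resolve.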

Page 14: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: LINEAR PROBING

Probing downward from address 11 finds slots 11, 10, and 09 occupied (400, 971, 970). Slot 08 is empty, so key 543 is stored there.

KEY  HASH_ADDR     HASH_TABLE
185  04            (00)=.
971  10            (01)=260
400  11            (02)=.
260  01            (03)=.
922  13            (04)=185
970  09            (05)=.
543  11            (06)=.
                   (07)=.
                   (08)=543
                   (09)=970
                   (10)=971
                   (11)=400
                   (12)=.
                   (13)=922
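
The slides show only the load. Searching can reuse the same probe: a key is present if the probe stops on the key, and absent if it stops on an empty slot. The sketch below is one way to write that, not the authors' code. The WORK data set names, the search keys, the LOCATE label, and the wrap-around to slot &hash_size are assumptions.

```sas
%let hash_size = 13; /* assumption: matches the 00-13 table */

data work.small; /* keys from the slide */
  input key @@;
  datalines;
185 971 400 260 922 970 543
;
run;

data work.probe; /* hypothetical search keys */
  input key @@;
  datalines;
543 400 111
;
run;

data _null_;
  array hash_table(0:&hash_size) _temporary_;
  /* load with the downward linear probe */
  do until(eof1);
    set work.small end=eof1;
    link locate;
    hash_table(hash_addr) = key;
  end;
  /* search with the same probe */
  do until(eof2);
    set work.probe end=eof2;
    link locate;
    found = (hash_table(hash_addr) = key);
    put key= hash_addr= found=;
  end;
  stop;
  /* leaves hash_addr on the slot holding key, or on the first empty slot */
  locate:
    do hash_addr = mod(key, &hash_size) + 1 by -1
       until(hash_table(hash_addr) = . or hash_table(hash_addr) = key);
      if hash_addr < 0 then hash_addr = &hash_size;
    end;
  return;
run;
```

With this input, 543 should be found at slot 8 and 400 at slot 11. Key 111 has home address 8, which holds 543, so the probe moves on to the empty slot 7 and reports found=0. Like the slide's loader, this sketch would loop forever on a completely full table.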

 

 

Page 15: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 16: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

KEY  HASH_ADDR     LINK_TO_NEXT  HASH_TABLE
185  04            (00)=.        (00)=.
971  10            (01)=0        (01)=260
400  11            (02)=.        (02)=.
260  01            (03)=.        (03)=.
922  13            (04)=0        (04)=185
970  09            (05)=.        (05)=.
543  11            (06)=.        (06)=.
                   (07)=.        (07)=.
                   (08)=.        (08)=.
                   (09)=.        (09)=.
                   (10)=0        (10)=971
                   (11)=0        (11)=400
                   (12)=.        (12)=.
                   (13)=0        (13)=922

Page 17: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Key 970 hashes to address 09 (Step 1). LINK_TO_NEXT(09) is missing, so there is no collision (Step 2).

/* STEP 1 (hash the key) */
hash_addr = mod(key,&hash_size)+1;

/* STEP 2 (collision?) */
if link_to_next(hash_addr) > . then do;
  /* collision handling: steps 2.a to 2.c */
end;

/* STEP 3 (mark link as occupied) */
link_to_next(hash_addr) = 0;

/* STEP 4 (store the key) */
hash_table(hash_addr) = key;

Page 18: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 3 for key 970: the link for slot 09 is marked as occupied.

LINK_TO_NEXT  HASH_TABLE
(09)=0        (09)=.

Page 19: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 4 for key 970: the key is stored in slot 09.

LINK_TO_NEXT  HASH_TABLE
(09)=0        (09)=970

Page 20: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION: key 543 hashes to address 11 (Step 1), but LINK_TO_NEXT(11) is 0 rather than missing, because slot 11 already holds key 400 (Step 2).

Page 21: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

/* STEP 2.a (check for duplicates) */
traverse:
  if key = hash_table(hash_addr)
    then found=1;
  else if link_to_next(hash_addr) ne 0
    then do;
      hash_addr=link_to_next(hash_addr);
      goto traverse;
    end;

For key 543, HASH_TABLE(11)=400 is not a duplicate and LINK_TO_NEXT(11)=0, so the chain ends at slot 11.

Page 22: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

/* STEP 2.b (follow next_key to empty node) */
do next_key = &hash_size by -1
   until(link_to_next(next_key)=.);
end;

/* STEP 2.c (change original link from 0 to next_key) */
link_to_next(hash_addr)=next_key;
hash_addr = next_key;
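
The fragments for Steps 1 to 4 and 2.a to 2.c can be assembled into one loader. The sketch below is one possible assembly, not the authors' full program. It replaces the GOTO traversal with a DO WHILE loop, skips Steps 3 and 4 when a duplicate is found, and assumes hash_size is 13. The WORK.SMALL data set and the trailing duplicate key 970 are made up to exercise Step 2.a.

```sas
%let hash_size = 13; /* assumption: matches the 00-13 tables */

data work.small; /* slide keys, then 532 and a duplicate 970 */
  input key @@;
  datalines;
185 971 400 260 922 970 543 532 970
;
run;

data _null_;
  array link_to_next(0:&hash_size) _temporary_;
  array hash_table(0:&hash_size) _temporary_;
  do until(eof1);
    set work.small end=eof1;
    found = 0;
    /* STEP 1 (hash the key) */
    hash_addr = mod(key, &hash_size) + 1;
    /* STEP 2 (collision?) */
    if link_to_next(hash_addr) > . then do;
      /* STEP 2.a (walk the chain, stopping on the key or at its end) */
      do while(hash_table(hash_addr) ne key and link_to_next(hash_addr) ne 0);
        hash_addr = link_to_next(hash_addr);
      end;
      if hash_table(hash_addr) = key then found = 1;
      else do;
        /* STEP 2.b (scan down from the top for an empty node) */
        do next_key = &hash_size by -1 until(link_to_next(next_key) = .);
        end;
        /* STEP 2.c (append the empty node to the chain) */
        link_to_next(hash_addr) = next_key;
        hash_addr = next_key;
      end;
    end;
    if not found then do;
      /* STEP 3 (mark link as occupied) */
      link_to_next(hash_addr) = 0;
      /* STEP 4 (store the key) */
      hash_table(hash_addr) = key;
    end;
  end;
  /* write both arrays to the log */
  do i = 0 to &hash_size;
    put '(' i z2. ')=' link_to_next{i} 2. '   (' i z2. ')=' hash_table{i} 3.;
  end;
  stop;
run;
```

With these keys, 543 should land in slot 12 with LINK_TO_NEXT(11)=12, and 532 in slot 8 with LINK_TO_NEXT(13)=8. The second 970 is recognized as a duplicate and changes nothing. The Step 2.b scan assumes at least one free node remains; on a full table next_key would run past slot 0.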

Page 23: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Steps 2.b and 2.c for key 543: scanning down from slot 13, the first slot whose link is missing is 12, so LINK_TO_NEXT(11) changes from 0 to 12.

LINK_TO_NEXT  HASH_TABLE
(11)=12       (11)=400
(12)=.        (12)=.

Page 25: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 3 for key 543: the link for the new node, slot 12, is marked as occupied.

LINK_TO_NEXT  HASH_TABLE
(11)=12       (11)=400
(12)=0        (12)=.

Page 26: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 4 for key 543: the key is stored in slot 12.

KEY  HASH_ADDR     LINK_TO_NEXT  HASH_TABLE
185  04            (00)=.        (00)=.
971  10            (01)=0        (01)=260
400  11            (02)=.        (02)=.
260  01            (03)=.        (03)=.
922  13            (04)=0        (04)=185
970  09            (05)=.        (05)=.
543  11            (06)=.        (06)=.
                   (07)=.        (07)=.
                   (08)=.        (08)=.
                   (09)=0        (09)=970
                   (10)=0        (10)=971
                   (11)=12       (11)=400
                   (12)=0        (12)=543
                   (13)=0        (13)=922

Page 27: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

COLLISION: the next key, 532, hashes to address 13 (Step 1), which already holds key 922 (Step 2).

KEY  HASH_ADDR
532  13

Page 30: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Steps 2.a to 2.c for key 532: HASH_TABLE(13)=922 is not a duplicate and LINK_TO_NEXT(13)=0, so the chain ends at slot 13. Scanning down from slot 13, the first slot whose link is missing is 08, so LINK_TO_NEXT(13) changes from 0 to 8.

LINK_TO_NEXT  HASH_TABLE
(08)=.        (08)=.
(13)=8        (13)=922

Page 32: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 3 for key 532: the link for the new node, slot 08, is marked as occupied.

LINK_TO_NEXT  HASH_TABLE
(08)=0        (08)=.
(13)=8        (13)=922

Page 33: Hashing Rehashed Paul M. Dorfman, Independent Consultant

COLLISION RESOLUTION POLICY: COALESCED CHAINING

Step 4 for key 532: the key is stored in slot 08.

KEY  HASH_ADDR     LINK_TO_NEXT  HASH_TABLE
185  04            (00)=.        (00)=.
971  10            (01)=0        (01)=260
400  11            (02)=.        (02)=.
260  01            (03)=.        (03)=.
922  13            (04)=0        (04)=185
970  09            (05)=.        (05)=.
543  11            (06)=.        (06)=.
532  13            (07)=.        (07)=.
                   (08)=0        (08)=532
                   (09)=0        (09)=970
                   (10)=0        (10)=971
                   (11)=12       (11)=400
                   (12)=0        (12)=543
                   (13)=8        (13)=922
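
A search against the finished table follows the same chain walk as Step 2.a. The sketch below is illustrative only. It hard-codes the final LINK_TO_NEXT and HASH_TABLE values for slots 00 to 13 as array initial values instead of rebuilding them, assumes a table size of 13, and uses made-up search keys.

```sas
data _null_;
  /* final state of slots 00..13, copied from the last table */
  array link_to_next(0:13) _temporary_
        (. 0 . . 0 . . . 0 0 0 12 0 8);
  array hash_table(0:13) _temporary_
        (. 260 . . 185 . . . 532 970 971 400 543 922);
  input key @@;
  hash_addr = mod(key, 13) + 1;
  found = 0;
  /* an empty home slot means the key cannot be in the table */
  if link_to_next(hash_addr) > . then do;
    /* follow the links until the key or the end of the chain */
    do while(hash_table(hash_addr) ne key and link_to_next(hash_addr) ne 0);
      hash_addr = link_to_next(hash_addr);
    end;
    found = (hash_table(hash_addr) = key);
  end;
  put key= found= hash_addr=;
  datalines;
543 532 922 100 27
;
run;
```

Here 543 should be found at slot 12 (reached from 11), 532 at slot 8 (reached from 13), and 922 directly at slot 13. Key 100 hashes to slot 10, which holds 971 and ends its chain, so it is not found. Key 27 hashes to the empty slot 2 and is rejected without any traversal.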

Page 34: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 35: Hashing Rehashed Paul M. Dorfman, Independent Consultant
Page 36: Hashing Rehashed Paul M. Dorfman, Independent Consultant

BENCHMARKING

[Figure: two bar charts, Run Time (Seconds, axis 0 to 100) and Memory (MB, axis 0 to 70), comparing Key-Indexing, Bitmapping, Coalesced Chaining-05, Coalesced Chaining-08, Double Hashing-05, Double Hashing-08, Sqxjhsh, Format, and Merge at 100,000, 300,000, and 500,000 observations. The bar values are not preserved in this transcript.]