File Structures

Chapter 11. Hashing


Contents

Introduction

A Simple Hashing Algorithm

Hashing Functions and Record Distributions

How Much Extra Memory Should Be Used?

Collision Resolution by Progressive Overflow

Storing More Than One Record per Address: Buckets

Making Deletions

Other Collision Resolution Techniques

Patterns of Record Access


1. Introduction

O-notation
  O(1)
  O(N): sequential searching
  O(log2 N)
  O(logk N): B-Tree (k: leaf node size)

What is Hashing?
  a = h(K)
  h (hash function), K (key), a (home address)

Example
  K = BASS
  h = (first char * second char) mod 1000, using the ASCII codes of the characters (B = 66, A = 65)
  a = h(K) = (66 * 65) mod 1000 = 4,290 mod 1000 = 290
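The example above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_hash` and the `address_space` parameter are assumptions, not part of the slides.

```python
def simple_hash(key: str, address_space: int = 1000) -> int:
    """Home address = (code of 1st char * code of 2nd char) mod address_space."""
    if len(key) < 2:
        raise ValueError("key needs at least two characters")
    return (ord(key[0]) * ord(key[1])) % address_space


print(simple_hash("BASS"))     # 66 * 65 = 4290 -> 290
print(simple_hash("LOWELL"))   # 76 * 79 = 6004 -> 4
print(simple_hash("OLIVIER"))  # 79 * 76 = 6004 -> 4, the same address as LOWELL
```

Because multiplication is commutative, any two keys whose first two letters are swapped land on the same address, which is one reason this function is only a teaching example.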


Collision
  Example
    key: LOWELL  => a = (76 * 79) mod 1000 = 6,004 mod 1000 = 4
         OLIVIER => a = (79 * 76) mod 1000 = 6,004 mod 1000 = 4

Several ways to reduce the number of collisions
  1. Spread out the records: good hashing algorithms
  2. Use extra memory
  3. Put more than one record at a single address: buckets


2. A Simple Hashing Algorithm

3 Steps
  1. Represent the key in numerical form
  2. Fold and add
  3. Divide by a prime number and use the remainder as the address

Example
Step 1. Represent the Key in Numerical Form

  LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32
           L  O  W  E  L  L  (six blanks)


Example (continued)
Step 2. Fold and Add

  76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32

  7679 + 8769 + 7676 + 3232 + 3232 = 30588
  (30588 + 3232 = 33820, which exceeds the 2-byte maximum value of 32767, so each intermediate sum is reduced mod 19937)

  7679 + 8769 = 16448   => 16448 mod 19937 = 16448
  16448 + 7676 = 24124  => 24124 mod 19937 = 4187
  4187 + 3232 = 7419    => 7419 mod 19937 = 7419
  7419 + 3232 = 10651   => 10651 mod 19937 = 10651
  10651 + 3232 = 13883  => 13883 mod 19937 = 13883

Step 3. Divide by the Size of the Address Space
  a = s mod n (n: # of addresses in file)
  a = 13883 mod 100 = 83
  a = 13883 mod 101 = 46
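The three steps can be sketched as follows. The function name, the default key width of 12 characters, and the parameter names are assumptions made for illustration; the constant 19937 and the pairing of two-digit ASCII codes come from the example above. The `code * 100` trick only works for characters whose codes have two digits (uppercase letters and blanks).

```python
def fold_and_add_hash(key: str, num_addresses: int,
                      width: int = 12, fold_prime: int = 19937) -> int:
    """Hash a key by folding pairs of ASCII codes and adding them."""
    # Step 1: numerical form, padded with blanks (ASCII 32) to a fixed width.
    codes = [ord(ch) for ch in key.ljust(width)[:width]]

    # Step 2: fold two codes into one number (76, 79 -> 7679) and add,
    # reducing mod fold_prime after each addition to stay below 32767.
    total = 0
    for i in range(0, width, 2):
        pair = codes[i] * 100 + codes[i + 1]
        total = (total + pair) % fold_prime

    # Step 3: divide by the size of the address space, keep the remainder.
    return total % num_addresses


print(fold_and_add_hash("LOWELL", 100))  # 13883 mod 100 = 83
print(fold_and_add_hash("LOWELL", 101))  # 13883 mod 101 = 46
```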


3. Hashing Functions and Record Distributions

Distributing Records among Addresses

<Figure 11.3> Different distributions of records A-G among addresses 1-10. (a) Uniform distribution (best) (b) Worst case (c) Random distribution (acceptable)


Some Other Hashing Methods

Better than random
  Examine keys for a pattern (e.g., resident registration numbers)
  Divide the key by a prime number

Random
  Square the key and take the middle
    453^2 = 205209; the middle two digits give 52
  Radix transformation
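A minimal sketch of the mid-square method, assuming a two-digit address is wanted. The function name `mid_square` and the `digits` parameter are hypothetical.

```python
def mid_square(key: int, digits: int = 2) -> int:
    """Square the key and return its middle `digits` digits as the address."""
    squared = str(key * key)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])


print(mid_square(453))  # 453 * 453 = 205209 -> middle two digits 52
```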


4. How Much Extra Memory Should Be Used?

Packing Density

  packing density = (# of records) / (# of spaces) = r / N

Example
  r = 75 records
  N = 100 addresses
  packing density = 75 / 100 = 0.75 = 75%


Predicting Collisions for Different Packing Densities

  Packing density (%)   Synonyms (%)
          10                 4.8
          40                17.6
          70                28.1
          90                34.1
         100                36.8

<Table 11.2> Effect of packing density on the proportion of records not stored at their home addresses
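The slides do not say how these percentages are obtained. One model that reproduces them, assuming records are spread randomly (a Poisson distribution) and each address holds one record, is sketched below: with packing density d, a fraction 1 - e^(-d) of the addresses receive at least one record, so (1 - e^(-d)) / d of the records sit at their home address and the rest overflow. The function name is an assumption.

```python
import math


def overflow_fraction(packing_density: float) -> float:
    """Fraction of records not stored at their home address (one record per address)."""
    d = packing_density
    at_home = (1.0 - math.exp(-d)) / d  # occupied addresses per record
    return 1.0 - at_home


for percent in (10, 40, 70, 90, 100):
    synonyms = 100 * overflow_fraction(percent / 100)
    print(f"{percent:3d}%  {synonyms:.1f}%")
```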


5. Collision Resolution by Progressive Overflow

Progressive Overflow
  Open addressing
  Linear probing

  Key     h(K) (home address)
  York    3
  Novak   2

  Address   Record
     0
     1
     2      Rosen    <- Novak's home address
     3      Jasper   <- York's home address
     4      York

  York's home address (3) is already occupied, so York is stored at the next free address (4).
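A minimal sketch of progressive overflow on an in-memory list. The table size of 6 and the function names are assumptions; home addresses are passed in directly rather than computed by a hash function.

```python
from typing import List, Optional


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key at its home address or at the next free slot, wrapping around."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


def search(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe from the home address; an empty slot means the key is absent."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None
        if table[addr] == key:
            return addr
    return None


table: List[Optional[str]] = [None] * 6
insert(table, "Rosen", 2)
insert(table, "Jasper", 3)
york_addr = insert(table, "York", 3)    # home 3 is taken, stored at 4
novak_addr = insert(table, "Novak", 2)  # 2, 3 and 4 are taken, stored at 5
print(york_addr, novak_addr)
```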


Search Length

  Key     Home address   # of accesses (search length)
  Adams        0                    1
  Bates        1                    1
  Cole         1                    2
  Dean         2                    2
  Evans        0                    5

  Address   Record
     0      Adams
     1      Bates
     2      Cole
     3      Dean
     4      Evans
     5


Search Length (continued)

  average search length = (total search length) / (total number of records)

Example
  average search length = (1 + 1 + 2 + 2 + 5) / 5 = 2.2

<Figure 11.7> Average search length versus packing density in a hashed file
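The search lengths for Adams, Bates, Cole, Dean and Evans can be reproduced by loading the records with progressive overflow and counting the probes each one needs. This is a sketch: the function name is hypothetical, and the table size of 6 is taken from the address column (0-5) of the diagram.

```python
from typing import Dict, List, Optional, Tuple


def load_and_measure(records: List[Tuple[str, int]], table_size: int) -> Dict[str, int]:
    """Insert records in order; return the search length of each key."""
    table: List[Optional[str]] = [None] * table_size
    lengths: Dict[str, int] = {}
    for key, home in records:
        for step in range(table_size):
            addr = (home + step) % table_size
            if table[addr] is None:
                table[addr] = key
                lengths[key] = step + 1  # accesses needed to find it again
                break
        else:
            raise RuntimeError("file is full")
    return lengths


records = [("Adams", 0), ("Bates", 1), ("Cole", 1), ("Dean", 2), ("Evans", 0)]
lengths = load_and_measure(records, table_size=6)
average = sum(lengths.values()) / len(lengths)
print(lengths)  # Adams 1, Bates 1, Cole 2, Dean 2, Evans 5
print(average)  # 11 / 5 = 2.2
```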


6. Storing More Than One Record per Address: Buckets

Buckets

  Key     Home address
  Green        0
  Hall         0
  Jenks        2
  King         3
  Land         3
  Marx         3
  Nutt         3

  Bucket address   Bucket contents
        0          Green, Hall
        1
        2          Jenks
        3          King, Land, Marx
        4          Nutt (overflow record)
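A sketch of insertion into buckets, assuming a bucket size of 3 (bucket 3 is full with King, Land and Marx in the diagram) and progressive overflow to the next bucket when the home bucket is full. Function and variable names are illustrative.

```python
from typing import List


def insert_into_bucket(buckets: List[List[str]], key: str, home: int,
                       bucket_size: int) -> int:
    """Append key to its home bucket, or to the next bucket that has room."""
    n = len(buckets)
    for step in range(n):
        addr = (home + step) % n
        if len(buckets[addr]) < bucket_size:
            buckets[addr].append(key)
            return addr
    raise RuntimeError("file is full")


buckets: List[List[str]] = [[] for _ in range(5)]
homes = [("Green", 0), ("Hall", 0), ("Jenks", 2), ("King", 3),
         ("Land", 3), ("Marx", 3), ("Nutt", 3)]
for key, home in homes:
    insert_into_bucket(buckets, key, home, bucket_size=3)

for addr, contents in enumerate(buckets):
    print(addr, contents)
```

Only Nutt, the fourth record hashing to address 3, leaves its home bucket.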


Effects of Buckets on Performance

  packing density = r / (bN)

  r: # of records
  N: # of addresses
  b: # of records in a bucket

                                  File without buckets   File with buckets
  # of records                    r = 750                r = 750
  # of addresses                  N = 1000               N = 500
  Bucket size                     b = 1                  b = 2
  Packing density                 0.75                   0.75
  Ratio of records to addresses   r/N = 0.75             r/N = 1.5


<Table 11.4> Synonyms causing collisions as a percent of records for different packing densities and different bucket sizes

  Packing                 Bucket size
  density        1        2        5       10
    20%         9.4      2.2      0.1      0.0
    50%        21.3     10.4      2.5      0.4
    80%        31.2     20.4     10.3      5.3
   100%        36.8     27.1     17.6     12.5
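The slides give the table without a derivation. A Poisson model is one way to approximate it, sketched here under the assumption that the number of records hashing to an address follows a Poisson distribution with mean r/N = b * (packing density). The expected overflow per address is E[max(X - b, 0)], and dividing by the mean gives the share of records that overflow. Function names are hypothetical, and the model agrees with the table to within about 0.1 percentage point for the entries printed below.

```python
import math


def poisson(x: int, mean: float) -> float:
    """Probability that exactly x records hash to one address."""
    return mean ** x * math.exp(-mean) / math.factorial(x)


def overflow_percent(packing_density: float, bucket_size: int) -> float:
    """Percent of records that do not fit in their home bucket."""
    mean = packing_density * bucket_size  # r / N
    # E[max(X - b, 0)] = E[X] - b + E[max(b - X, 0)]
    shortfall = sum((bucket_size - x) * poisson(x, mean)
                    for x in range(bucket_size))
    overflow_per_address = mean - bucket_size + shortfall
    return 100.0 * overflow_per_address / mean


for density in (0.2, 0.5, 0.8, 1.0):
    row = [overflow_percent(density, b) for b in (1, 2)]
    print(f"{density:.0%}", " ".join(f"{value:5.1f}" for value in row))
```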


7. Making Deletions

Initial State

  Key      Home address   Actual address
  Adams         0               0
  Jones         1               1
  Morris        1               2
  Smith         0               3

  Address   Record
     0      Adams
     1      Jones
     2      Morris
     3      Smith


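(1) Tombstones for Handling Deletions

  Before deletion        After deletion of Morris
  Address   Record       Address   Record
     0      Adams           0      Adams
     1      Jones           1      Jones
     2      Morris          2      ###
     3      Smith           3      Smith

  If address 2 were simply left empty, a search for Smith (home address 0) would stop at the empty slot: "Smith cannot be found."

  ###: tombstone. This mark indicates that a record once lived there but no longer does.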
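A sketch of deletion with tombstones under progressive overflow. The table size of 5, the `TOMBSTONE` constant and the function names are assumptions; the sketch also assumes no real key is equal to the tombstone marker.

```python
from typing import List, Optional

TOMBSTONE = "###"  # marks a slot whose record was deleted


def search(table: List[Optional[str]], key: str, home: int) -> Optional[int]:
    """Probe from home; stop at an empty slot but continue past tombstones."""
    n = len(table)
    for step in range(n):
        addr = (home + step) % n
        if table[addr] is None:
            return None
        if table[addr] == key:
            return addr
    return None


def delete(table: List[Optional[str]], key: str, home: int) -> bool:
    """Replace the record with a tombstone instead of emptying the slot."""
    addr = search(table, key, home)
    if addr is None:
        return False
    table[addr] = TOMBSTONE
    return True


# Emptying the slot breaks the probe sequence for Smith (home address 0).
emptied = ["Adams", "Jones", None, "Smith", None]
print(search(emptied, "Smith", 0))   # None: the search stops at address 2

# With a tombstone the search walks past address 2 and finds Smith.
table = ["Adams", "Jones", "Morris", "Smith", None]
delete(table, "Morris", 1)
print(table[2], search(table, "Smith", 0))
```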


(2) Implications of Tombstones for Insertions
  Inserting "Smith"

(3) Effects of Deletions and Additions on Performance
  Solution to the problem of deteriorating average search length: reorganization


8. Other Collision Resolution Techniques

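(1) Double Hashing
  Second hashing function
    Applied to the key to produce an increment c, which is added to the address after each collision until a free address is found
  Seek time overhead
    Overflow records can end up far from their home address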
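A sketch of double hashing. The slides do not specify the second hashing function, so `second_hash` below is purely hypothetical; the sketch also assumes a prime table size (11 here), so that every increment c visits every address.

```python
from typing import Iterator, List, Optional


def second_hash(key: str, table_size: int) -> int:
    """Hypothetical second hash: an increment c between 1 and table_size - 1."""
    return 1 + sum(ord(ch) for ch in key) % (table_size - 1)


def probe_sequence(home: int, increment: int, table_size: int) -> Iterator[int]:
    """Yield home, home + c, home + 2c, ... modulo the table size."""
    for i in range(table_size):
        yield (home + i * increment) % table_size


def insert(table: List[Optional[str]], key: str, home: int) -> int:
    """Store key at the first free address along its probe sequence."""
    c = second_hash(key, len(table))
    for addr in probe_sequence(home, c, len(table)):
        if table[addr] is None:
            table[addr] = key
            return addr
    raise RuntimeError("file is full")


table: List[Optional[str]] = [None] * 11
first = insert(table, "A", 3)   # home address is free
second = insert(table, "B", 3)  # collision: c = 1 + 66 % 10 = 7, so (3 + 7) % 11
print(first, second)
```

Unlike progressive overflow, two synonyms usually follow different probe sequences, which reduces clustering at the cost of the extra seeks noted above.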


(2) Chained Progressive Overflow

  Key     Home      Actual    Search       Search
          address   address   length (1)   length (2)
  Adams      0         0          1            1
  Bates      1         1          1            1
  Cole       0         2          3            2
  Dean       1         3          3            2
  Evans      4         4          1            1
  Flint      0         5          6            3

  (1) Progressive overflow     (2) Chained progressive overflow
  Address   Record             Address   Record   Address of next synonym
     0      Adams                 0      Adams       2
     1      Bates                 1      Bates       3
     2      Cole                  2      Cole        5
     3      Dean                  3      Dean       -1
     4      Evans                 4      Evans      -1
     5      Flint                 5      Flint      -1
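The two search-length columns can be reproduced with the sketch below. It loads the six records with progressive overflow, links each overflow record to the end of the chain that starts at its home address, and then counts accesses both ways. Function names are assumptions, and the simple linking step assumes the record stored at each home address belongs to that address, which holds for this example.

```python
from typing import List, Optional, Tuple


def load_chained(records: List[Tuple[str, int]], size: int):
    """Return (keys, links) after inserting records with chained overflow."""
    keys: List[Optional[str]] = [None] * size
    links = [-1] * size
    for key, home in records:
        addr = home
        for _ in range(size):
            if keys[addr] is None:
                break
            addr = (addr + 1) % size
        else:
            raise RuntimeError("file is full")
        keys[addr] = key
        if addr != home:
            tail = home
            while links[tail] != -1:  # walk to the end of the synonym chain
                tail = links[tail]
            links[tail] = addr
    return keys, links


def linear_length(home: int, actual: int, size: int) -> int:
    """Accesses needed when every address from home onward is probed."""
    return (actual - home) % size + 1


def chained_length(key: str, home: int, keys, links) -> int:
    """Accesses needed when only the synonym chain is followed."""
    addr, accesses = home, 1
    while keys[addr] != key:
        addr = links[addr]
        if addr == -1:
            raise KeyError(key)
        accesses += 1
    return accesses


records = [("Adams", 0), ("Bates", 1), ("Cole", 0),
           ("Dean", 1), ("Evans", 4), ("Flint", 0)]
keys, links = load_chained(records, size=6)
plain = [linear_length(home, keys.index(key), 6) for key, home in records]
chained = [chained_length(key, home, keys, links) for key, home in records]
print(links)                               # next-synonym addresses
print(plain, sum(plain) / len(plain))      # average 2.5
print(chained, sum(chained) / len(chained))
```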


(3) Chaining with a Separate Overflow Area

  Primary data area                  Overflow area
  Home address   Record   Link       Address   Record   Link
       0         Adams      0           0      Cole       2
       1         Bates      1           1      Dean      -1
       2                                2      Flint     -1
       3
       4         Evans     -1
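A sketch of a lookup with a separate overflow area, using the layout shown above. The dictionary and list representation and the function name `find` are assumptions made for illustration; a link of -1 ends a chain.

```python
from typing import Dict, List, Optional, Tuple

# Primary data area: home address -> (record, link into the overflow area).
primary: Dict[int, Tuple[str, int]] = {
    0: ("Adams", 0),
    1: ("Bates", 1),
    4: ("Evans", -1),
}

# Overflow area: position -> (record, link to the next synonym).
overflow: List[Tuple[str, int]] = [("Cole", 2), ("Dean", -1), ("Flint", -1)]


def find(key: str, home: int) -> Optional[Tuple[str, int]]:
    """Return ('primary', address) or ('overflow', position), or None if absent."""
    slot = primary.get(home)
    if slot is None:
        return None
    record, link = slot
    if record == key:
        return ("primary", home)
    while link != -1:  # follow the synonym chain through the overflow area
        record, next_link = overflow[link]
        if record == key:
            return ("overflow", link)
        link = next_link
    return None


print(find("Flint", 0))  # Adams -> Cole -> Flint
print(find("Dean", 1))   # Bates -> Dean
```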


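(4) Scatter Tables: Indexing Revisited

[Figure: a scatter table. An index of hash addresses 0-4 points into a data file holding the records Adams, Bates, Cole, Dean, Evans and Flint; synonyms are chained by links, and -1 marks the end of a chain.]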


Patterns of Record Access

A small percentage of the records in a file account for a large percentage of the accesses.
  80/20 Rule: 80% of the accesses are performed on 20% of the records.