hashing & indexing
-
Upload
anasua-bhattacharyya -
Category
Documents
-
view
1.023 -
download
2
Transcript of hashing & indexing
INDEX (CHAPTER 12)INDEX (CHAPTER 12)
9/23/20071
TOPICS
• Basic concepts
• Hashing• Hashing
• B+-tree
9/23/20072
INTRODUCTION
• Review
Conceptual E-R data model
Logical Relational data model SQL
Physical relation = a fileOrg. of records on a disk pageOrganization of attributes within a recordI d FilIndex Files
9/23/20073
Software Architecture of a DBMS
Query Parser
Query Interpretor
Query Optimizer
Relational Algebra operators: ∏, σ, ρ, δ, ←, ∪, ∩, ÷, −
Abstraction of records
Index structures
Relational Algebra operators: ∏, σ, ρ, δ, ←, ∪, ∩, ÷,
File System
Buffer Pool Manager
9/23/20074
Implementation of б
SS# N A S l d
• Emp table:
SS# Name Age Salary dno
1 Joe 24 20000 2
2 Mary 20 25000 3
3 B b 22 27000 43 Bob 22 27000 4
4 Kathy 30 30000 5
5 Shideh 4 4000 1
• бSalary=30,000(Employee)SS# Name Age Salary dno
4 Kathy 30 30000 5
• Process the select operator using a file scan (linear scan)F1 = Open the file corresponding to EmployeeF1 Open the file corresponding to EmployeeP = read first page of F1While P is not null
For each record in P, if the record satisfies the selection predicate then produce as outputP = read next page of F1 /* P becomes null when EoF is reached */
9/23/20075
P = read next page of F1 /* P becomes null when EoF is reached */
Implementation of б
SS# N A S l d
• Emp table:
SS# Name Age Salary dno
1 Joe 24 20000 2
2 Mary 20 25000 3
3 B b 22 27000 43 Bob 22 27000 4
4 Kathy 30 30000 5
5 Shideh 4 4000 1
• бSalary=30,000(Employee)SS# Name Age Salary dno
4 Kathy 30 30000 5
• Process the select operator using a file scan (linear scan)F1 = Open the file corresponding to Employee
Fetch the page from disk if not in the buffer pool
F1 Open the file corresponding to EmployeeP = read first page of F1While P is not null
For each record in P, if the record satisfies the selection predicate then produce as outputP = read next page of F1
9/23/20076
P = read next page of F1
Implementation of б
SS# N A S l d
• Emp table:
SS# Name Age Salary dno
1 Joe 24 20000 2
2 Mary 20 25000 3
3 B b 22 27000 43 Bob 22 27000 4
4 Kathy 30 30000 5
5 Shideh 4 4000 1
• бSalary=30,000(Employee)SS# Name Age Salary dno
4 Kathy 30 30000 5
• Process the select operator using a file scan (linear scan)F1 = Open the file corresponding to Employee
Header
F1 Open the file corresponding to EmployeeP = read first page of F1While P is not null
For each record in P, if the record satisfies the selection predicate then produce as outputP = read next page of F1
9/23/20077
P = read next page of F1
TERMINOLOGY
• An exact match selection predicate: бSalary=30,000(Employee) , бFirstName=“Shideh”(Employee)бFirstName= Shideh (Employee)
• A range selection predicate: б (Employee) б (Employee)• A range selection predicate: бSalary>30,000(Employee) , бSalary<30,000(Employee), бSalary>30,000 and Salary < 32,000 (Employee)
9/23/20078
INTRODUCTION (Cont…)( )
• Motivation: Speed-up those queries that reference only a small portion of the records in a file.
• Analogy: Catalog cards in the library (more than one index).
• Evaluation:1. Access time (find)2. Insertion time (find + add)3. Deletion time (find + delete)4 S h d4. Space overhead
• Search-key: The attribute (or set of attributes) used to lookup records in a file
• Primary index: The index whose search key specifies the sequential order of y y p qthe records within a file.
• Secondary index: The index whose search key does not specify the sequential order of the records within a file.
9/23/20079
INTRODUCTION (Cont…)( )
• Example:
Alaska Alaska Bob 12 AliceState Name Age Other!
AlaskaAlaskaArizonaCaliforniaCalifornia
Alaska Bob 12 ...Alaska George 28Arizona David 48California Hellen 20California Jack 37
AliceBobCharlesDavidDavid
FloridaFloridaIndianaOhio
Florida Frank 10Florida Charles 4Indiana Joe 12Ohio Alice 23
FrankGeorgeHellenJack
• Assume, size of disk page = 2 data records = 5 index records.
I d i t i d i ?
Ohio Ohio David 36 Joe
• Indexing or not indexing?SELECT age SELECT ageFROM personnel FROM personnelWHERE name = “Alice” WHERE name = “Don”
9/23/200710
WHERE name Alice WHERE name Don
INTRODUCTION (Cont…)( )
• Example:Alaska Alaska Bob 12 Alice
State Name Age Else!AlaskaAlaskaBostonCaliforniaCalifornia
Alaska Bob 12 ...Alaska George 28Boston David 48California Hellen 20California Jack 37
AliceBobCharlesDavidDavid
FloridaFloridaIndianaOhio
Florida Frank 10Florida Charles 4Indiana Joe 12Ohio Alice 23
FrankGeorgeHellenJack
• Assume, size of disk page = 2 data records = 5 index records.
Ohio Ohio David 36 Joe
• Primary vs. Secondary”SELECT name SELECT ageFROM personnel FROM personnelWHERE t t “Ohi ” WHERE “D id”
9/23/200711
WHERE state = “Ohio” WHERE name = “David”
INTRODUCTION (Cont…)( )
• Example: (page = 2 data = 5 index)
Al k Al k B b 12 AliState Name Age Else!
AlaskaAlaskaBostonCaliforniaCalifornia
Alaska Bob 12 ...Alaska George 28Boston David 48California Hellen 20California Jack 37
AliceBobCharlesDavidDavidCalifornia
FloridaFloridaIndianaOhio
California Jack 37Florida Frank 10Florida Charles 4Indiana Joe 12Ohio Alice 23
DavidFrankGeorgeHellenJack
• Exact match vs. RangeSELECT name SELECT name
Ohio Ohio David 36 Joe
FROM personnel FROM personnelWHERE state = “California” WHERE state >= “Alaska” and
state <= “Florida”
Speed p b emplo ing binar search (is it possible?)
9/23/200712
• Speedup by employing binary search (is it possible?)
Dense Index Files
• Dense index — Index record appears for every search-key value in the file.
9/23/200713
Example of Sparse Index Filesp p
9/23/200714
Multilevel Index
9/23/200715
HASHING
Hash function:• K: the set of all search key valuesK: the set of all search key values• V: the set of all bucket address• h(K): K V• K is large (perhaps infinite) but set of search-key values actually stored in theK is large (perhaps infinite) but set of search key values actually stored in the
database is much smaller than K.• Fast lookup: To find Ki, search the bucket with h(Ki) address.
9/23/200716
HASHING (Cont…)( )
• Example:– K = salary (set of all 6 digit integers)y ( g g )– V = 1000 buckets addressed from 0 to 999– h(k) = k mod 1000.SELECT nameFROM personnelWHERE salary = “120,100”
• To find a 120 100 salary we should search bucket number 100• To find a 120,100 salary, we should search bucket number 100.• Hash is only appropriate for Exact match queries.• A bad hash function maps the value to a subset of (or a few) buckets (e.g., h(k)
= k mod 10 k mod 10.
9/23/200717
HASHING (Cont…)( )
• Clustered Hash Index– The index structure and its buckets are represented as a file (say file.hash)p ( y )– The relation is stored in file.hash (I.e., each entry in file.hash corresponds to a
record in relation)– Assuming no duplicates: the record can be accessed in 1 IO.
N l d H h I d• Non-clustered Hash Index:– The index structure and its buckets are represented as a file (say file.hash)– The relation remains intact
Each entry in file hash has the following format: (search key value RID)– Each entry in file.hash has the following format: (search-key value, RID)– Assuming no duplicates: the record can be accessed in 2 IO.
9/23/200718
HEAP FILE ORGANIZATION
• Assume a student table: Student(name, age, gpa, major)t(Student) = 16P(Student) = 4( )
Bob, 21, 3.7, CS Kane, 19, 3.8, ME Louis, 32, 4, LS Chris, 22, 3.9, CSBob, 21, 3.7, CS
Mary, 24, 3, ECE
Tom, 20, 3.2, EE
Kane, 19, 3.8, ME
Lam, 22, 2.8, ME
Chang, 18, 2.5, CS
Louis, 32, 4, LS
Martha, 29, 3.8, CS
James, 24, 3.1, ME
Chris, 22, 3.9, CS
Chad, 28, 2.3, LS
Leila, 20, 3.5, LSTom, 20, 3.2, EE
Kathy, 18, 3.8, LS
Chang, 18, 2.5, CS
Vera, 17, 3.9, EE
James, 24, 3.1, ME
Pat, 19, 2.8, EE
Leila, 20, 3.5, LS
Shideh, 16, 4, CS
9/23/200719
Non-Clustered Hash Index• A non-clustered hash index on the age attribute with 4 buckets• A non-clustered hash index on the age attribute with 4 buckets, • h(age) = age % B
(21, (1, 1))
(24, (1, 2))(32, (3,1))(20 (1 3))
(20, (4,3))(16, (4,4))(24 (3 3))
( ( ))(17, (2,4))(29, (3,2))
(20, (1,3))
(18 (1 4))
(28, (4,2))012
(24, (3,3))
(18, (1, 4))(22, (2,2))(22, (4,1))
(19, (2, 1))
23
(19, (3, 4))(18 (2 3))
B b 21 3 7 CS K 19 3 8 ME L i 32 4 LS Ch i 22 3 9 CS
(18, (2,3))
Bob, 21, 3.7, CS
Mary, 24, 3, ECE
Tom 20 3 2 EE
Kane, 19, 3.8, ME
Lam, 22, 2.8, ME
Chang 18 2 5 CS
Louis, 32, 4, LS
Martha, 29, 3.8, CS
James 24 3 1 ME
Chris, 22, 3.9, CS
Chad, 28, 2.3, LS
Leila 20 3 5 LS
9/23/200720
Tom, 20, 3.2, EE
Kathy, 18, 3.8, LS
Chang, 18, 2.5, CS
Vera, 17, 3.9, EE
James, 24, 3.1, ME
Pat, 19, 2.8, EE
Leila, 20, 3.5, LS
Shideh, 16, 4, CS
Clustered Hash Index• A clustered hash index on the age attribute with 4 buckets• A clustered hash index on the age attribute with 4 buckets, • h(age) = age % B
Bob, 21, 3.7, CS
Mary, 24, 3, ECE
T 20 3 2 EELouis, 32, 4, LS
J 24 3 1 MELeila, 20, 3.5, LSShideh, 16, 4, CS
Tom, 20, 3.2, EE
K h 18 3 8 LS
Vera, 17, 3.9, EEMartha, 29, 3.8, CS
James, 24, 3.1, ME
Chad, 28, 2.3, LS012 Kathy, 18, 3.8, LS
Kane, 19, 3.8, MELam, 22, 2.8, ME
Ch 18 2 CSPat, 19, 2.8, EE
Chris, 22, 3.9, CS
23
Chang, 18, 2.5, CS
9/23/200721
Non-Clustered Hash Index• A non-clustered hash index on the age attribute with 4 buckets 500• A non-clustered hash index on the age attribute with 4 buckets, • h(age) = age % B• Pointers are page-ids
(21, (1, 1))
(24, (1, 2))(32, (3,1))(20 (1 3))
(20, (4,3))(16, (4,4))(24 (3 3))
5001001
( ( ))(17, (2,4))(29, (3,2))
(20, (1,3))
(18 (1 4))
(28, (4,2))012
(24, (3,3))
7065001001706 (18, (1, 4))
(22, (2,2))(22, (4,1))
(19, (2, 1))
23
(19, (3, 4))(18 (2 3))
101706101
B b 21 3 7 CS K 19 3 8 ME L i 32 4 LS Ch i 22 3 9 CS
(18, (2,3))
Bob, 21, 3.7, CS
Mary, 24, 3, ECE
Tom 20 3 2 EE
Kane, 19, 3.8, ME
Lam, 22, 2.8, ME
Chang 18 2 5 CS
Louis, 32, 4, LS
Martha, 29, 3.8, CS
James 24 3 1 ME
Chris, 22, 3.9, CS
Chad, 28, 2.3, LS
Leila 20 3 5 LS
9/23/200722
Tom, 20, 3.2, EE
Kathy, 18, 3.8, LS
Chang, 18, 2.5, CS
Vera, 17, 3.9, EE
James, 24, 3.1, ME
Pat, 19, 2.8, EE
Leila, 20, 3.5, LS
Shideh, 16, 4, CS
Clustered Hash Index (SEQUENTIAL LAYOUT)• A clustered hash index on the age attribute with 4 buckets• A clustered hash index on the age attribute with 4 buckets, • h(age) = age % 4• When the number of buckets are known in advance, the system may
assume a sequentially laid file to eliminate the need for the hash directory.assume a sequentially laid file to eliminate the need for the hash directory.
Leila, 20, 3.5, LSShideh, 16, 4, CS
James, 24, 3.1, ME, , ,
M 24 3 ECE Bob, 21, 3.7, CSMary, 24, 3, ECE
Tom, 20, 3.2, EE
Kathy, 18, 3.8, LS Kane, 19, 3.8, MELam, 22, 2.8, MEVera, 17, 3.9, EELouis, 32, 4, LS
Martha, 29, 3.8, CS
Pat, 19, 2.8, EEChris, 22, 3.9, CS
Ch d 28 2 3 LS
9/23/200723
Chang, 18, 2.5, CSChad, 28, 2.3, LS
Clustered Hash Index (SEQUENTIAL LAYOUT)• A clustered hash index on the age attribute with 4 buckets• A clustered hash index on the age attribute with 4 buckets, • h(age) = age % 4• When the number of buckets are known in advance, the system may
assume a sequentially laid file to eliminate the need for the hash directory.assume a sequentially laid file to eliminate the need for the hash directory.
Leila, 20, 3.5, LSShideh, 16, 4, CS
Offset (bucket-id –1) times page size is for bucket id
James, 24, 3.1, ME, , ,
Offset 0 is for bucket 0
bucket-id
M 24 3 ECE
Offset Page Size is for bucket 1
Bob, 21, 3.7, CSMary, 24, 3, ECE
Tom, 20, 3.2, EE
Kathy, 18, 3.8, LS Kane, 19, 3.8, MELam, 22, 2.8, MEVera, 17, 3.9, EELouis, 32, 4, LS
Martha, 29, 3.8, CS
Pat, 19, 2.8, EEChris, 22, 3.9, CS
Ch d 28 2 3 LS
9/23/200724
Chang, 18, 2.5, CSChad, 28, 2.3, LS
Bucket Block address
0Number on disk
12
M-2M-1
9/23/200725
Example of Non-Clustered Hash Index
9/23/200726
Main buckets Overflow buckets
340460
Main buckets
981 Record pointer
Record pointer
Overflow buckets
01 460
Record pointer181 Record pointer
Record pointer12
32176191
551 Record pointer
Record pointer91Record pointer
22
Record pointer
Record pointer
72522
Record pointer
9
9/23/200727
p
Bucket Block address
0Number on disk
12
M-2M-1
9/23/200728
Example of Hash Index
9/23/200729
Main buckets Overflow buckets
340460
Main buckets
981 Record pointer
Record pointer
Overflow buckets
01 460
Record pointer182 Record pointer
Record pointer12
32176191
552 Record pointer
Record pointer91Record pointer
22
Record pointer
Record pointer
72522
Record pointer
9
9/23/200730
p
HASHING (Cont…)( )
• Loading factor– B = # of buckets, S = # of records per bucket, R = # of records in the relation, p ,– loading - factor = R / (B×S)– The loading factor should not exceed 80%, if that happens, double B and re-hash.
• Why a bucket might overflow?– Heavy loading of the file– Poor hash functions– Statistical peculiarities
If b k t fl ?• If a bucket overflows?– Chaining: chain an empty bucket to the bucket that overflows.– Open addressing: If bucket h(k) is full, store the record in h(k) + 1, if that is also
full, try h(k) + 2, and so on., y ( ) ,– Two hash functions: If bucket h(k) is full, store the record in h’(k).
9/23/200731
HASHING (Cont…)( )
• Problem: The file grows and shrinks over time. Hence, how one should choose the hash function:1. Based on current file size performance degradation as DB grows2. Based on anticipated file size waste space initially (and reduced buffer hits)3. Periodical reorganization time consuming
3.1. Choose new hash function3.2. Recompute hash value on every record3.3. Generate new bucket assignments
S l ti• Solution:– Dynamic hash functions: dynamic modification of h to accommodate growth and
shrinkage of the DB. (e.g., extendible hashing)
9/23/200732
HASHING (Cont…)( )
Extendible hashing• Choose a hash function (h) such that it results in a b (b = 32) bit binaryChoose a hash function (h) such that it results in a b (b 32) bit binary
number.• The directory has a header that contains its depth, d.• Each directory entry points to a hash bucket.y y p• Buckets are created on demand, as records are inserted.• Each bucket contains a local depth used to find data.
Directory depth
200
directory
Directory depth
1 bucket
01
10
11siblings
9/23/200733
HASHING (Cont…)( )
Extendible hashing (continued):• Every time a bucket overflows, its local depth is increased. If the local depth isEvery time a bucket overflows, its local depth is increased. If the local depth is
greater than the depth of the directory, the directory’s depth is increased, causing the directory to double in size.
• Each directory entry has one sibling or buddy. Two entries are buddies if they have identical bit patterns except for the dth bit.
• Every time a bucket overflows, its local depth is increased.• If the local depth is greater than the depth of the directory, then the directory’s
d th i i d i th di t t d bl i idepth is increased, causing the directory to double in size.• A bucket can overflow at any desired loading factor. That is, a split might
happen every time a bucket is 80% full.
9/23/200734
HASHING (Cont…)( )
• Retrieval with Extendible hashing:
Retrieve (K )Retrieve (K0)1. Calculate h’ = h(K0)2. Read depth d of the directory3. Interpret the d initial bits of h’ as an integer base 2, term this r.p g ,4. Retrieve the bucket pointed to by the rth entry5. Find the record in this bucket
5.1. If a hashing technique is used to organize the records in a bucket, use the d bits d fi d h b kdefined on that bucket5.2. If necessary, follow the collision resolution scheme within this bucket.
9/23/200735
HASHING (Cont…)( )
• Insertion with Extendible hashing:
Insert (K )Insert (K0)1. Apply the first four steps of Retrieve (K0) to find bucket b.2. If the insertion of K0 into b result in no overflow then Insert K0 into b and return3. Otherwise, obtain a new bucket b’,4. Set the local depth of b’ and b to equal (local depth of b + 1)5. If the new depth is NOT greater than the depth of the directory
5.1. Distinguish between b and b’ using their new d and set the appropriate (i ) f h di i hentry(ies) of the directory to point to each
5.2. Rehash the entries in bucket b and assign each individual entry to the appropriate bucket b or b’5.3. Insert (K0)( 0)
6. If the new depth is greater than the depth of the directory6.1 Increase the depth of the directory, doubling its size6.2. Set each entry and its buddy to point to the old bucket that it was pointing to
9/23/200736
6.3. Insert (K0)
HASHING (Cont…)( )
• Deletion with Extendible hashing:• Delete (K0)Delete (K0)
1. Apply the first four steps of Retrieve (K0) to find bucket b.2. If K0 is not b then return with value no found3. Otherwise, delete the entry corresponding to K0
4. If the sum of the number of entries on this page and its sibling page are below the size of a bucket then:4.1. Copy the entries in the two buckets into one bucket b’4 2 Depth of b’ = (depth of b 1)4.2. Depth of b = (depth of b - 1)4.3. Free bucket b and its sibling4.4. Locate the two hash directory entries pointing to b and its buddy. Set these two
pointers to b’4.5. If every pointer in the directory equals its sibling pointer then decrease the
depth of the directory by one and set each entry in an obvious manner.
9/23/200737
Use of Extendable Hash Structure: Example
9/23/200738Initial Hash structure, bucket size = 2
Example (Cont.)p ( )
• Hash structure after insertion of one Brighton and two Downtown records
9/23/200739
Example (Cont.)p ( )Hash structure after insertion of Mianus record
9/23/200740
Example (Cont.)
H h t t ft i ti f th P id d
9/23/200741
Hash structure after insertion of three Perryridge records
Example (Cont.)p ( )
• Hash structure after insertion of Redwood and Round Hill records
9/23/200742
HASHING (Cont…)( )
• Extendible hashing:The insertion algorithm of extendible hashing might crash whenThe insertion algorithm of extendible hashing might crash when
9/23/200743
HASHING (Cont…)( )
Hashing vs. Indexing• Hashing is appropriate for exact match queries: (cannot support range queries)Hashing is appropriate for exact match queries: (cannot support range queries)
SELECT A1, A2, …FROM rWHERE (Ai = c)WHERE (Ai c)
• Indexing is appropriate for both range and exact match queries:SELECT A1, A2, …FROM rFROM rWHERE (Ai <= c1) and (Ai > c2)
9/23/200744
Examplep
• Suppose that we are using extendable hashing on a file that contains records with the following search key values:g y
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
Show the extendable hash structure for this file if hash function is
h(x) = x mod 8 and buckets can hold three records( )
9/23/200745
B+-TREE
• B+-tree is a multi-level tree structured directory
….
Root
Internal Nodes
... ... Leaf Nodes
• Clustered: Leaf nodes contain the records themselves
Data File
Clustered: Leaf nodes contain the records, themselves.
9/23/200746
B+-TREE (Cont…)( )
• Non-clustered: Leaf nodes contain the pairs (P, K), where P is a pointer to the record in the file and K is a search-key.y
9/23/200747
B+-TREE (Cont…)( )
• Leaf nodes
P K P P K P
– Maintain between to n-1 values per leaf.– If i < j then Ki < Kj
P1 K1 P2 . . . Pn-1 Kn-1 Pn
(n-1)2
i j
5 7 10 (n = 4)
– Every search-key value in the file appears in some leaf node.– Suppose Li and Lj are two leaves and i < j, then every search value in Li is less than
every search value in Lj.
5 7 10 15 17 18
9/23/200748
B+-TREE (Cont…)( )
• Internal nodes– Maintain between to n pointers per internal node
n2 p p
– root is an exception: It must have more than one pointer.– Suppose a node with m pointers and 2<= i < m:
1. Pi points to subtree containing search-key values < Ki and >= Ki-1.2. Pm points to subtree containing search-key values >= Km-1.3. P1 points to subtree containing search-key values < K1.
5 7 105 7 10
2 3 5 6 10 10 11
9/23/200749
To calculate the order n of a B+-treeTo calculate the order n of a B tree
• Suppose that the search key field is V = 9 bytes long, the block size is B=512 bytes a record pointer is P = 7 bytesblock size is B=512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. 1. Calculate order of the internal nodes2. Calculate order of the leaf nodes
9/23/200750
To calculate the order n of a B+-treeTo calculate the order n of a B tree
• Suppose that the search key field is V = 9 bytes long, the block size is B=512 bytes a record pointer is P = 7 bytesblock size is B=512 bytes, a record pointer is Pr = 7 bytes, and a block pointer is P = 6 bytes. 1. Calculate order of the internal nodes
• An internal node Of a B+-tree can have up to n tree i d h k lpointers and n-1 search key values
• (n * P) + ((n-1) * V) <= B• (n * 6) +((n 1) * 9) <= 512• (n * 6) +((n-1) * 9) <= 512• (15 * n ) <= 521• n = 34
9/23/200751
n 34
To calculate the order n of a B+-treeTo calculate the order n of a B tree
• Suppose that the search key field is V = 9 bytes long, the block size is B=512 bytes a record pointer is P = 7 bytesblock size is B 512 bytes, a record pointer is Pr 7 bytes, and a block pointer is P = 6 bytes.
C l l d f h l f dCalculate order of the leaf nodes• The leaf nodes of the B+-tree will have the same
number of values and pointers, except that the pointers p , p pare data pointers and a next pointer.
• (nleaf * (Pr + V)) + P <= B• (n * (7 + 9)) + 6 <= 512• (nleaf * (7 + 9)) + 6 <= 512• (16 * nleaf ) <= 506• nleaf = 31
9/23/200752
leaf
B+-TREE (Cont…)( )
• Lookup 30
8 41 50
4 7 10 20 30 40 41 47 50 52
471020
30404147
5052
– Find 7: 4 Ios– Find 4-20: 4 IOs (assuming primary index), 8 IOs (assuming secondary index)– More than 10% selection: it is more efficient to do sequential scan (do not use the
d i d )secondary index).– Example: 10,000 records, select 1000 of them, 1000 records per disk page:
(Sequential search: 10 IOs, Secondary index: potentially 1000+ IOs)
9/23/200753
B+-TREE (Cont…)( )
• Analysis– “B” in B+-tree stands for Balanced. i.e., the length of every path from the root to a , g y p
leaf node is the same.– Hence, good performance for lookup, insertion, and deletion– K: number of search key values in a file, then the path is < log (K).
#K 1 000 000 d 10 100 th t t 3 t 9 d b dn2
– #K = 1,000,000, and 10 <= n <= 100 then at most 3 to 9 nodes be accessed.– Insertion and Deletion should not destroy the balance of the tree.
9/23/200754
B+-TREE (Cont…)( )
8 25
n = 4;Internal nodes: 2 to 4 pointersLeaf nodes: 2 to 3 values
10 204 7 30 40
8 25
10 204 7 30 40 41
Insert 41
10 204 7 30 40 41
30 40 41 47
Insert 47
8 25 41
30 40 41 47
8 25
10 204 7 30 40 41 47
41
9/23/200755
B+-TREE (Cont…)( )
Insert 508 25
10 204 7 30 40 41 47
41
50
Insert 5241 47 50 52
8 25 41 50
41
258 50
10 204 7 30 40 41 47 50 52
9/23/200756
B+-TREE (Cont…)( )30
8 41 50
D l 20
10 204 7 30 40 41 47 50 52
Delete 20 30
8 41 50
4 7 30 40 41 47 50 5210
30 41
104 7
50
41 47 5030 40 52
9/23/200757
ExampleExample
• Construct a B+- tree for the following set of lvalues:
2, 3, 5, 7, 11, 17, 19, 23, 29, 31
• Assume n = 4 (number of pointers)– Inner nodes : 4 to 2 children– Leaf nodes : 3 to 2 values
9/23/200758