Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need...

35
Advance Database Marwan Al-Namari Hassan Al-Mathami

Transcript of Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need...

Slide 1

Advance DatabaseMarwan Al-NamariHassan Al-MathamiIndexingWhat is Indexing?Indexing is a mechanisms .Why we need to use Indexing?We used indexing to speed up access to desired data in a database.E.g., author catalog in libraryPopular indices :balanced trees,B+ treesandhashes. Indexing (ISAM&B-Tree) Basic ConceptsIndexed Sequential Access Method(ISAM)Ordered Indices Multilevel IndexB+-Tree Index FilesB-Tree Index FilesBasic ConceptsSearch Key - attribute to set of attributes used to look up records in a file.Pointer - An index file consists of records (called index entries) of the form

Index files are typically much smaller than the original file Two basic kinds of indices:Ordered indices: search keys are stored in sorted orderHash indices: search keys are distributed uniformly across buckets using a hash function. search-keypointerIndexed Sequential Access MethodDataPage 1DataPage 2DataPage 3::DataPage N-1DataPage N If our large database is sorted, we can speed up search by doing binary search on the entire database.

However this means we must do log(N) disk accesses

The idea of ISAM is to do a faster, approximate binary search in main memory, and use this information to do fewer disk accesses (usually only one).Indexed Sequential Access MethodK P An index entry is a pair, where key is the value of the first key on the page, and pointer, points to the page.

Example

Data Page 7Maggieq4Manjula p3Marged5Montyf4Maggie page 7Indexed Sequential Access Method An index file is a concatenation of index entries. Together with one extra pointer at the beginning.

Example

K1 P1K2 P2K3 P3K3 P0Maggie page 2Waylon page 3Maggie page 1Indexed Sequential Access Method7 P4K3 P3Every record pointed to by this pointer has a value greater that or equal to 7Every record pointed to by this pointer has a value less that 7Lets look at an index file (this one is the smallest possible example)Indexed Sequential Access MethodDataPage 1DataPage 2DataPage 3::DataPage N-1DataPage N1251619::ppppppIndex FileData Files Instead of doing binary search on the data files, we can do binary search on the index, to find the largest value, which is equal to or less than the search key. We then use the pointer to go to disk to retrieve the relevant block from disk. Example: we are searching for 8, we do a binary search to find 5, we retrieve page 2, and search it to find a match (if there is one). Find the largest entry less than the key, follow right child. Find the smallest entry greater than or equal to the key, follow left child. Indexed Sequential Access MethodDataPage 1DataPage 2DataPage 3::DataPage N-1DataPage N523477::ppppppIndex FileData Files How big should the index file be? How about more pointers per page? We could have two pointers to each page (on average, or exactly). This does not help, because we have to retrieve a block at a time. Indexed Sequential Access Method How big should the index file be? How about less pointers per page? We could have a pointer for each two pages (on average, or exactly). This might help, because it makes the index smaller. We can do a little trick of adding sideways pointers. Actually, these sideways pointer can be useful for another reason, they can be helpful for range queries. However, to few pointers leads to chaining.DataPage 1DataPage 2DataPage 3::DataPage N-1DataPage N1251619::ppppppIndex FileData FilesIndexed Sequential Access MethodDataPage 1DataPage 2DataPage 3::DataPage N-1DataPage N1251619::ppppppIndex FileData Files We have seen that too small or too large an index (in other words too few or too many pointers) can be a problem. But suppose the index does not fit in main memory? The key observation is that the index itself is a sort of database, so lets build an index on the index!21pOrdered IndicesIn an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library.Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.Also called clustering indexThe search key of a primary index is usually but not necessarily the primary key.Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.Index-sequential file: ordered sequential file with a primary index.Indexing techniques evaluated on basis of:Multilevel IndexIf primary index does not fit in memory, access becomes expensive.To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and construct a sparse index on it.outer index a sparse index of primary indexinner index the primary index fileIf even outer index is too large to fit in main memory, yet another level of index can be created, and so on.Indices at all levels must be updated on insertion or deletion from the file.Multilevel Index (Cont.)

B+-Tree Index FilesDisadvantage of indexed-sequential files: performance degrades as file grows, since many overflow blocks get created. Periodic reorganization of entire file is required.Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the face of insertions and deletions. Reorganization of entire file is not required to maintain performance.Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.Advantages of B+-trees outweigh disadvantages, and they are used extensively.B+-tree indices are an alternative to indexed-sequential files.B+-Tree Index Files (Cont.)All paths from root to leaf are of the same lengthEach node that is not a root or a leaf has between [n/2] and n children.A leaf node has between [(n1)/2] and n1 valuesSpecial cases: If the root is not a leaf, it has at least 2 children.If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n1) values.A B+-tree is a rooted tree satisfying the following properties:B+-Tree Node StructureTypical node

Ki are the search-key values Pi are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).The search-keys in a node are ordered K1 < K2 < K3 < . . . < Kn1

Leaf Nodes in B+-TreesFor i = 1, 2, . . ., n1, pointer Pi either points to a file record with search-key value Ki, or to a bucket of pointers to file records, each record having search-key value Ki. Only need bucket structure if search-key does not form a primary key.If Li, Lj are leaf nodes and i < j, Lis search-key values are less than Ljs search-key valuesPn points to next leaf node in search-key orderProperties of a leaf node:

Non-Leaf Nodes in B+-TreesNon leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:All the search-keys in the subtree to which P1 points are less than K1For 2 i n 1, all the search-keys in the subtree to which Pi points have values greater than or equal to Ki1 and less than Km1

Example of a B+-treeB+-tree for account file (n = 3)

Example of B+-treeLeaf nodes must have between 2 and 4 values ((n1)/2 and n 1, with n = 5).Non-leaf nodes other than root must have between 3 and 5 children ((n/2 and n with n =5).Root must have at least 2 children.B+-tree for account file (n - 5)

Updates on B+-Trees: Insertion B+-Tree before and after insertion of Clearview

B-Tree Index FilesNonleaf node pointers Bi are the bucket or file record pointers.

Similar to B+-tree, but B-tree allows search-key values to appear only once; eliminates redundant storage of search keys.Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a nonleaf node must be included.Generalized B-tree leaf node

B-Tree Index File ExampleB-tree (above) and B+-tree (below) on same data

B-Tree Index Files (Cont.)Advantages of B-Tree indices:May use less tree nodes than a corresponding B+-Tree.Sometimes possible to find search-key value before reaching leaf node.Disadvantages of B-Tree indices:Only small fraction of all search-key values are found early Non-leaf nodes are larger, so fan-out is reduced. Thus B-Trees typically have greater depth than corresponding B+-TreeInsertion and deletion more complicated than in B+-Trees Implementation is harder than B+-Trees.Typically, advantages of B-Trees do not out weigh disadvantages. B-Trees27Type #1: Simple leaf deletion12295227915225669723143Delete 2: Since there are enoughkeys in the node, just delete itAssuming a 5-wayB-Tree, as before...Note when printed: this slide is animatedB-Trees28Type #2: Simple non-leaf deletion1229527915225669723143Delete 52Borrow the predecessoror (in this case) successor56Note when printed: this slide is animatedB-Trees29Type #4: Too few keys in node and its siblings12295679152269723143Delete 72Too few keys!

Join back togetherNote when printed: this slide is animatedB-Trees30Type #4: Too few keys in node and its siblings122979152269563143Note when printed: this slide is animatedB-Trees31Type #3: Enough siblings122979152269563143Delete 22Demote root key andpromote leaf keyNote when printed: this slide is animatedB-Trees32Type #3: Enough siblings1229791531695643Note when printed: this slide is animatedBinary Trees VS. BTreesBinary tree only have 2 children max.

For large files binary tree will be too high because of the limit of children and not enough keys per records.

Btrees disk size can have many children depending on the disk block.

Btrees are more realistic for indexing files because they easily maintain balance and can store many keys in only a few records.

B+ VS. B- TreesB+ trees store redundant search key values because index is smaller.

In a B+ tree, all pointers to data records exists at the leaf-level nodes.

B-tree eliminates redundancy but require additional pointers to do so.

In a B-tree, pointers to data records exist at all levels of the tree.

An Animation of B-tree Algorithm

http://ats.oka.nu/b-tree/b-tree.html

Also watch the Youtube video: B-Treeshttps://www.youtube.com/watch?v=HZRPa0kMOZE