Database Indexes

DATABASE INDEXES Nimesh Shah (nimesh.s) , Amit Bhawnani (amit.b)


Transcript of Database Indexes

Page 1: Database  Indexes

DATABASE INDEXES

Nimesh Shah (nimesh.s) , Amit Bhawnani (amit.b)

Page 2: Database  Indexes

Outline

• Why Indexes?
• Demo
• Types of Single-level Ordered Indexes
    • Primary Indexes
    • Clustering Indexes
    • Secondary Indexes
• Multilevel Indexes
• Dynamic Multilevel Indexes Using B-Trees and B+-Trees
• Indexes on Multiple Keys
• Disadvantages of Indexes
• Indexes in SQL SERVER
    • Clustered Index
    • Non Clustered Index

Page 3: Database  Indexes

Why Indexes?

Indexing mechanisms are used to speed up access to desired data.

They are much the same as book indexes, providing the database with quick jump points on where to find the full reference.

Search key: an attribute or set of attributes used to look up records in a file.

An index file consists of records (called index entries) of the form <search-key, pointer>.

Index files are typically much smaller than the original file.

Access types supported efficiently include:

• Records with a specified value in the attribute.
• Records with an attribute value falling in a specified range of values.

Page 4: Database  Indexes

Demo

Given a CUSTOMER table, find all customers whose name = 'Justine'.

Query:

SELECT * FROM CUSTOMER WHERE name = 'Justine'
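The effect of an index on this query can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than SQL Server, so the planner output is SQLite's, and the sample rows and the `plan` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the demo's CUSTOMER table.
# The rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO CUSTOMER (name) VALUES (?)",
    [("Justine",), ("Amit",), ("Nimesh",), ("Justine",)],
)

QUERY = "SELECT * FROM CUSTOMER WHERE name = 'Justine'"


def plan(sql):
    """Return SQLite's description of how it intends to run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY)   # no index on name yet: a scan of the table
conn.execute("CREATE INDEX IX_CUSTOMER_name ON CUSTOMER(name)")
plan_after = plan(QUERY)    # same query, now able to search via the index

matches = conn.execute(QUERY).fetchall()
print(plan_before)
print(plan_after)
print(matches)
```

The query text is identical before and after; only the access path chosen by the planner changes.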

Page 5: Database  Indexes

Single Level Indexes - Indexes as Access Paths

A single-level index is an auxiliary file that makes it more efficient to search for a record in the data file.

The index is usually specified on one field of the file (although it could be specified on several fields).

One form of an index is a file of entries <field value, pointer to record>, which is ordered by field value.

The index is called an access path on the field.

Page 6: Database  Indexes

Indexes as Access Paths (contd.)

The index file usually occupies considerably fewer disk blocks than the data file because its entries are much smaller.

A binary search on the index yields a pointer to the file record.

Indexes can also be characterized as dense or sparse.

A dense index has an index entry for every search key value (and hence every record) in the data file.

A sparse (or nondense) index, on the other hand, has index entries for only some of the search values.

Page 7: Database  Indexes

Indexes as Access Paths (contd.)

Example: Given the following data file:

EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )

Suppose that:
Record size R = 150 bytes
Block size B = 512 bytes
Number of records r = 30000 records

Then, we get:
Blocking factor Bfr = B div R = 512 div 150 = 3 records/block
Number of file blocks b = ⌈r / Bfr⌉ = ⌈30000 / 3⌉ = 10000 blocks

For an index on the SSN field, assume the field size V_SSN = 9 bytes and the record pointer size P_R = 7 bytes. Then:
Index entry size R_I = (V_SSN + P_R) = (9 + 7) = 16 bytes
Index blocking factor Bfr_I = B div R_I = 512 div 16 = 32 entries/block
Number of index blocks b_I = ⌈r / Bfr_I⌉ = ⌈30000 / 32⌉ = 938 blocks
Binary search on the index needs ⌈log2 b_I⌉ = ⌈log2 938⌉ = 10 block accesses

This is compared to an average linear search cost on the data file of b / 2 = 10000 / 2 = 5000 block accesses.
If the file records are ordered, the binary search cost on the data file would be ⌈log2 b⌉ = ⌈log2 10000⌉ = 14 block accesses.
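The arithmetic of this example can be re-checked with a few lines of Python. This is only a sketch; the variable names are chosen here to mirror the slide's symbols, and the linear and ordered-file search costs are computed from b, the number of data file blocks.

```python
import math

B = 512        # block size in bytes
R = 150        # data record size in bytes
r = 30000      # number of records in the file
V_SSN = 9      # size of the SSN field in bytes
P_R = 7        # size of a record pointer in bytes

bfr = B // R                       # blocking factor: records per data block
b = math.ceil(r / bfr)             # number of data file blocks

R_i = V_SSN + P_R                  # index entry size in bytes
bfr_i = B // R_i                   # index entries per block
b_i = math.ceil(r / bfr_i)         # number of index blocks

index_search = math.ceil(math.log2(b_i))       # binary search on the index
linear_search = b // 2                         # average linear scan of the data file
ordered_file_search = math.ceil(math.log2(b))  # binary search on an ordered data file

print(bfr, b, R_i, bfr_i, b_i)
print(index_search, linear_search, ordered_file_search)
```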

Page 8: Database  Indexes

Types of Single-Level Indexes

Primary Index

Defined on an ordered data file.

The data file is ordered on a key field.

Includes one index entry for each block in the data file; the index entry has the key field value for the first record in the block, which is called the block anchor.

A similar scheme can use the last record in a block.

A primary index is a nondense (sparse) index, since it includes one entry per disk block of the data file, holding the key of that block's anchor record, rather than one entry for every search value.
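A lookup through a sparse primary index can be sketched as follows. The block contents and the `lookup` helper are hypothetical; the data file is modeled as a list of blocks, each holding only key values.

```python
from bisect import bisect_right

# Hypothetical ordered data file: each inner list is one disk block,
# and records are reduced to their key values for brevity.
blocks = [
    [3, 8, 12],
    [15, 21, 27],
    [30, 34, 41],
    [47, 52, 60],
]

# Sparse primary index: one (block anchor key, block number) entry per block.
primary_index = [(block[0], i) for i, block in enumerate(blocks)]
anchors = [key for key, _ in primary_index]


def lookup(key):
    """Return (block number, position in block) for `key`, or None if absent."""
    # Binary search for the last anchor <= key; only that block can hold the key.
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None  # key is smaller than every block anchor
    block_no = primary_index[pos][1]
    block = blocks[block_no]
    if key in block:  # search inside the single block that was fetched
        return block_no, block.index(key)
    return None


print(lookup(21), lookup(22))
```

The index has four entries for twelve records, which is what makes it sparse: the binary search runs over block anchors only, and one data block is read afterwards.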

Page 9: Database  Indexes

Primary Index (contd.)

Primary index on the ordering key field of an ordered data file.

Page 10: Database  Indexes

Types of Single-Level Indexes

Clustering Index

Defined on an ordered data file.

The data file is ordered on a non-key field, unlike a primary index, which requires that the ordering field of the data file have a distinct value for each record.

Includes one index entry for each distinct value of the field; the index entry points to the first data block that contains records with that field value.

It is another example of a nondense index. Insertion and deletion are relatively straightforward with a clustering index.

Page 11: Database  Indexes

Clustering Index (contd.)

A clustering index on the DEPTNUMBER ordering nonkey field of an EMPLOYEE file.

Page 12: Database  Indexes

Clustering Index (contd.)

Clustering index with a separate block cluster for each group of records that share the same value for the clustering field.

Page 13: Database  Indexes

Types of Single-Level Indexes

Secondary Indexes

A secondary index provides a secondary means of accessing a file for which some primary access already exists.

The secondary index may be on a field which is a candidate key and has a unique value in every record, or a nonkey with duplicate values.

The index is an ordered file with two fields.

• The first field is of the same data type as some nonordering field of the data file that is an indexing field.
• The second field is either a block pointer or a record pointer.

There can be many secondary indexes (and hence, indexing fields) for the same file.

Includes one entry for each record in the data file; hence, it is a dense index.
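A dense secondary index on a nonordering key field might be sketched like this. The records, the choice of ssn as the indexing field, and the `find_by_ssn` helper are all hypothetical; the record pointer is modeled as a position in a Python list.

```python
from bisect import bisect_left

# Hypothetical data file, ordered by id (its primary access path).
# ssn is a nonordering key field: unique, but not the sort order of the file.
records = [
    {"id": 1, "ssn": "555", "name": "Justine"},
    {"id": 2, "ssn": "111", "name": "Amit"},
    {"id": 3, "ssn": "999", "name": "Nimesh"},
    {"id": 4, "ssn": "333", "name": "Smith"},
]

# Dense secondary index: one (field value, record pointer) entry per record,
# kept ordered by field value so it can be binary searched.
ssn_index = sorted((rec["ssn"], pos) for pos, rec in enumerate(records))
ssn_keys = [key for key, _ in ssn_index]


def find_by_ssn(ssn):
    """Binary search the index, then follow the pointer to the record."""
    i = bisect_left(ssn_keys, ssn)
    if i < len(ssn_keys) and ssn_keys[i] == ssn:
        return records[ssn_index[i][1]]
    return None


print(find_by_ssn("333"))
```

Because the data file is not ordered on ssn, the index needs an entry for every record; block anchors alone would not be enough.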

Page 14: Database  Indexes

Secondary Indexes (contd.)

A dense secondary index (with block pointers) on a nonordering key field of a file.

Page 15: Database  Indexes

Secondary Indexes (contd.)

A secondary index (with record pointers) on a nonkey field implemented using one level of indirection so that index entries are of fixed length and have unique field values.

Page 16: Database  Indexes

Types of Single-Level Indexes

Page 17: Database  Indexes

Multi-Level Indexes

Because a single-level index is an ordered file, we can create a primary index to the index itself; in this case, the original index file is called the first-level index and the index to the index is called the second-level index.

We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.

A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index consists of more than one disk block.
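How many levels this process produces can be estimated with a short sketch. It reuses the 938 first-level index blocks and 32 entries per block from the SSN example; the `index_levels` helper is a name introduced here, and the sketch assumes every level has the same fan-out.

```python
import math


def index_levels(first_level_blocks, fan_out):
    """Return the number of blocks at each index level, first level first."""
    levels = [first_level_blocks]
    # Keep indexing the previous level until the top level fits in one block.
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels


levels = index_levels(938, 32)
# One block access per index level, plus one for the data block itself.
block_accesses = len(levels) + 1

print(levels, block_accesses)
```

Under these assumptions a search touches one block per level, compared with the 10 block accesses of a binary search over the single-level index.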

Page 18: Database  Indexes

Multi-Level Indexes (contd.)

A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization.

Page 19: Database  Indexes

Multi-Level Indexes (contd.)

Such a multi-level index is a form of search tree; however, insertion and deletion of index entries is a severe problem because every level of the index is an ordered file.

A node in a search tree with pointers to subtrees below it.

Page 20: Database  Indexes

Dynamic Multilevel Indexes Using B-Trees and B+ -Trees

Because of the insertion and deletion problem, most multi-level indexes use B-tree or B+-tree data structures, which leave space in each tree node (disk block) to allow for new index entries

These data structures are variations of search trees that allow efficient insertion and deletion of new search values.

In B-Tree and B+-Tree data structures, each node corresponds to a disk block

Each node is kept between half-full and completely full

Page 21: Database  Indexes

Dynamic Multilevel Indexes Using B-Trees and B+ -Trees

An insertion into a node that is not full is quite efficient; if a node is full the insertion causes a split into two nodes

Splitting may propagate to other tree levels

A deletion is quite efficient if a node does not become less than half full

If a deletion causes a node to become less than half full, it must be merged with neighboring nodes
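The insertion cases can be illustrated on a single leaf node. This is a sketch only: `insert_into_leaf` is a hypothetical helper, a node is modeled as a sorted list of keys, and the choice of the first key of the new right node as the separator passed to the parent is one B+-tree convention assumed here.

```python
from bisect import insort


def insert_into_leaf(keys, key, capacity):
    """Insert `key` into a sorted leaf node, splitting it if it overflows.

    Returns (left_keys, right_keys, separator). When no split is needed,
    right_keys and separator are None.
    """
    keys = list(keys)          # work on a copy of the node's keys
    insort(keys, key)          # keep the node ordered
    if len(keys) <= capacity:
        return keys, None, None            # node was not full: cheap insertion
    mid = (len(keys) + 1) // 2             # left node keeps the extra key, if any
    left, right = keys[:mid], keys[mid:]
    # Assumed convention: the first key of the right node is handed to the
    # parent as the separator; inserting it there may split the parent too.
    return left, right, right[0]


print(insert_into_leaf([1, 5], 3, capacity=3))
print(insert_into_leaf([1, 5, 8], 7, capacity=3))
```

After a split both nodes are at least half full, and the separator insertion in the parent is where a split can propagate to higher levels.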

Page 22: Database  Indexes

Difference between B-tree and B+-tree

In a B-tree, pointers to data records exist at all levels of the tree

In a B+-tree, all pointers to data records exist at the leaf-level nodes.

A B+-tree can have fewer levels (or a higher capacity of search values) than the corresponding B-tree, because its internal nodes hold no data pointers and so fit more search values per block.

A B-tree allows search-key values to appear only once; this eliminates redundant storage of search keys.

Search keys in nonleaf nodes appear nowhere else in the B-tree; an additional pointer field for each search key in a nonleaf node must be included.

Page 23: Database  Indexes

B-tree structures. (a) A node in a B-tree with q – 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.

Page 24: Database  Indexes

The nodes of a B+-tree. (a) Internal node of a B+-tree with q –1 search values. (b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers.

Page 25: Database  Indexes

Indexes on Multiple Keys

Use multiple indices for certain types of queries. Example:

select account-number from account where branch-name = 'Perryridge' and balance = 1000

Possible strategies for processing the query using indices on single attributes:

1. Use index on branch-name to find accounts with branch name of Perryridge; test balance = 1000.
2. Use index on balance to find accounts with balances of $1000; test branch-name = 'Perryridge'.
3. Use branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly use index on balance. Take intersection of both sets of pointers obtained.

Page 26: Database  Indexes

Indexes on Multiple Keys

All of these alternatives eventually give the correct result.

However, if the set of records that meets each condition individually (branch-name = 'Perryridge', balance = 1000) is large, yet only a few records satisfy the combined condition, then none of the above is a very efficient technique for the given search request.

Suppose we have an index on the combined search-key (branch-name, balance).

With the where clause
where branch-name = 'Perryridge' and balance = 1000
the index on the combined search-key will fetch only records that satisfy both conditions.

Can also efficiently handle
where branch-name = 'Perryridge' and balance < 1000

But cannot efficiently handle
where branch-name < 'Perryridge' and balance = 1000
May fetch many records that satisfy the first but not the second condition.
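Why the order of attributes in a combined search-key matters can be seen in a small sketch. The account rows and both helper functions are hypothetical; the index is modeled as a list sorted on (branch-name, balance), with the account number playing the role of the record pointer.

```python
from bisect import bisect_left

# Hypothetical account rows: (branch_name, balance, account_number).
accounts = [
    ("Brighton", 1000, "A-217"),
    ("Downtown", 500, "A-101"),
    ("Perryridge", 400, "A-102"),
    ("Perryridge", 900, "A-201"),
    ("Perryridge", 1000, "A-305"),
    ("Redwood", 1000, "A-222"),
]

# Combined index: entries ordered on (branch_name, balance).
index = sorted(accounts)
keys = [(branch, balance) for branch, balance, _ in index]


def branch_eq_balance_lt(branch, limit):
    """branch-name = branch and balance < limit: one contiguous index range."""
    lo = bisect_left(keys, (branch,))        # first entry for this branch
    hi = bisect_left(keys, (branch, limit))  # first entry with balance >= limit
    return index[lo:hi], hi - lo             # (matches, entries examined)


def branch_lt_balance_eq(branch, balance):
    """branch-name < branch and balance = balance: scan every smaller branch."""
    hi = bisect_left(keys, (branch,))
    scanned = index[:hi]                     # matching balances are scattered here
    return [entry for entry in scanned if entry[1] == balance], len(scanned)


print(branch_eq_balance_lt("Perryridge", 1000))
print(branch_lt_balance_eq("Perryridge", 1000))
```

In the first case every examined entry is a match. In the second, all entries with a smaller branch name are examined and then filtered on balance, which is the inefficiency described above.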

Page 27: Database  Indexes

Disadvantages of Indexes

Indexes require additional storage space on disk, just as the index in a book requires additional pages.

Too many indexes can actually slow your database down. Each time a row is inserted, updated, or removed, every affected index entry also has to be updated.

Page 28: Database  Indexes

Indexes in Microsoft SQL Server

Indexes are organized as B-trees.

Each page in an index B-tree is called an index node.

The top node of the B-tree is called the root node.

The bottom level of nodes in the index is called the leaf nodes.

Any index levels between the root and the leaf nodes are collectively known as intermediate levels.

The pages in each level of the index are linked in a doubly-linked list.

SQL Server primarily supports 2 types of indexes:

• CLUSTERED Index
• NON-CLUSTERED Index

Page 29: Database  Indexes

Clustered Index

A clustered index is a special type of index that reorders the way records in the table are physically stored.

The root and intermediate level nodes contain index pages holding index rows.

Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a data row in the leaf level of the index.

The leaf nodes contain the actual data pages.

The data rows of the table are sorted and stored in the table based on their clustered index key (i.e. based on the index column(s)).

You can have only one clustered index per table.

Page 30: Database  Indexes

Clustered Index

Page 31: Database  Indexes

Clustered Index

CREATE CLUSTERED INDEX <index name> ON <table name> (<column name list>)

E.g.: CREATE CLUSTERED INDEX IX_CUSTOMER_id ON CUSTOMER(id)

Page 32: Database  Indexes

Clustered Index

Potential candidates for the clustered index:

Columns with a number of duplicate values that are searched frequently, for example, WHERE last_name = 'Smith'.

Columns that are often specified in the ORDER BY / GROUP BY clause

Columns that are often searched for within a range of values, for example, WHERE price between $10 and $20.

Columns, other than the primary key, that are frequently used in join clauses.

Queries that return large result sets.

Page 33: Database  Indexes

Clustered Index

Clustered indexes are not a good choice for:

Columns that undergo frequent changes: This results in the entire row moving (because SQL Server must keep the data values of a row in physical order). This is an important consideration in high-volume transaction processing systems where data tends to be volatile.

Wide keys: The key values from the clustered index are used by all nonclustered indexes as lookup keys and therefore are stored in each nonclustered index leaf entry.

 

Page 34: Database  Indexes

Non-Clustered Index

A nonclustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk.

The leaf nodes contain index pages instead of data pages.

Each index row in the nonclustered index contains the nonclustered key value and a row locator. This locator points to the data row in the clustered index or heap having the key value.

• Heap: The row locator is a pointer to the row, built as follows: ROW ID (row locator) = file identifier + page number + row number on the page.
• Clustered index: The row locator is the clustered index key for the row.

You can have up to 249 nonclustered indexes per table (the limit in SQL Server 2005; SQL Server 2008 and later allow up to 999).

Page 35: Database  Indexes

Non-Clustered Index

Page 36: Database  Indexes

Non-Clustered Index

CREATE [NONCLUSTERED] INDEX <index name> ON <table name> (<column name list>)

E.g.: CREATE NONCLUSTERED INDEX IX_CUSTOMER_name ON CUSTOMER(name)

Page 37: Database  Indexes

Non-Clustered Index

Potential candidates for nonclustered indexes for your environment:

Columns referenced in SARGs (search arguments) or join clauses that have a relatively high selectivity (the density value is low).

Columns referenced in both the WHERE clause and the ORDER BY clause.

Nonclustered indexes are useful for single-row lookups, joins, queries on columns that are highly selective, or queries with small range retrievals.

Page 38: Database  Indexes

Non-Clustered Index

Index with Included Columns

Functionality of nonclustered indexes can be extended by adding nonkey columns to the leaf level of the nonclustered index.

An index with included nonkey columns can significantly improve query performance when all columns in the query are included in the index either as key or nonkey columns.
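The covering effect can be observed with a rough sketch in Python's built-in sqlite3 module. This is an analogue, not SQL Server behavior: SQL Server adds nonkey columns with the INCLUDE clause of CREATE INDEX, whereas SQLite has no such clause, so the extra column is placed in the index key here. The table columns, sample rows, and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CUSTOMER (id INTEGER PRIMARY KEY, name TEXT, city TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO CUSTOMER (name, city, phone) VALUES (?, ?, ?)",
    [("Justine", "Pune", "111"), ("Amit", "Mumbai", "222"), ("Nimesh", "Pune", "333")],
)

# SQLite has no INCLUDE clause, so city is added as a trailing key column.
conn.execute("CREATE INDEX IX_CUSTOMER_name_city ON CUSTOMER(name, city)")


def plan(sql):
    """Return SQLite's description of how it intends to run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Every column the query needs is in the index: no table lookup required.
covered = plan("SELECT name, city FROM CUSTOMER WHERE name = 'Justine'")
# phone is not in the index, so each match still needs a lookup in the table.
not_covered = plan("SELECT * FROM CUSTOMER WHERE name = 'Justine'")

print(covered)
print(not_covered)
```

The first plan reports a covering index, while the second uses the same index only to locate rows and then reads the table for the remaining column.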

Page 39: Database  Indexes

Questions

?