Indexing

Indexing By: Arnold Mesa


Transcript of Indexing

Page 1: Indexing

Indexing

By: Arnold Mesa

Page 2: Indexing

Indexing

You can think of an index on a file as being like the catalogue of a library.

Page 3: Indexing

There are two kinds...

Ordered Indices - based on a sorted ordering of the values.

Hash Indices - based on a uniform distribution of values across a range of buckets. The bucket to which a value is assigned is determined by a hash function.
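A minimal sketch of the hash-index idea, for illustration only. The class name `HashIndex`, the toy hash function `bucket_of`, and the bucket count are all assumptions; the toy hash is deterministic but does not guarantee the uniform distribution a real hash function aims for.

```python
NUM_BUCKETS = 4  # hypothetical number of buckets


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Toy hash function: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in key) % num_buckets


class HashIndex:
    """Maps a search key to record pointers via hash buckets."""

    def __init__(self, num_buckets=NUM_BUCKETS):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, key, pointer):
        # The hash function alone decides which bucket receives the entry.
        self.buckets[bucket_of(key, len(self.buckets))].append((key, pointer))

    def lookup(self, key):
        # Only one bucket is inspected; other buckets are never touched.
        bucket = self.buckets[bucket_of(key, len(self.buckets))]
        return [pointer for k, pointer in bucket if k == key]


index = HashIndex()
index.insert("alpha", 0)
index.insert("beta", 1)
index.insert("alpha", 2)
print(index.lookup("alpha"))
```

Different keys can land in the same bucket, which is why `lookup` still compares keys inside the bucket.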

Page 4: Indexing

Key Concepts

Access Types - types of access that are supported efficiently

Access Time - time it takes to access a particular data item

Insertion Time - time it takes to insert a data item

Deletion Time - time it takes to delete a data item

Space Overhead - additional space occupied by an index structure

Page 5: Indexing

There are two kinds of ordered indices

– Dense Index - An index record appears for every search-key value in the file. The index record contains the search-key value and a pointer to the first data record with that search-key value. The rest of the records with the same search-key value are stored sequentially after the first record.

– Sparse Index - An index record appears for only some of the search-key values, so there are fewer index records. Each index record contains a search-key value and a pointer to the first data record with that value, as with the dense index.
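The two definitions above can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the function names, the placeholder keys "A" to "E", and the choice to keep every second key in the sparse index are assumptions. The file is assumed to be sorted on the search key.

```python
# Hypothetical data file, sorted on the search key: (search key, payload).
RECORDS = [
    ("A", "r1"),
    ("B", "r2"), ("B", "r3"), ("B", "r4"),
    ("C", "r5"), ("C", "r6"),
    ("D", "r7"), ("D", "r8"),
    ("E", "r9"),
]


def build_dense_index(records):
    """One entry per distinct search key, pointing at its first record."""
    index = {}
    for position, (key, _payload) in enumerate(records):
        index.setdefault(key, position)  # keep only the first position
    return index


def build_sparse_index(records, every=2):
    """Keep an entry for only every `every`-th distinct search key."""
    dense_entries = list(build_dense_index(records).items())
    return dense_entries[::every]


print(build_dense_index(RECORDS))
print(build_sparse_index(RECORDS))
```

The sparse index holds fewer entries, at the cost of a short sequential scan of the file during lookup.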

Page 6: Indexing

Dense Index

Index            Data file
Hotel Sofitel -> 234 Hotel Sofitel A-212
Hilton        -> 321 Hilton B-321
                 389 Hilton C-002
                 396 Hilton A-322
Westin        -> 112 Westin C-034
                 253 Westin B-219
Marriot       -> 501 Marriot B-069
                 532 Marriot C-304
The Ritz      -> 221 The Ritz A-007

Page 7: Indexing

Sparse Index

Index            Data file
Hotel Sofitel -> 234 Hotel Sofitel A-212
                 321 Hilton B-321
                 389 Hilton C-002
                 396 Hilton A-322
Westin        -> 112 Westin C-034
                 253 Westin B-219
                 501 Marriot B-069
                 532 Marriot C-304
The Ritz      -> 221 The Ritz A-007

Page 8: Indexing

Index            Data file
Hotel Sofitel -> 234 Hotel Sofitel A-212
                 321 Hilton B-321
                 389 Hilton C-002
                 396 Hilton A-322
Westin        -> 112 Westin C-034
                 253 Westin B-219
                 501 Marriot B-069
                 532 Marriot C-304
The Ritz      -> 221 The Ritz A-007

Suppose we want to find the Marriot #532...
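One way to carry out this lookup with the sparse index: find the last index entry whose key does not come after Marriot (here, Westin), follow its pointer, and scan the file sequentially from there. The sketch below makes two assumptions that are not stated in the slides: the hotel order shown in the file is treated as the sort order of the search key (the names are not alphabetical), and the leading number identifies the record. All function and variable names are hypothetical.

```python
from bisect import bisect_right

# Data file as shown on the slide, in file order.
RECORDS = [
    (234, "Hotel Sofitel", "A-212"),
    (321, "Hilton", "B-321"),
    (389, "Hilton", "C-002"),
    (396, "Hilton", "A-322"),
    (112, "Westin", "C-034"),
    (253, "Westin", "B-219"),
    (501, "Marriot", "B-069"),
    (532, "Marriot", "C-304"),
    (221, "The Ritz", "A-007"),
]

# Assumption: the file order defines the ordering of the search key.
KEY_ORDER = ["Hotel Sofitel", "Hilton", "Westin", "Marriot", "The Ritz"]
RANK = {name: i for i, name in enumerate(KEY_ORDER)}

# Sparse index from the slide: (search key, position of its first record).
SPARSE_INDEX = [("Hotel Sofitel", 0), ("Westin", 4), ("The Ritz", 8)]


def sparse_lookup(hotel, number):
    """Return (record, records_scanned); record is None if not found."""
    ranks = [RANK[key] for key, _ in SPARSE_INDEX]
    # Last index entry whose key is <= the key we are looking for.
    slot = bisect_right(ranks, RANK[hotel]) - 1
    if slot < 0:
        return None, 0
    position = SPARSE_INDEX[slot][1]
    scanned = 0
    # Sequential scan; stop once the file moves past the wanted key.
    while position < len(RECORDS) and RANK[RECORDS[position][1]] <= RANK[hotel]:
        scanned += 1
        record = RECORDS[position]
        if record[1] == hotel and record[0] == number:
            return record, scanned
        position += 1
    return None, scanned


print(sparse_lookup("Marriot", 532))
```

For Marriot #532 the scan starts at the first Westin record and reads four records before it finds the match.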

Page 9: Indexing

Efficiency Issues

Even if we use a sparse index, the index itself may become too large for efficient processing.

If an index is small enough to be kept in main memory, the search time is low.

If the index is so large that it must be kept on disk, a search may require several disk block reads.
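To get a feel for "several disk block reads", the sketch below counts the worst-case number of blocks a binary search touches when the index is spread over a given number of disk blocks. The function name and the block counts are illustrative assumptions, not figures from the slides.

```python
def worst_case_block_reads(num_index_blocks):
    """Worst-case number of blocks read by a binary search over the index."""
    reads = 0
    remaining = num_index_blocks
    while remaining > 0:
        reads += 1        # read the middle block of the remaining range
        remaining //= 2   # continue in one half
    return reads


for blocks in (1, 100, 10_000):
    print(blocks, "index blocks ->", worst_case_block_reads(blocks), "reads")
```

The count grows only logarithmically, but each read is a disk access, so even a handful of reads per lookup adds up.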

Page 10: Indexing

How to deal ...

With a large index, we can treat the primary index like any other sequential file and construct a sparse index on it.

Primary index    Data file
Hotel Sofitel -> 234 Hotel Sofitel A-212
Hilton        -> 321 Hilton B-321
                 389 Hilton C-002
                 396 Hilton A-322
Westin        -> 112 Westin C-034
                 253 Westin B-219
Marriot       -> 501 Marriot B-069
                 532 Marriot C-304
The Ritz      -> 221 The Ritz A-007

Sparse index on the primary index

Hotel Sofitel

Marriot

Marriot
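A minimal sketch of a two-level index, under stated assumptions: the record numbers from the slide are used as a sorted, unique search key, the inner index is split into hypothetical blocks of three entries, and the outer index keeps the first key of each block. The function names are illustrative.

```python
from bisect import bisect_right


def build_two_level(sorted_keys, block_size=3):
    """Return (outer index, inner index blocks) for a sorted list of keys."""
    # Inner index: one (key, file position) entry per key, split into blocks.
    entries = [(key, position) for position, key in enumerate(sorted_keys)]
    blocks = [entries[i:i + block_size]
              for i in range(0, len(entries), block_size)]
    # Outer index: sparse, holding only the first key of each inner block.
    outer = [block[0][0] for block in blocks]
    return outer, blocks


def two_level_lookup(outer, blocks, key):
    """Return the file position of `key`, or None if it is absent."""
    # Step 1: the small outer index selects exactly one inner block.
    slot = bisect_right(outer, key) - 1
    if slot < 0:
        return None
    # Step 2: only that inner block is read and searched.
    for entry_key, position in blocks[slot]:
        if entry_key == key:
            return position
    return None


keys = [112, 221, 234, 253, 321, 389, 396, 501, 532]
outer, blocks = build_two_level(keys)
print(outer)
print(two_level_lookup(outer, blocks, 532))
```

If the outer index fits in main memory, a lookup reads one inner index block and then the data block it points to.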

Page 11: Indexing

Does this look familiar? Remember B+-trees.

– A B+-tree is said to be of order m, a number of the designer's choosing.

– Each internal node other than the root has between ⌈m/2⌉ and m children.

– All data is stored at the leaf level.

– All leaves are at the same depth.
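The properties above can be checked on a small hand-built tree. This is a sketch under stated assumptions, not a full B+-tree: the tree is of order 3, it is built by hand rather than by insertion, it uses the record numbers from the earlier slides as the search key, and the names `Node`, `search`, `leaf_depths`, and `fanout_ok` are hypothetical. Node splitting and deletion are left out.

```python
from bisect import bisect_right
from math import ceil


class Node:
    """Internal nodes have children; leaf nodes have values (the data)."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys
        self.children = children
        self.values = values

    @property
    def is_leaf(self):
        return self.children is None


def search(node, key):
    """Walk from the root to a leaf; data is found only at the leaf level."""
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, key)]
    if key in node.keys:
        return node.values[node.keys.index(key)]
    return None


def leaf_depths(node, depth=0):
    """Set of depths at which leaves occur (one element if balanced)."""
    if node.is_leaf:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def fanout_ok(node, m, is_root=True):
    """Check that internal nodes have between ceil(m/2) and m children."""
    if node.is_leaf:
        return True
    low = 2 if is_root else ceil(m / 2)
    if not low <= len(node.children) <= m:
        return False
    return all(fanout_ok(child, m, False) for child in node.children)


# Hand-built tree of order 3 over the record numbers from the slides.
LEAVES = [
    Node([112, 221], values=[("Westin", "C-034"), ("The Ritz", "A-007")]),
    Node([234, 253], values=[("Hotel Sofitel", "A-212"), ("Westin", "B-219")]),
    Node([321, 389], values=[("Hilton", "B-321"), ("Hilton", "C-002")]),
    Node([396, 501], values=[("Hilton", "A-322"), ("Marriot", "B-069")]),
    Node([532], values=[("Marriot", "C-304")]),
]
ROOT = Node([396], children=[
    Node([234, 321], children=LEAVES[0:3]),
    Node([532], children=LEAVES[3:5]),
])

print(search(ROOT, 532))
print(leaf_depths(ROOT), fanout_ok(ROOT, 3))
```

Every search follows one root-to-leaf path, so the number of nodes visited is the same for every key.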

Page 12: Indexing

Example?