Lecture 2 - Introduction to Storage and Indexingdzeina/courses/epl446/lectures/02.pdfdefinitions in...

2-1EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)

EPL446 – Advanced Database Systems

Lecture 2

Overview of Storage and Indexing

Chapt. 8.1-8.3: Ramakrishnan &

Gehrke

Demetris Zeinalipourhttp://www.cs.ucy.ac.cy/~dzeina/courses/epl446

Department of Computer Science

University of Cyprus

http://images.google.com/imgres?imgurl=http://www.mas.ucy.ac.cy/NEWMAS/images/ucy_logo_clean.png&imgrefurl=http://www.mas.ucy.ac.cy/NEWMAS/images/&h=182&w=185&sz=41&tbnid=zwk0cIvSOnVJ3M:&tbnh=94&tbnw=96&hl=en&start=20&prev=/images%3Fq%3Ducy%2Blogo%26svnum%3D10%26hl%3Den%26lr%3D%26rls%3DGGLG,GGLG:2006-01,GGLG:en%26sa%3DN


http://www2.cs.ucy.ac.cy/~dzeina/


Lecture OutlineOverview of Storage and Indexing

• Note: This lecture aims to introduce several important definitions in Storage and Indexing. The subsequent lectures will explain these definitions in more detail

• 8.1) Data on External Storage– Storage Mediums and Storage Hierarchy

– Disk Space Manager and Buffer Manager

• 8.2) File Organization and Indexing– Alternative File Organizations (Sorted,Hash)

– Primary/Secondary Indexes

– Clustered/Unclustered Indexes

• 8.3) Index Data Structures– Tree-based Indexing

– Hash-Based Indexing

Query Optimization

and Execution

Relational Operators

Files and Access Methods

Buffer Management

Disk Space Management

DB



Context of next slides

Query Optimization

and Execution



Buffer Management


DB



Data on External Storage(Δεδομένα ζηη Δεςηεπεύοςζα Μνήμη)

• Α DBMS stores vast amounts of data and the data has to persists across program executions.

• Therefore, data is stored on external storage and fetched into main memory as needed for processing.

• The unit of information that is read and written to a disk is called Page (Σειίδα), e.g., 4KB ή 8KB

• Higher layers of the DBMS view these pages as unified Files (Αξρείν) and can read/write Records (Εγγξαθέο) , Πιεηάδεο) to these files.– Consider a data record (id:4B, name:28B) and a 4096B (4KB)

page size. That would yield ~128 records / page (some bytes go to headers and other auxiliary structures).

• What is the basic performance cost in a DBMS? – I/O (Input/Output): # pages read/write for a given operation.

– Complexity of algorithms in DBMSs is expressed in I/Os



Storage Mediums (Μέζα Αποθήκεςζηρ)

• Disks: Can retrieve a random page at a fixed cost– But reading several consecutive pages is much

cheaper than reading them in random order

• Tapes: Can only read pages in sequence– Cheaper than disks; used for archiving (απσειοθέηηζη)

• Flash Memory (Solid State Disks): Reading data at the speed of main memory, writing is slower.– More expensive than disks; used for applications with

read workloads that require fast random accesses.

Main focus

of DBMSs


http://images.google.com/imgres?imgurl=http://talkelab.ucsd.edu/head-disk/images/disk1b.jpg&imgrefurl=http://talkelab.ucsd.edu/head-disk/&usg=__S4FA9hzonC_2Zp3uSAzif-zYPl8=&h=355&w=512&sz=69&hl=en&start=3&um=1&tbnid=dDbS_rtpKF80_M:&tbnh=91&tbnw=131&prev=/images%3Fq%3Ddisk%26um%3D1%26hl%3Den%26sa%3DN

http://en.wikipedia.org/wiki/File:E-disk_2-5_scsi.jpg

http://images.google.com/imgres?imgurl=http://www.datavis.com/images/NPD/619659017415L.gif&imgrefurl=http://www.datavis.com/webapp/commerce/command/ProductDisplay%3Fprmenbr%3D2000%26prrfnbr%3D433991&h=200&w=200&sz=20&hl=en&start=1&tbnid=kQhWAErxR7zJRM:&tbnh=104&tbnw=104&prev=/images%3Fq%3Dsd%2Bmedia%26svnum%3D10%26hl%3Den%26rls%3DGFRC,GFRC:2007-04,GFRC:en


Storage Hierarchy(Ιεπαπσία Αποθήκεςζηρ)

• Primary Storage (Πξωηνβάζκηα): Volatile (πηηηική)

– Cache (Κρσφή Μνήμη) - random access

– Main memory (RAM) - random access

• Secondary storage (Δεπηεξνβάζκηα) : non-volatile

– Magnetic disk - random access

• DB, Virtual Memory, File System.

• Tertiary storage (Τξηηνβάζκηα): non-volatile

– Optical disk – random access

– Magnetic Tape – sequential access

Speed, Price

E

x

t

e

rn

a

l

Main focus

of DBMS




Query Optimization

and Execution



Buffer Management


DB



Disk Space Manager (DSM) (Δηαρεηξηζηήο Χώξνπ Δίζθνπ)

• DSM: Supports the concept of a page as a unit of data and provides commands to allocate/deallocate, read/write a page to external storage.– Size of Page == Size of Disk Block, in order to support

read/write operations in one I/O operation.

– The higher layers in the DB architecture (i.e., the Buffer

Manager) interact directly with the DSM.

• Other Duties: Keep track of Free Blocks.

– Initially a DB is stored on consecutive disk blocks (when it

acts in its own partition) or inside a file (when it is stored

inside an Operation System file).

– Subsequent deletions might easily create “holes” in that

sequence (either file or disk), thus the DSM needs to track

the free pages.




Query Optimization

and Execution



Buffer Management


DB



Buffer Manager (BM)(Δηαρεηξηζηήο Κξπθήο Μλήκεο)

• BM*: Subsystem that is responsible for loading pages from external storage to the main memory buffer pool

– File & Index layers make calls to the buffer manager.

– Idea: Keep as many blocks (pages) in memory as possible to

reduce disk accesses.

• Replacement Policies

– e.g., LRU (Least-Recently-Used pages … are discarded =>

the oldest are discarded first), MRU (Most-Recently-Used,

the newest are discarded first), LFU (Least-Frequently-Used)

• Prefetching or Double Buffering

– Idea: speed-up access by pre-loading “future”-needed data

– Cons: requires extra main memory; no help if requests are

random * Will be implemented as an assignment




Query Optimization

and Execution



Buffer Management


DB



Alternative File Organizations(Εναλλακηική Οπγάνυζη Απσείυν)

• File organization (Οξγάλωζε Αξρείνπ): Method for arranging a collection of records and supporting the concept of a file.

• In all file organizations the records are accessed by their respective RecordID– Note that a Record ID (RID) usually has the following structure

(PageID, SlotID), where SlotID defines the offset : i) inside PageID at which RID begins; or ii) inside the Slot Directory that resides within page PageID (explained in next lectures)

• Basic Questions– How to store data inside data records (fixed-length records vs.

variable-length records)?

– How to store data-records inside a file (heap file, sorted file, indexed file)?

– How to make a certain File Organization more powerful by complementing them with an Index?

RID_1 … RID_n

Page i



File Organization Types(Τύποι Οπγάνυζηρ Απσείυν)

• Heap files (Απσεία Συπού): Suitable when typical access

is a file scan (ζάξωζε αξρείνπ) retrieving ALL records(* Will be implemented in Minibase)

– Suitable for queries like “SELECT * FROM Employees;”

• Sorted Files (Ταξινομημένα Απσεία): Best if records must

be retrieved in some order, or only a `range’ (δηάζηεκα)

of records is needed.

– Suitable for queries like “SELECT * FROM Employees WHERE

20<age and age<30;”

• Each file organization makes certain operations efficient, but we

are interested in supporting more than one operation!

• To deal with such situations the DBMS builds one or more indexes.

• An index on a file is designed to speed up operations that are not

efficiently supported by the basic organization of that file.



Indexes (Access Methods)(Εςπεηήπια Δεςηεπεύοςζαρ Μνήμηρ)

• An index is a data structure that has index records which point to certain data records.

• An index can optimize certain kinds of retrieval operations (depending on the index).

• Definitions– Index Page (Σειίδεο Επξεηεξίνπ) vs. Data

Pages (Σειίδεο Δεδνκέλωλ): Index Pages store index records to data records. Both reside on disk because we might have many of these pages!

– Data Record (Εγγξαθή Δεδνκέλωλ): Stores the actual data e.g., (59,Mike,3.14) .

– Index Record (Εγγξαθή Επξεηεξίνπ): Stores the RID of another index record or a data record.

Index Page

Data Page

Index Page



B+ Tree Index Overview (Σύνοτη ηος Εςπεηηπίος B+ Tree)

Non-leaf pages have index entries; only used to direct searches. Leaf pages contain data entries K*, and are chained (prev&next) The data records e.g., (59,Mike,3.14) could have been stored

inside he respective data entry. Then the index file would bethe same with the data file. Index File Organization.

33 38 44 46 59 61 69 99

45

Non-leaf Pages

(Index Entries)

(Sorted by search key)

K

1

K

2Kb-1

index entry

59 Mike 3.14

Ind

ex

Fil

e

(In

de

x P

ag

es

)

Da

ta F

ile

(Da

ta P

ag

es

)

Leaf Pages

(Data Entries | K*)

Branching Factor

or Fan-out

(βαθμόρ) = b

Balanced Tree (Ιζνδπγηζκέλν Δέλδξν)Height=∟logbN ,˩ N:#data-entries, b=branching

factor e.g., log10100=2



33 38 44 46 59 61 69 99

45

38 44

99 69

61

33 59 46 45

Index Entry

Simplified Scenario:

Assume that each

index page stores

only 1 record

59 Mike 3.14

Index Entries

(in Index Pages)

Data Pages

Data Entries K*

(in Index Pages)

59,

Mike,

3.14

Data Entry

Empty Pages

Data Pages

Physical Layout (on Disk)

B+ Tree Index Overview (Σύνοτη ηος Εςπεηηπίος B+ Tree)



Example B+ Tree(Παπάδειγμα Χπήζηρ B+ Tree)

• Find 28*? 29*? All > 15* and < 30*

• Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.

– And change sometimes bubbles up the tree

2* 3*

Root

17

30

14* 16* 33* 34* 38* 39*

135

7*5* 8* 22* 24*

27

27* 29*

Entries <= 17 Entries > 17

Note how data entries

in leaf level are sorted



Structure of Data Entry k*(Δομή ηηρ Καηασώπηζηρ K*)

• In a data entry k* we can store:

– Alternative 1: <k> (Key Value), or

– Alternative 2: <k, RID>(Key Value, Data

record with key value k)>, or

– Alternative 3: <k, [RID1, RID2, …, RIDn]>,

where RIDi is a data record with key value k.

• Choice of alternative for data entries is

orthogonal to the indexing technique used to

locate data entries with a given key value k.

– In particular, ANY of the above alternatives

might be used with ANY index (hash or tree)



Data Entry k* Examples(Παπαδείγμαηα Καηασώπηζηρ k*)

• Alternative 1: <k>

• Alternative 2: <k, RID>

• Alternative 3: <k, [RID,…,RID]>

59, Mike, 3.14 Index Data Entry

59, RID#10

59 Mike 3.14

Index Data Entry

RID#10

Data Record

59, RID#10, RID#61, #RID82

59 Jim 53.14

Index Data Entry

Data Record59 Mike 3.14 59 Chris 33.14

Results in a Index File Organization!

RID#10 RID#61 RID#82



Hash-Based Index Overview (Εςπεηήπια Καηακεπμαηιζμού)

• Similar ideas to the B+tree with regards to pages, records, etc.

• A Hash-Index is a collection of buckets (κάδοσς).

– Buckets contain data entries (Altern. 2,3) | Data Records (Altern. 1)

– Bucket = primary page plus zero or more overflow pages (ζελίδερ

ςπεπσείλιζηρ) … linear/extendible hashing to be studied in next lectures.

• Good for equality selections (επηινγέο ηζόηεηαο)

Index File Organization

Hash Index #1

Data entries !=

Data Records

(Alternative 2)

Data Entries =

Data Records

(Alternative 1)

Hash Index #2

%4 %4



Data Entry k* Remarks(Παπαηηπήζειρ για ηο k*)

• Alternative 1: <k>.

– Results in an Index File Organization! (i.e., there are not real Data Records organized in HeapFile or SortedFile, but everything is stored in the index itself).

– Only 1 index can use Alternative 1. Other indexes must point to the first indexes (like previous example). Otherwise, data records are duplicated, leading to redundant (ζπαηάλη) storage and potential inconsistency (αζςνέπεια).

– Results in “Heavy” Index Structures. • Note that large portions of Indexes should reside in main

memory (for performance reasons)

• If data records are very large, the index itself is very large !

59, Mike, 3.14Index Data Entries: 63, Chris, 33.14 68, Jim, 53.14



Data Entry k* Remarks(Παπαηηπήζειρ για ηο k*)

• Alternatives 2 and 3:

– For Altern. 2,3, data entries are typically smaller than

Data Records. So, the Index is smaller => thus

larger portion of it might reside in Memory => better

performance than Alternative 1.

– Alternative 3 more compact than Alternative 2, but

leads to variable sized data entries (records) [e.g., (59,

[RID#10, RID#61, #RID82]) vs. (59, RID#61)

• We will see that Variable-size records are more difficult to

manage than Fixed-size records.

Index

Data Entry

Data

Record

59, RID#10, RID#61, #RID82

59 Jim 53.1459 Mike 3.14 59 Chris 33.14

RID#10 RID#61 RID#82

59, RID#10

59 Mike 3.14

RID#10



Clustered vs. Unclustered Indexes (Ομαδοποιημένα vs. Μη-Ομαδοποιημένα Εςπεηήπια)

• Clustered Index (Ομαδοποιημένο Εςπεηήπιο): If order

(δηάηαμε) of data records is the same as, or `close to’,

order of data entries, else called unclustered index.• Alternative 1 implies clustered (since datarec same as dateentry)

– Faster Range Queries: Consecutive data records reside on the same page.

• Alternatives 2,3 are usually unclustered.

– Slower Range Queries: Consecutive data records reside on different pages

Data entries

(Index File)

(Data file)

Data Records

Data entries

Data Records

CLUSTERED UNCLUSTERED



Primary vs. Secondary Indexes(Ππυηεύυν vs. Δεστερεύων Εςπεηήπια)

• Primary (Πξωηεύωλ) vs. Secondary (Δευτερεύων)

Index: If search key contains the primary key, then

called primary index, else called secondary index.

• Two data entries are said to be duplicates if they have

the same value for the search key, e.g.,

• Primary Indexes have NO duplicates (the key is unique)

• An Index is Unique if it has NO duplicate data entries

for the search key (i.e., it is either a primary key or a

candidate key), e.g.,

59, RID#10 59, RID#12

59, RID#10 65, RID#12 83, RID#15



Summary(Σύνοτη)

• Many alternative file organizations exist, each

appropriate in some situation.

• If selection queries are frequent, building an

index or sorting the file (less probable) is

important.

– Hash-based indexes only good for equality search.

– Sorted files and tree-based indexes best for range

search; also good for equality search. (Files rarely

kept sorted in practice; B+ tree index is better.)

• Index is a collection of data entries plus a way

(e.g., hash or logarithmic search) to quickly find

entries with given key values.



Summary(Σύνοτη)

• Data entries can be actual data records

<key>,<key, rid> pairs or <key, rid-list> pairs.

– Choice orthogonal to indexing technique used to

locate data entries with a given key value.

• Can have several indexes on a given file of

data records, each with a different search key.

• Indexes can be classified as clustered vs.

unclustered and primary vs. secondary.

– Differences have important consequences for

utility/performance.


Lecture 2 - Introduction to Storage and Indexingdzeina/courses/epl446/lectures/02.pdfdefinitions in...

Documents

Transcript of Lecture 2 - Introduction to Storage and Indexingdzeina/courses/epl446/lectures/02.pdfdefinitions in...