Lecture 2 - Introduction to Storage and Indexingdzeina/courses/epl446/lectures/02.pdfdefinitions in...
Transcript of Lecture 2 - Introduction to Storage and Indexingdzeina/courses/epl446/lectures/02.pdfdefinitions in...
2-1EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
EPL446 – Advanced Database Systems
Lecture 2
Overview of Storage and Indexing
Chapt. 8.1-8.3: Ramakrishnan &
Gehrke
Demetris Zeinalipourhttp://www.cs.ucy.ac.cy/~dzeina/courses/epl446
Department of Computer Science
University of Cyprus
2-2EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Lecture OutlineOverview of Storage and Indexing
• Note: This lecture aims to introduce several important definitions in Storage and Indexing. The subsequent lectures will explain these definitions in more detail
• 8.1) Data on External Storage– Storage Mediums and Storage Hierarchy
– Disk Space Manager and Buffer Manager
• 8.2) File Organization and Indexing– Alternative File Organizations (Sorted,Hash)
– Primary/Secondary Indexes
– Clustered/Unclustered Indexes
• 8.3) Index Data Structures– Tree-based Indexing
– Hash-Based Indexing
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
2-3EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Context of next slides
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
2-4EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Data on External Storage(Δεδομένα ζηη Δεςηεπεύοςζα Μνήμη)
• Α DBMS stores vast amounts of data and the data has to persists across program executions.
• Therefore, data is stored on external storage and fetched into main memory as needed for processing.
• The unit of information that is read and written to a disk is called Page (Σειίδα), e.g., 4KB ή 8KB
• Higher layers of the DBMS view these pages as unified Files (Αξρείν) and can read/write Records (Εγγξαθέο) , Πιεηάδεο) to these files.– Consider a data record (id:4B, name:28B) and a 4096B (4KB)
page size. That would yield ~128 records / page (some bytes go to headers and other auxiliary structures).
• What is the basic performance cost in a DBMS? – I/O (Input/Output): # pages read/write for a given operation.
– Complexity of algorithms in DBMSs is expressed in I/Os
2-5EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Storage Mediums (Μέζα Αποθήκεςζηρ)
• Disks: Can retrieve a random page at a fixed cost– But reading several consecutive pages is much
cheaper than reading them in random order
• Tapes: Can only read pages in sequence– Cheaper than disks; used for archiving (απσειοθέηηζη)
• Flash Memory (Solid State Disks): Reading data at the speed of main memory, writing is slower.– More expensive than disks; used for applications with
read workloads that require fast random accesses.
Main focus
of DBMSs
2-6EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Storage Hierarchy(Ιεπαπσία Αποθήκεςζηρ)
• Primary Storage (Πξωηνβάζκηα): Volatile (πηηηική)
– Cache (Κρσφή Μνήμη) - random access
– Main memory (RAM) - random access
• Secondary storage (Δεπηεξνβάζκηα) : non-volatile
– Magnetic disk - random access
• DB, Virtual Memory, File System.
• Tertiary storage (Τξηηνβάζκηα): non-volatile
– Optical disk – random access
– Magnetic Tape – sequential access
Speed, Price
E
x
t
e
rn
a
l
Main focus
of DBMS
2-7EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Context of next slides
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
2-8EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Disk Space Manager (DSM) (Δηαρεηξηζηήο Χώξνπ Δίζθνπ)
• DSM: Supports the concept of a page as a unit of data and provides commands to allocate/deallocate, read/write a page to external storage.– Size of Page == Size of Disk Block, in order to support
read/write operations in one I/O operation.
– The higher layers in the DB architecture (i.e., the Buffer
Manager) interact directly with the DSM.
• Other Duties: Keep track of Free Blocks.
– Initially a DB is stored on consecutive disk blocks (when it
acts in its own partition) or inside a file (when it is stored
inside an Operation System file).
– Subsequent deletions might easily create “holes” in that
sequence (either file or disk), thus the DSM needs to track
the free pages.
2-9EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Context of next slides
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
2-10EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Buffer Manager (BM)(Δηαρεηξηζηήο Κξπθήο Μλήκεο)
• BM*: Subsystem that is responsible for loading pages from external storage to the main memory buffer pool
– File & Index layers make calls to the buffer manager.
– Idea: Keep as many blocks (pages) in memory as possible to
reduce disk accesses.
• Replacement Policies
– e.g., LRU (Least-Recently-Used pages … are discarded =>
the oldest are discarded first), MRU (Most-Recently-Used,
the newest are discarded first), LFU (Least-Frequently-Used)
• Prefetching or Double Buffering
– Idea: speed-up access by pre-loading “future”-needed data
– Cons: requires extra main memory; no help if requests are
random * Will be implemented as an assignment
2-11EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Context of next slides
Query Optimization
and Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
2-12EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Alternative File Organizations(Εναλλακηική Οπγάνυζη Απσείυν)
• File organization (Οξγάλωζε Αξρείνπ): Method for arranging a collection of records and supporting the concept of a file.
• In all file organizations the records are accessed by their respective RecordID– Note that a Record ID (RID) usually has the following structure
(PageID, SlotID), where SlotID defines the offset : i) inside PageID at which RID begins; or ii) inside the Slot Directory that resides within page PageID (explained in next lectures)
• Basic Questions– How to store data inside data records (fixed-length records vs.
variable-length records)?
– How to store data-records inside a file (heap file, sorted file, indexed file)?
– How to make a certain File Organization more powerful by complementing them with an Index?
RID_1 … RID_n
Page i
2-13EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
File Organization Types(Τύποι Οπγάνυζηρ Απσείυν)
• Heap files (Απσεία Συπού): Suitable when typical access
is a file scan (ζάξωζε αξρείνπ) retrieving ALL records(* Will be implemented in Minibase)
– Suitable for queries like “SELECT * FROM Employees;”
• Sorted Files (Ταξινομημένα Απσεία): Best if records must
be retrieved in some order, or only a `range’ (δηάζηεκα)
of records is needed.
– Suitable for queries like “SELECT * FROM Employees WHERE
20<age and age<30;”
• Each file organization makes certain operations efficient, but we
are interested in supporting more than one operation!
• To deal with such situations the DBMS builds one or more indexes.
• An index on a file is designed to speed up operations that are not
efficiently supported by the basic organization of that file.
2-14EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Indexes (Access Methods)(Εςπεηήπια Δεςηεπεύοςζαρ Μνήμηρ)
• An index is a data structure that has index records which point to certain data records.
• An index can optimize certain kinds of retrieval operations (depending on the index).
• Definitions– Index Page (Σειίδεο Επξεηεξίνπ) vs. Data
Pages (Σειίδεο Δεδνκέλωλ): Index Pages store index records to data records. Both reside on disk because we might have many of these pages!
– Data Record (Εγγξαθή Δεδνκέλωλ): Stores the actual data e.g., (59,Mike,3.14) .
– Index Record (Εγγξαθή Επξεηεξίνπ): Stores the RID of another index record or a data record.
Index Page
Data Page
Index Page
2-15EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
B+ Tree Index Overview (Σύνοτη ηος Εςπεηηπίος B+ Tree)
Non-leaf pages have index entries; only used to direct searches. Leaf pages contain data entries K*, and are chained (prev&next) The data records e.g., (59,Mike,3.14) could have been stored
inside he respective data entry. Then the index file would bethe same with the data file. Index File Organization.
33 38 44 46 59 61 69 99
45
Non-leaf Pages
(Index Entries)
(Sorted by search key)
K
1
K
2Kb-1
index entry
59 Mike 3.14
Ind
ex
Fil
e
(In
de
x P
ag
es
)
Da
ta F
ile
(Da
ta P
ag
es
)
Leaf Pages
(Data Entries | K*)
Branching Factor
or Fan-out
(βαθμόρ) = b
Balanced Tree (Ιζνδπγηζκέλν Δέλδξν)Height=∟logbN ,˩ N:#data-entries, b=branching
factor e.g., log10100=2
2-16EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
33 38 44 46 59 61 69 99
45
38 44
99 69
61
33 59 46 45
Index Entry
Simplified Scenario:
Assume that each
index page stores
only 1 record
59 Mike 3.14
Index Entries
(in Index Pages)
Data Pages
Data Entries K*
(in Index Pages)
59,
Mike,
3.14
Data Entry
Empty Pages
Data Pages
Physical Layout (on Disk)
B+ Tree Index Overview (Σύνοτη ηος Εςπεηηπίος B+ Tree)
2-17EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Example B+ Tree(Παπάδειγμα Χπήζηρ B+ Tree)
• Find 28*? 29*? All > 15* and < 30*
• Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.
– And change sometimes bubbles up the tree
2* 3*
Root
17
30
14* 16* 33* 34* 38* 39*
135
7*5* 8* 22* 24*
27
27* 29*
Entries <= 17 Entries > 17
Note how data entries
in leaf level are sorted
2-18EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Structure of Data Entry k*(Δομή ηηρ Καηασώπηζηρ K*)
• In a data entry k* we can store:
– Alternative 1: <k> (Key Value), or
– Alternative 2: <k, RID>(Key Value, Data
record with key value k)>, or
– Alternative 3: <k, [RID1, RID2, …, RIDn]>,
where RIDi is a data record with key value k.
• Choice of alternative for data entries is
orthogonal to the indexing technique used to
locate data entries with a given key value k.
– In particular, ANY of the above alternatives
might be used with ANY index (hash or tree)
2-19EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Data Entry k* Examples(Παπαδείγμαηα Καηασώπηζηρ k*)
• Alternative 1: <k>
• Alternative 2: <k, RID>
• Alternative 3: <k, [RID,…,RID]>
59, Mike, 3.14 Index Data Entry
59, RID#10
59 Mike 3.14
Index Data Entry
RID#10
Data Record
59, RID#10, RID#61, #RID82
59 Jim 53.14
Index Data Entry
Data Record59 Mike 3.14 59 Chris 33.14
Results in a Index File Organization!
RID#10 RID#61 RID#82
2-20EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Hash-Based Index Overview (Εςπεηήπια Καηακεπμαηιζμού)
• Similar ideas to the B+tree with regards to pages, records, etc.
• A Hash-Index is a collection of buckets (κάδοσς).
– Buckets contain data entries (Altern. 2,3) | Data Records (Altern. 1)
– Bucket = primary page plus zero or more overflow pages (ζελίδερ
ςπεπσείλιζηρ) … linear/extendible hashing to be studied in next lectures.
• Good for equality selections (επηινγέο ηζόηεηαο)
Index File Organization
Hash Index #1
Data entries !=
Data Records
(Alternative 2)
Data Entries =
Data Records
(Alternative 1)
Hash Index #2
%4 %4
2-21EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Data Entry k* Remarks(Παπαηηπήζειρ για ηο k*)
• Alternative 1: <k>.
– Results in an Index File Organization! (i.e., there are not real Data Records organized in HeapFile or SortedFile, but everything is stored in the index itself).
– Only 1 index can use Alternative 1. Other indexes must point to the first indexes (like previous example). Otherwise, data records are duplicated, leading to redundant (ζπαηάλη) storage and potential inconsistency (αζςνέπεια).
– Results in “Heavy” Index Structures. • Note that large portions of Indexes should reside in main
memory (for performance reasons)
• If data records are very large, the index itself is very large !
59, Mike, 3.14Index Data Entries: 63, Chris, 33.14 68, Jim, 53.14
2-22EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Data Entry k* Remarks(Παπαηηπήζειρ για ηο k*)
• Alternatives 2 and 3:
– For Altern. 2,3, data entries are typically smaller than
Data Records. So, the Index is smaller => thus
larger portion of it might reside in Memory => better
performance than Alternative 1.
– Alternative 3 more compact than Alternative 2, but
leads to variable sized data entries (records) [e.g., (59,
[RID#10, RID#61, #RID82]) vs. (59, RID#61)
• We will see that Variable-size records are more difficult to
manage than Fixed-size records.
Index
Data Entry
Data
Record
59, RID#10, RID#61, #RID82
59 Jim 53.1459 Mike 3.14 59 Chris 33.14
RID#10 RID#61 RID#82
59, RID#10
59 Mike 3.14
RID#10
2-23EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Clustered vs. Unclustered Indexes (Ομαδοποιημένα vs. Μη-Ομαδοποιημένα Εςπεηήπια)
• Clustered Index (Ομαδοποιημένο Εςπεηήπιο): If order
(δηάηαμε) of data records is the same as, or `close to’,
order of data entries, else called unclustered index.• Alternative 1 implies clustered (since datarec same as dateentry)
– Faster Range Queries: Consecutive data records reside on the same page.
• Alternatives 2,3 are usually unclustered.
– Slower Range Queries: Consecutive data records reside on different pages
Data entries
(Index File)
(Data file)
Data Records
Data entries
Data Records
CLUSTERED UNCLUSTERED
2-24EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Primary vs. Secondary Indexes(Ππυηεύυν vs. Δεστερεύων Εςπεηήπια)
• Primary (Πξωηεύωλ) vs. Secondary (Δευτερεύων)
Index: If search key contains the primary key, then
called primary index, else called secondary index.
• Two data entries are said to be duplicates if they have
the same value for the search key, e.g.,
• Primary Indexes have NO duplicates (the key is unique)
• An Index is Unique if it has NO duplicate data entries
for the search key (i.e., it is either a primary key or a
candidate key), e.g.,
59, RID#10 59, RID#12
59, RID#10 65, RID#12 83, RID#15
2-25EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Summary(Σύνοτη)
• Many alternative file organizations exist, each
appropriate in some situation.
• If selection queries are frequent, building an
index or sorting the file (less probable) is
important.
– Hash-based indexes only good for equality search.
– Sorted files and tree-based indexes best for range
search; also good for equality search. (Files rarely
kept sorted in practice; B+ tree index is better.)
• Index is a collection of data entries plus a way
(e.g., hash or logarithmic search) to quickly find
entries with given key values.
2-26EPL446: Advanced Database Systems - Demetris Zeinalipour (University of Cyprus)
Summary(Σύνοτη)
• Data entries can be actual data records
<key>,<key, rid> pairs or <key, rid-list> pairs.
– Choice orthogonal to indexing technique used to
locate data entries with a given key value.
• Can have several indexes on a given file of
data records, each with a different search key.
• Indexes can be classified as clustered vs.
unclustered and primary vs. secondary.
– Differences have important consequences for
utility/performance.