CS346: Advanced Databases
Graham Cormode, G.Cormode@warwick.ac.uk
Storage, Files and Indexing

Outline

Part 1:
• Disk properties and file storage
• File organizations: ordered, unordered, and hashed
• Storage topics: RAID and storage area networks
• Chapter: “Disk Storage, Basic File Structures and Hashing” in Elmasri and Navathe

Part 2: Indexes


Why?

• Important to understand how high-level abstractions (databases) map down to low-level concepts (disks, files)
  – Get a sense of the scale of the quantities involved (seek times, overhead of inefficient solutions)
  – Appreciate the difference that smart solutions can bring
  – Understand where the bottlenecks lie
• Give a “bottom-up” perspective on data management
  – See the whole picture starting from the low level
  – Demystify some aspects that can seem opaque (B-trees, hashing, file organization)
  – Apply to many areas of computer science (OS, algorithms…)


The Memory Hierarchy

[Figure: the memory hierarchy, with a level labelled “Flash Storage”]


Data on Disks

• Databases ultimately rely on non-volatile disk storage
  – Data typically does not fit in (volatile) memory
• Physical properties of disks affect performance of the DBMS
  – Need to understand some basics of disks
• A few exceptions to disk-based databases:
  – Some real-time applications use “in-memory databases”
  – Some legacy/massive applications use tape storage as well
• Different tradeoffs with flash-based storage
  – Much faster to read, but limits on the number of erase/write cycles
  – No major difference between random access and linear scan
  – “Flash databases” are a niche, but growing area


Rotating Disk: 5000 – 10000 RPM

• Sector size: 0.5KB – 4KB, basic unit of data transfer from disk
• Seek time: move read head into position, currently ~4ms
  – Includes rotational delay: wait for sector to come under read head
  – Random access: (1 / 0.004 s) × 4KB = 1MB/second: quite slow
• Track-to-track move, currently ~0.4ms: 10 times faster
  – Sustained read/write rate: 100MB/second (caching can improve)
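The random-access figure can be checked with a few lines of arithmetic. This is a minimal sketch using the numbers quoted on the slide; the constant and function names are illustrative, and decimal units (1KB = 1000 bytes) are assumed to match the slide's rounding.

```python
# Back-of-envelope disk throughput, using the figures quoted on the slide.
SEEK_TIME_S = 0.004                           # ~4 ms per random access
SECTOR_BYTES = 4 * 1000                       # 4KB sector (decimal units assumed)
SEQUENTIAL_BYTES_PER_S = 100 * 1000 * 1000    # ~100MB/second sustained

def random_access_throughput(seek_time_s, sector_bytes):
    """Bytes per second when every sector read pays a full seek."""
    return sector_bytes / seek_time_s

random_bps = random_access_throughput(SEEK_TIME_S, SECTOR_BYTES)
ratio = SEQUENTIAL_BYTES_PER_S / random_bps

print(f"random access: {random_bps / 1e6:.1f} MB/s")
print(f"sequential:    {SEQUENTIAL_BYTES_PER_S / 1e6:.1f} MB/s")
print(f"sequential is about {ratio:.0f}x faster")
```

With these inputs the gap is a factor of about 100, which is the kind of difference that motivates laying data out for sequential access.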


Disk properties: the fundamental contrast

• Random access is slow, sequential access is fast
  – By factors of up to 100s
  – Want to design storage of data to avoid or minimize random access and make data access as fast as possible
• Buffering can help in multithreaded systems:
  – Work on other processes while waiting for data to arrive
  – Double buffering: maintain two buffers of data; work on current buffer of data, while other buffer fills from disk
  – Maximizes parallel utilization, but doesn’t make my thread faster
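Double buffering can be sketched as follows. This is an illustrative toy, not how a DBMS buffer manager is implemented: an in-memory `io.BytesIO` stands in for the disk, and `read_block` and `process_with_double_buffering` are hypothetical helper names.

```python
import io
import threading

def read_block(source, size):
    """Stand-in for a slow disk read; returns b'' at end of file."""
    return source.read(size)

def process_with_double_buffering(source, block_size, process):
    """Process one buffer while a helper thread fills the other."""
    results = []
    current = read_block(source, block_size)    # fill the first buffer up front
    while current:
        other = {}

        def fill():
            other["data"] = read_block(source, block_size)

        worker = threading.Thread(target=fill)
        worker.start()                           # other buffer fills in the background
        results.append(process(current))         # meanwhile, work on the current buffer
        worker.join()                            # wait until the other buffer is ready
        current = other["data"]                  # swap buffers
    return results

disk = io.BytesIO(bytes(range(10)))              # ten bytes with values 0..9
sums = process_with_double_buffering(disk, 4, sum)
print(sums)  # [6, 22, 17]
```

The processing of each block takes no less time than before; the gain is that reading and processing overlap.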


Records: the basic unit of the database

• Databases fundamentally composed of records
  – Each record describes an object with a number of fields
• Fields have a type (integer, float, string, time, compound…)
  – Fixed or variable length
• Need to know when one field ends and the next begins
  – Field length codes
  – Field separators (special characters)
• Leads to variable length records
  – How to effectively search through data with variable length records?
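One way to realise field length codes is to prefix each field with its byte length. The sketch below assumes string fields and a 2-byte little-endian length code; both choices, and the function names, are illustrative rather than taken from the slides.

```python
import struct

def encode_record(fields):
    """Encode string fields, each prefixed by a 2-byte length code."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += struct.pack("<H", len(data))   # field length code
        out += data
    return bytes(out)

def decode_record(buf, num_fields):
    """Decode num_fields length-prefixed fields from the start of buf."""
    fields, pos = [], 0
    for _ in range(num_fields):
        (length,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        fields.append(buf[pos:pos + length].decode("utf-8"))
        pos += length
    return fields, pos   # pos is the offset where the next record would start

record = encode_record(["Smith", "Warwick", "Lecturer"])
print(len(record))                # 26
print(decode_record(record, 3))
```

Because the record length varies, the position of the n-th record cannot be computed directly: a reader has to walk the length codes, which is one reason searching variable-length records needs extra structure.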


Records and Blocks

• Records get stored on disks organized into blocks
• Small records: pack a whole number of records into each block
  – Leaves some space left over in blocks
  – Blocking factor: (average) number of records per block
• Large records: may not be effective to leave slack
  – Records may span across multiple blocks (spanned organization)
  – May use a pointer at end of block to point to next block


Files

• A sequence of records is stored as a file
  – Either using OS file system support, or handled by DBMS
• Database requires support for various file operations:
  – Open file, return new file handle
  – Scan for the next record that satisfies a search condition
  – Read the next record from disk into memory
  – Delete the current record and (eventually) update file on disk
  – Modify the current record and (eventually) update file on disk
  – Insert a new record at the current location
  – Close the file, flush any buffers and postponed operations
• Need suitable file layout and indices to allow fast scan operation


File organization: unordered

• Just dump the records on disk in no particular order
• Insert is very efficient: just add to last block
• Scan is very inefficient: need to do a linear search
  – Read half the file on average
• Delete could be inefficient:
  – Read whole file, write it back with deleted record omitted
  – Instead, just “mark” record as deleted
  – Periodically remove marked records
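The mark-then-purge approach to deletion in an unordered file can be sketched in memory. The `HeapFile` class below is a hypothetical illustration: a Python list stands in for the file on disk and a set of positions stands in for the deletion marks.

```python
class HeapFile:
    """Unordered file: append on insert, mark on delete, compact periodically."""

    def __init__(self):
        self.records = []       # records in arrival order
        self.deleted = set()    # positions marked as deleted

    def insert(self, record):
        self.records.append(record)           # just add to the end of the file

    def scan(self, predicate):
        """Linear search over records that are not marked as deleted."""
        return [r for i, r in enumerate(self.records)
                if i not in self.deleted and predicate(r)]

    def delete(self, predicate):
        for i, r in enumerate(self.records):
            if i not in self.deleted and predicate(r):
                self.deleted.add(i)           # mark only; nothing is rewritten

    def compact(self):
        """Periodic reorganization: physically drop the marked records."""
        self.records = [r for i, r in enumerate(self.records)
                        if i not in self.deleted]
        self.deleted.clear()

f = HeapFile()
for name in ["ann", "bob", "cat"]:
    f.insert(name)
f.delete(lambda r: r == "bob")
print(f.scan(lambda r: True))   # ['ann', 'cat']
print(len(f.records))           # 3: the marked record is still stored
f.compact()
print(len(f.records))           # 2
```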


File organization: ordered

• Keep records ordered on some (key) attribute
• Can scan through records in that order very easily
• Can search for a value (or range of values) by binary search
  – Binary search: log₂ b seeks to find desired record out of b blocks
  – Linear search: b/2 seeks on average to find record
• Insertion is rather more expensive and complex to do well
  – Keep recent records in “overflow buffer” for periodic merge
• If modifying the key field, treat as a deletion and an insertion
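The binary search over blocks can be sketched as below, counting one block read per probe. The in-memory list of blocks and the function name are assumptions for illustration; a real file would fetch each block from disk.

```python
import math

def binary_search_blocks(blocks, key):
    """Binary search over sorted blocks; returns (found, blocks_read)."""
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]       # one block read (one seek)
        reads += 1
        if key < block[0]:
            hi = mid - 1          # key can only be in an earlier block
        elif key > block[-1]:
            lo = mid + 1          # key can only be in a later block
        else:
            return key in block, reads
    return False, reads

# 3000 blocks, each holding 10 consecutive integer keys
blocks = [list(range(i * 10, i * 10 + 10)) for i in range(3000)]
found, reads = binary_search_blocks(blocks, 12345)
print(found, reads, math.ceil(math.log2(len(blocks))))
```

For b = 3000 blocks the number of reads never exceeds ⌈log₂ 3000⌉ = 12, against 1500 on average for a linear scan.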


File organization: hashed

• Use hashing to ensure records with same key are grouped together
• Arrange file blocks into M equal sized buckets
  – Often, 1 block = 1 bucket
• Apply hash function to key field to determine its bucket
• Usual hash table concerns emerge
  – Need to deal with collisions, e.g. by open addressing, or chaining
  – Deletions also get messy, depending on collision method used


External hashing


• Don’t store records directly in buckets, store pointers to records
  – Pointers are small, fit more in a block
  – “All problems in computer science can be solved by another level of indirection” – David Wheeler


External hashing: issues

• Aim for 70-90% occupancy of the hash table
  – Not too much wastage, not too many collisions
• Hash function should spread records evenly across buckets
  – If very skewed distribution, we lose benefits of hashing
• Still costly if access to records ordered by key is required
  – And doesn’t help with accessing records not by key
• Main disadvantage: hard to adjust if number of records grows
  – Need to resize the hash table
• What if too many records hash to the same bucket?
  – Can handle extra records by “chaining” to overflow buckets
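A static hashed file with overflow chaining can be sketched as follows. `HashedFile` is a hypothetical in-memory model: each bucket is a chain of fixed-capacity blocks, and Python's built-in `hash` on small non-negative integers (which returns the integer itself) stands in for the hash function.

```python
class HashedFile:
    """M buckets of fixed capacity; a full bucket chains to overflow blocks."""

    def __init__(self, num_buckets, bucket_capacity):
        self.capacity = bucket_capacity
        # each bucket is a chain of blocks; each block is a list of (key, record)
        self.buckets = [[[]] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, record):
        chain = self._chain(key)
        if len(chain[-1]) >= self.capacity:
            chain.append([])                  # allocate an overflow block
        chain[-1].append((key, record))

    def lookup(self, key):
        """Return (matching records, number of blocks read)."""
        matches, reads = [], 0
        for block in self._chain(key):
            reads += 1
            matches += [rec for k, rec in block if k == key]
        return matches, reads

    def occupancy(self):
        slots = sum(len(chain) for chain in self.buckets) * self.capacity
        used = sum(len(block) for chain in self.buckets for block in chain)
        return used / slots

table = HashedFile(num_buckets=4, bucket_capacity=2)
for key in range(12):
    table.insert(key, f"r{key}")
print(table.lookup(8))      # (['r8'], 2): main block plus one overflow block
print(table.occupancy())    # 0.75
```

Every lookup here already reads two blocks because each bucket has overflowed once; as the file grows the chains lengthen, which is the difficulty that resizing (or extendible hashing) addresses.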


Hashing: Overflow buckets


Extendible hashing

• Hashing scheme that allows the hash table to grow and shrink
  – Avoid wasted space and avoid excessive collisions
• Makes use of a directory of bucket addresses
  – Directory size is a power of two, 2^d
  – So can double or halve the directory size as needed
  – The first d bits of the hash value are used to index into the directory
• Directory entries point to disk blocks storing records
  – Contiguous directory entries can point to same disk block
  – Disk blocks can have a local value of d, written d’
• Insertions into a block may cause it to overflow and split in two
  – The directory is then updated accordingly


• Extendible hashing example
  – Some values of d’ less than global d


Extendible Hashing: Updating d

• If a bucket becomes full, may need to increase d → d + 1
  – Double the size of the directory
• Similarly, if all buckets have local d’ < d, can decrease d → d – 1
  – Halve the size of the directory
• Other adaptive hashing variants exist
  – Dynamic hashing: binary tree directory
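The growth side of extendible hashing can be sketched as below. This is a simplified illustration under stated assumptions: hash values are 8 bits wide, the toy hash `h` just takes the low 8 bits of an integer key, buckets hold keys only, and shrinking the directory is not implemented. The class and function names are hypothetical.

```python
HASH_BITS = 8  # assumed width of hash values for this sketch

def h(key):
    """Toy hash (an assumption): the low 8 bits of an integer key."""
    return key & ((1 << HASH_BITS) - 1)

class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth    # d' on the slides
        self.keys = []

class ExtendibleHash:
    def __init__(self, bucket_capacity=2):
        self.capacity = bucket_capacity
        self.global_depth = 0             # d on the slides
        self.directory = [Bucket(0)]      # always 2**d entries

    def _index(self, key):
        # the first (most significant) d bits of the hash value
        return h(key) >> (HASH_BITS - self.global_depth)

    def lookup(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.local_depth == self.global_depth:
            if self.global_depth == HASH_BITS:
                raise OverflowError("out of hash bits; would need overflow chaining")
            # double the directory: each old entry becomes two adjacent entries
            self.directory = [b for b in self.directory for _ in (0, 1)]
            self.global_depth += 1
        # split the full bucket into two buckets of local depth d' + 1
        new_depth = bucket.local_depth + 1
        low, high = Bucket(new_depth), Bucket(new_depth)
        for i, b in enumerate(self.directory):
            if b is bucket:
                # the next bit after the old d'-bit prefix picks the half
                bit = (i >> (self.global_depth - new_depth)) & 1
                self.directory[i] = high if bit else low
        for k in bucket.keys:
            self.directory[self._index(k)].keys.append(k)
        self.insert(key)                  # retry; may trigger further splits

table = ExtendibleHash(bucket_capacity=2)
for key in (0b00000000, 0b10000000, 0b01000000, 0b11000000, 0b00100000):
    table.insert(key)
print(table.global_depth, len(table.directory))   # 2 4
```

After these inserts, directory entries 2 and 3 still share one bucket with local depth 1, which is the situation where some d’ is less than the global d.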


RAID disk technology

• RAID originally a way to combine multiple cheap disks for reliability
  – “Redundant Array of Inexpensive Disks” (1980s)
• Now general purpose approach to providing reliability
  – “Redundant Array of Independent Disks”
  – Sets of different levels of replication
• RAID 0: spread data over multiple disks (striping)
  – Increases throughput, but increases risk of data loss


Important RAID levels

• RAID 1: duplication of data across multiple disks (mirroring)
  – Data copied to 2 (or more) disks
  – Disk reliability measured in “mean time between failures” (MTBF)
  – Typical MTBF is 100K hours – 1M hours (1M hours is roughly a century)
  – Chance of both disks failing at same time is small
  – So enough time to recover a copy
• RAID 5: block level striping and parity coding spread over disks
  – Parity coding: allows recovery of 1 missing disk

[Figure: parity coding example with data bits 1 0 1 1 0 and parity bit 1]
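Parity recovery relies on XOR: the parity block is the XOR of the data blocks, so XOR-ing the survivors with the parity reproduces a single missing block. The sketch below shows only that idea; it does not model how RAID 5 stripes data or distributes parity across disks, and the function names are illustrative.

```python
from functools import reduce

def parity_block(blocks):
    """XOR equal-length blocks byte by byte to get the parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return parity_block(surviving_blocks + [parity])

data = [b"disk", b"zero", b"join"]     # three data blocks on three disks
parity = parity_block(data)            # stored on a fourth disk
rebuilt = recover_missing([data[0], data[2]], parity)
print(rebuilt)  # b'zero'
```

If two blocks are lost at once, a single parity block no longer determines them, which is why tolerating multiple failures needs a stronger code.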


RAID levels

• RAID 6: Reed-Solomon coding allows multiple disk losses
• Other RAID levels (2, 3, 4) not in common usage


Storage Area Networks

• Storage Area Networks: virtual disks
  – Disks attached to “headless” server
  – Easy to configure, low maintenance overhead
• Many advantages to SANs:
  – Flexible configuration: hot-swap new disks in/out
  – Can be physically remote from other network elements; provided on fast (fibre-based) network
  – Separate storage for server configuration, OS updates etc.


Outline

Part 2:
• Indexes: primary and secondary
• Multilevel indexes and B-trees
• Chapter: “Indexing Structures for Files” in Elmasri and Navathe


Indexing for Files

• Chapter: “Indexing Structures for Files” in Elmasri and Navathe
  – Move focus from how file is stored on disk to how file is accessed / indexed by the DBMS
• Index: an auxiliary file that makes it faster to find certain records
  – An index is usually for one field of the record (e.g. index by name)
  – Can have multiple indexes, each for different fields
• A basic form of an index is a sorted list of pointers
  – <field value, pointer to record>, ordered by field value
  – “An access path” for the indexed field


Indexes as access paths

• Indexes usually take up much less space than the original file
  – Each index entry is much smaller than the full record
  – Just need a field value, and a pointer (few bytes)
• Efficient to look up matching records
  – Binary search on the index, then follow pointer
• The index may be dense or sparse
  – Dense index: contains an entry for every search value that occurs in the file
  – Sparse index: contains entries only for some search values
• Can have an index on the field that the file is sorted on! Why?
  – Can be faster to search via index than do binary search on file


Primary Index

• A primary index applies when the file is ordered by a key field
• A sparse index: one entry for each block of the data file
  – An index entry for the first record in the block (the block anchor)
  – Can be many fewer entries in index than in the data file
• Straightforward to search for a record
  – Use the index to find the block that the record should be in
  – Retrieve the block and see if the record is there
• Insertion and deletion of records in the main file is a pain!
  – Almost all the pointers change!
• Some standard tricks to mitigate the pain
  – Buffer updates in an “overflow” file and check against this
  – Linked list of overflow records for each block as needed
  – Mark records as deleted, and only purge periodically
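Lookup through a sparse primary index can be sketched with the standard `bisect` module. The data layout (a list of sorted blocks of integer keys) and the function names are assumptions for illustration.

```python
from bisect import bisect_right

def build_primary_index(blocks):
    """One <anchor key, block number> entry per block of the ordered file."""
    return [(block[0], block_no) for block_no, block in enumerate(blocks)]

def lookup(index, blocks, key):
    """Find the block with the largest anchor <= key, then look inside it."""
    anchors = [anchor for anchor, _ in index]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None                        # key is smaller than every anchor
    block_no = index[pos][1]
    return block_no if key in blocks[block_no] else None

blocks = [[3, 8, 12], [15, 21, 30], [42, 57, 60]]   # file ordered on the key
index = build_primary_index(blocks)
print(index)                        # [(3, 0), (15, 1), (42, 2)]
print(lookup(index, blocks, 21))    # 1
print(lookup(index, blocks, 22))    # None
```

The index holds three entries for nine records; only the one block that could contain the key is ever fetched.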


Indexing Example


• Example: Given a data file EMPLOYEE(NAME, SSN, ADDRESS, JOB, SAL, ... )
• Suppose that:
  – record size R = 100 bytes (fixed size)
  – block size B = 1024 bytes
  – file size r = 30000 records
• Blocking factor Bfr = ⌊B / R⌋ = ⌊1024 / 100⌋ = 10 records/block
• Number of file blocks b = ⌈r / Bfr⌉ = ⌈30000 / 10⌉ = 3000 blocks


Indexing Example

• For an index on the SSN field, assume the field size V_SSN = 9 bytes and the record pointer size P_R = 6 bytes. Then:
  – index entry size R_I = (V_SSN + P_R) = (9 + 6) = 15 bytes
  – index blocking factor Bfr_I = ⌊B / R_I⌋ = ⌊1024 / 15⌋ = 68 entries/block
  – number of index blocks b_I = ⌈b / Bfr_I⌉ = ⌈3000 / 68⌉ = 45 blocks (one index entry per data block)
  – binary search needs ⌈log₂ b_I⌉ = ⌈log₂ 45⌉ = 6 block accesses on the index, plus one more to fetch the data block
  [In practice, likely that these 45 blocks would end up in cache]
• This is compared to an average linear search cost of:
  – (b/2) = 3000/2 = 1500 block accesses
• If the file records are ordered, the binary search cost would be:
  – ⌈log₂ b⌉ = ⌈log₂ 3000⌉ = 12 block accesses
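The arithmetic of this example can be reproduced in a few lines, which makes it easy to try other block or record sizes. The variable names mirror the symbols on the slides; floors and ceilings are written out explicitly.

```python
import math

B, R, r = 1024, 100, 30000     # block size, record size, number of records
V_SSN, P_R = 9, 6              # key field size and record pointer size (bytes)

bfr = B // R                   # records per block (unspanned)
b = math.ceil(r / bfr)         # data file blocks
R_i = V_SSN + P_R              # index entry size
bfr_i = B // R_i               # index entries per block
b_i = math.ceil(b / bfr_i)     # primary index: one entry per data block

linear = b // 2                                # average linear scan
binary_on_file = math.ceil(math.log2(b))       # binary search on the data file
binary_on_index = math.ceil(math.log2(b_i))    # binary search on the index

print(bfr, b, bfr_i, b_i)                          # 10 3000 68 45
print(linear, binary_on_file, binary_on_index)     # 1500 12 6
```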


Clustering Index

• Clustering index applies when data is ordered on a non-key field
  – The field on which data is ordered is called the clustering field
  – The data file is described as a clustered file
  – Clustering index is sorted list of <field value, pointer> pairs
• Why make a distinction between clustering and primary index?
  – Field values can appear in many consecutive records
  – Only one entry in index for each distinct field value: no point having multiple entries
  – Index points to first data block containing the matching value
• Same issues with insertion and deletion as for primary index


• Cluster index where each distinct value is allocated a whole disk block
• Linked list if more than one block is needed


Secondary indexes

• Secondary indexes provide a secondary means of access to data
  – For when some primary access already exists (e.g. index on key)
• A secondary index is on some other field(s)
  – Either other candidate key fields which are unique for every record
  – Or non-key field with duplicate values
• Secondary index is an ordered file of <field value, pointer> pairs
  – Pointer can be to a file block, or record within a file
  – A dense index: must be one pointer per record
• Many secondary indexes can be created for a file
  – Allowing access based on different fields
  – By contrast, there can be only one primary index


• Secondary index with block pointers
• Unique data values so structure is simple


Secondary index example

• Same set up as previous example: r = 30000 records of size R = 100 bytes, block size B = 1024 bytes
• File is stored in 3000 blocks as worked out before
• Search for a record based on a field of V = 9 bytes
  – Linear search would read 1500 blocks on average
• Secondary index on target attribute: (9 + 6) = 15 bytes/record
  – Blocking factor for index is ⌊1024/15⌋ = 68 entries per block
  – Need ⌈30000/68⌉ = 442 blocks to store the (dense) index
  – Binary search on index takes ⌈log₂ 442⌉ = 9 block accesses
  – Slightly more than the primary index (why?)


Secondary index for non-key, non-ordering field

• Secondary index for a non-key non-ordering field
  – I.e. a field that has duplicate values in many records
  – Several possible approaches
    1. Include duplicate index entries for the same field value (dense)
    2. Have variable length entries in the index: a list of pointers to all blocks containing the target value
    3. Use an extra level of indirection: fixed length index entries point to list of pointers, arranged as list of disk blocks
• Option 3 is most commonly used
  – All options are painful when data file is subject to insert/deletes


• “Option 3” secondary index


Single Level Indexing Summary

• Primary index: on the field that the data is sorted by
  – Allows faster access than searching the file directly
• Secondary index: on any field(s) in the data
  – Can have multiple secondary indexes
  – Typically dense
• All indexes require extra effort to maintain if the data is subject to frequent updates (insert/delete operations)


Multilevel Indexing

• The indexes described so far miss a trick: they do binary search
  – But we can read a block of k index records at a time
  – Can do a k-way split instead of a 2-way split
  – Improves cost from log₂ N to log_k N
• Another way to look at it: if index is large, build index on index…
  – Original index is first level index, then there is second level index
  – Can repeat, creating third level index, fourth level index…
  – Until top level of index fits into one disk block
  – For all realistic file sizes, a constant number of levels is needed
• Apply this idea to any index type (primary, secondary, cluster)
  – Assume first level index has fixed length, distinct valued entries


Two-level index


Example

• Convert previous example into a multilevel index
  – Blocking factor for indexes remains 68
  – 442 blocks of first level index
  – Second level index: ⌈442/68⌉ = 7 blocks
  – Third level index fits in 1 block: stop here!
• Hence, need three levels of index: three accesses to find (pointer to) target record
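The level-by-level calculation can be sketched as a short loop: keep dividing by the fan-out (the index blocking factor) until a level fits in one block. The function name is illustrative; the inputs are the 30000 dense first-level entries and the blocking factor of 68 from the example.

```python
import math

def index_levels(first_level_entries, fan_out):
    """Return the number of blocks at each index level, bottom level first."""
    levels = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        levels.append(blocks)
        if blocks == 1:
            return levels
        entries = blocks      # the next level has one entry per block below it

levels = index_levels(30000, 68)
print(levels)         # [442, 7, 1]
print(len(levels))    # 3
```

Three index accesses compare with the 9 needed for binary search on the single-level index; one further access then fetches the data block itself.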


Dynamic multilevel indexes

• Can we modify our storage of indices to make handling inserts/deletes less painful?
• Use tree-structure to directly access data
  – Keep some space in file blocks to reduce cost of updates
• Use the language of trees to describe the structure:


Search trees

• A search tree: a tree where each node contains at most p-1 search values and p pointers as ⟨P_1, K_1, P_2, K_2, …, K_{q-1}, P_q⟩, q ≤ p
  – The values are in order: K_1 < K_2 < … < K_{q-1}
  – Each pointer P_i points to a subtree so that K_{i-1} < X ≤ K_i for all keys X in subtree
• Rules allow efficient search for any key value
  – Search within the only subtree it can be in at each level


Search tree example

• Leaf-level entries have the full record
• Insertion is easier: we can add a new block without having to rewrite the rest of the tree
• If tree is unbalanced (some very deep paths), searches are long
  – Try to avoid by using rules to avoid tree getting unbalanced
  – Perform occasional rebalancing or use “self-balancing” trees


B-trees and B+-trees

• B-trees add the constraint that the tree should be balanced
  – The root to leaf path should be about the same length for all leaves
  – Avoid wasted space: each node should be between half full and full
• B+-tree is a slight modification of B-tree that is now the standard
  – B-trees: allow pointers to data at all levels of the tree
  – B+-tree: pointers to data only at the leaf level
  – B+-tree slightly simpler (fewer cases to handle with updates)
• The trees can be used for (primary, secondary) multi-level indexes
  – Updates to data can be reflected in tree easily
• These trees are widely used in file systems and database systems
  – File systems: NTFS [Windows], NSS, XFS, JFS – for directory entries
  – DBMSs: IBM DB2, Informix, MS SQL Server, Oracle, SQLite


B+-tree


• Internal nodes: ⟨P_1, K_1, P_2, K_2, …, K_{q-1}, P_q⟩, where p/2 < q ≤ p
• Leaf nodes: ⟨K_1, Pr_1, K_2, Pr_2, …, K_{q-1}, Pr_{q-1}, P_next⟩, where p/2 < q ≤ p
  – K_i, Pr_i: Pr_i points to record with value K_i
  – P_next points to the next leaf node in the tree (for linear access)


B+-tree: Search

• Search on a B+-tree is fairly straightforward
  – Start at root block
  – While not at a leaf block:
      Determine between which values in the block the key falls
      Follow the relevant pointer to the new block
  – Search current leaf block for desired value
  – If found, follow pointer to retrieve record
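The search procedure can be sketched on a small hand-built tree. The `Node` class is a hypothetical in-memory stand-in for disk blocks, with strings standing in for record pointers, and the descent follows the rule K_{i-1} < X ≤ K_i used in these slides.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None, records=None, next_leaf=None):
        self.keys = keys              # sorted keys K_1 < K_2 < ...
        self.children = children      # internal node: len(keys) + 1 subtrees
        self.records = records        # leaf node: one record pointer per key
        self.next_leaf = next_leaf    # leaf node: P_next

    @property
    def is_leaf(self):
        return self.children is None

def search(root, key):
    """Return the record pointer for key, or None if the key is absent."""
    node = root
    while not node.is_leaf:
        # index of the first separator >= key; that child covers the key
        node = node.children[bisect_left(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None

leaf3 = Node([8, 9], records=["r8", "r9"])
leaf2 = Node([6, 7], records=["r6", "r7"], next_leaf=leaf3)
leaf1 = Node([1, 5], records=["r1", "r5"], next_leaf=leaf2)
root = Node([5, 7], children=[leaf1, leaf2, leaf3])

print(search(root, 7))   # r7
print(search(root, 4))   # None
```

Each loop iteration corresponds to one block access, so the cost of a search is the height of the tree.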


B+-tree: insertion

• As with many tree algorithms, insertion is based on search
  – Start by searching for where the record should be
  – If room in the leaf block, insert a pointer to the new record
  – Else, split the leaf block into two, and insert the pointer
• Now there are two leaf blocks: need to update parent
  – Similar process to update parent: may need to split parent
  – May propagate back to root
• Note that we do not explicitly attempt to keep tree balanced
  – The condition p/2 < q ≤ p ensures that it can’t be too unbalanced
• Algorithms fans: condition ensures height is O(log n) for n keys
  – Worst case time for {insert, delete, search} is O(log n)
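The leaf-level step of insertion can be sketched in isolation. This is only the split of one leaf, under the assumption that keys equal to the separator go to the left leaf (matching K_{i-1} < X ≤ K_i); updating the parent and propagating splits upward is not shown, and the function name is illustrative.

```python
from bisect import insort

def insert_into_leaf(leaf_keys, key, capacity):
    """Insert key into a sorted leaf, splitting it if it overflows.

    Returns (left_keys, right_keys, separator); right_keys and separator
    are None when the leaf had room and no split was needed.
    """
    keys = list(leaf_keys)
    insort(keys, key)                      # keep the leaf in key order
    if len(keys) <= capacity:
        return keys, None, None
    mid = (len(keys) + 1) // 2             # left leaf keeps the larger half
    left, right = keys[:mid], keys[mid:]
    return left, right, left[-1]           # separator goes up to the parent

print(insert_into_leaf([1, 5], 3, capacity=3))       # ([1, 3, 5], None, None)
print(insert_into_leaf([1, 5, 6], 7, capacity=3))    # ([1, 5], [6, 7], 5)
```

After a split, both halves are at least half full, which is how the occupancy condition is maintained without any explicit rebalancing.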


B+-tree: deletion

• Essentially the inverse of insertion
  – Find the record to delete from the B+-tree
  – Remove the pointer and if block is still large enough, halt
  – Else, try to redistribute: move entries from sibling block
  – If can’t redistribute, merge the two siblings
  – Then delete one pointer from parent and recurse up tree


Summary

• Disk properties and file storage
• File organizations: ordered, unordered, and hashed
• Storage topics: RAID and storage area networks
• Indexes: primary and secondary
• Multilevel indexes and B-trees
• Chapter: “Disk Storage, Basic File Structures and Hashing” in Elmasri and Navathe
• Chapter: “Indexing Structures for Files” in Elmasri and Navathe
