FILE ORGANISATION.pptx
Uploaded by: nimisha-jith
Category: Documents
Transcript of FILE ORGANISATION.pptx
7/27/2019 FILE ORGANISATION.pptx
http://slidepdf.com/reader/full/file-organisationpptx 1/41
FILE ORGANISATION
FILE ORGANIZATION
• The database is stored as a collection of files. Each file is a
sequence of records. A record is a sequence of fields.
• One approach is to assume that:
the record size is fixed
each file has records of one particular type only
different files are used for different relations
FIXED-LENGTH RECORDS
Simple approach:
• Store record i starting from byte n × (i – 1), where n is the size of each record.
• Deletion of record i (with N denoting the number of the last record in the file):
move records i + 1, . . ., N to i, . . ., N – 1, OR
move record N to i, OR
do not move records, but link all free records on a free list
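As a minimal sketch of the offset formula and the free-list option above, assuming an in-memory list stands in for the file (the names `RECORD_SIZE`, `record_offset` and `FixedLengthFile` are hypothetical, not from the slides):

```python
RECORD_SIZE = 16  # n: bytes per record (hypothetical value)


def record_offset(i, n=RECORD_SIZE):
    """Byte offset of record i (1-based) in a file of fixed-length records."""
    return n * (i - 1)


class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length records with a free list."""

    def __init__(self):
        self.slots = []      # slots[k] holds record k + 1, or None if deleted
        self.free_list = []  # record numbers of deleted (reusable) slots

    def insert(self, record):
        """Store a record and return its record number."""
        if self.free_list:               # reuse a freed slot first
            i = self.free_list.pop()
            self.slots[i - 1] = record
        else:                            # otherwise append at the end
            self.slots.append(record)
            i = len(self.slots)
        return i

    def delete(self, i):
        """Delete record i without moving any other record."""
        self.slots[i - 1] = None
        self.free_list.append(i)
```

With this layout a deletion costs no record moves, at the price of holes that stay unused until a later insertion reuses them.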
VARIABLE-LENGTH RECORDS
Variable-length records arise in database systems in several
ways:
• Storage of multiple record types in a file.
• Record types that allow variable lengths for one or more fields.
• Record types that allow repeating fields (used in some older
data models).
TYPES OF FILE ORGANIZATION
• There are mainly three types of file organizations:
Sequential
Relative
Indexed
SEQUENTIAL FILE ORGANIZATION
• The records in the file are ordered by a search key or primary
key.
• Deletion – use pointer chains
• Insertion – locate the position where the record is to be inserted
if there is free space, insert the record there
if there is no free space, insert the record in an overflow block
In either case, the pointer chain must be updated
• The file needs to be reorganized from time to time to restore sequential order
RELATIVE FILE ORGANIZATION
• Within a relative file are numbered positions, called cells. These
cells are of fixed equal length and are consecutively numbered
from 1 to n, where 1 is the first cell, and n is the last available
cell in the file.
• Records in a relative file are accessed according to cell number.
A cell number is a record's relative record number, that is, its location
relative to the beginning of the file.
• By specifying relative record numbers, you can directly retrieve, add, or delete records regardless of their locations.
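Direct access by cell number can be sketched as follows, assuming an `io.BytesIO` buffer stands in for the relative file and that empty cells are filled with spaces (`CELL_SIZE`, `write_cell` and `read_cell` are hypothetical names):

```python
import io

CELL_SIZE = 8  # bytes per cell (hypothetical value)


def write_cell(f, cell_no, data):
    """Write data into cell cell_no (1-based), padded or cut to CELL_SIZE."""
    f.seek((cell_no - 1) * CELL_SIZE)
    f.write(data.ljust(CELL_SIZE, b" ")[:CELL_SIZE])


def read_cell(f, cell_no):
    """Read cell cell_no (1-based) and strip the space padding."""
    f.seek((cell_no - 1) * CELL_SIZE)
    return f.read(CELL_SIZE).rstrip(b" ")


# A relative file with five empty cells.
rel_file = io.BytesIO(b" " * (CELL_SIZE * 5))
write_cell(rel_file, 4, b"delta")  # cells can be filled in any order
write_cell(rel_file, 2, b"beta")
```

Reading cell 4 never touches cells 1 to 3: the cell number alone determines the byte position.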
INDEXED FILE ORGANIZATION
• An indexed file contains records ordered by a record key. Each
record contains a field that contains the record key. The record
key uniquely identifies the record.
• Indexed organization is similar to relative organization. In place
of the relative position of the block, a unique key of the block is
used to find the block.
INDEXING
• Indexing mechanisms are used to optimize access to data
(records) managed in tables. For example, the author catalogue
in a library is a type of index.
• An index file consists of records (called index entries) of the
form
<search-key value; pointer to block in data table>
DENSE INDEX FILES
• Index record appears for every search-key value in the file.
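A dense index can be sketched with a dictionary that holds one entry per search-key value. The records and the names `records`, `dense_index` and `dense_lookup` are hypothetical, and list positions stand in for record pointers:

```python
# Data file: records sorted by search key (hypothetical data).
records = [(10, "a"), (20, "b"), (30, "c"), (40, "d")]

# Dense index: one entry for every search-key value in the file.
dense_index = {key: pos for pos, (key, _) in enumerate(records)}


def dense_lookup(k):
    """Return the record value for key k, or None if k is not in the file."""
    pos = dense_index.get(k)
    return None if pos is None else records[pos][1]
```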
SPARSE INDEX FILES
• Contains index records for only some search-key values.
Applicable when records are sequentially ordered on the search key.
• To locate a record with search-key value K we:
Find the index record with the largest search-key value ≤ K
Search the file sequentially starting at the record to which the index record
points
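The two lookup steps can be sketched like this, assuming one index entry per block and hypothetical data (`blocks`, `sparse_keys` and `sparse_lookup` are illustrative names). The standard-library `bisect_right` finds the largest index key that does not exceed K:

```python
from bisect import bisect_right

# Data file: blocks of records sorted by search key (hypothetical data).
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: only the first key of each block; entry j points to block j.
sparse_keys = [blk[0][0] for blk in blocks]


def sparse_lookup(k):
    """Return the record value for key k, or None if it is absent."""
    pos = bisect_right(sparse_keys, k) - 1  # largest index key <= k
    if pos < 0:
        return None                         # k is smaller than every key
    for key, value in blocks[pos]:          # sequential scan from that point
        if key == k:
            return value
        if key > k:
            break                           # passed the place k would occupy
    return None
```

The index here has three entries for nine records; the saving in index size is paid for by the short sequential scan inside the block.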
SINGLE LEVEL INDEXING
• A single level index is an auxiliary file that makes it more
efficient to search for a record in the data file.
• The index is called an access path to the file.
• There are two types:
Primary index (also called clustering index)
Secondary index (also called non-clustering index)
PRIMARY INDEX
• A primary index is an ordered file whose entries are of fixed
length with two fields:
<value of primary key; address of data block>
• The data file is ordered on the primary key field and requires
the primary key of each record to be unique.
• The index file includes one entry for each block.
PROBLEM WITH PRIMARY INDEXES
• Insertion or deletion of record in ordered data file involves:
Making space or deleting space in the data file
And changing the index entries to reflect the new
situation.
SECONDARY INDEX
• A secondary index is an ordered file whose entries are of fixed length with two fields:
<value of key; address of data block or record pointer>
• A secondary index provides a secondary means of accessing a file for which some primary access already exists.
• The secondary index may be on a field which is a candidate key and has a unique value in every record, or on a non-key field with duplicate values.
• It is used to find records which satisfy the given value for some specific column.
• Often one wants to find all records whose values in a certain
field (which is not the search key of the primary index) satisfy
some condition.
• One can specify a secondary index with an index entry for each
search key value; index entry points to a bucket, which contains
pointers to all the actual records with that particular search key.
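A secondary index with buckets can be sketched as below. The account records and the names `accounts`, `build_secondary_index` and `find_all` are hypothetical, and list positions stand in for record pointers:

```python
from collections import defaultdict

# Data file ordered on account number, the primary search key (hypothetical data).
accounts = [
    (101, "Downtown", 500),
    (102, "Perryridge", 400),
    (110, "Downtown", 600),
    (215, "Mianus", 700),
    (217, "Perryridge", 750),
]


def build_secondary_index(records, field):
    """Map each value of the given field to a bucket of record positions."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[field]].append(pos)
    return dict(sorted(buckets.items()))  # index entries kept in key order


def find_all(index, records, value):
    """Follow the bucket for value and fetch every matching record."""
    return [records[pos] for pos in index.get(value, [])]


branch_index = build_secondary_index(accounts, field=1)
```

Because the file is not ordered on the branch name, the index needs a pointer for every record, which is why each entry leads to a bucket rather than to a single block.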
MULTI-LEVEL INDEXING
• If primary index is too big to fit in memory, access to records
becomes expensive. In this case multi-level indexing can be
used.
• Multi-level Indexing consists of different levels of indices.
• An outer index table points to inner index tables.
• The inner tables point to the record files.
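A two-level lookup can be sketched as follows, assuming each index entry stores the first key of what it points to. The data and the names `outer_table`, `inner_tables`, `data_blocks`, `floor_entry` and `multilevel_lookup` are hypothetical:

```python
from bisect import bisect_right

data_blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]  # sorted keys (hypothetical)

# Inner index tables: (first key in block, data block number).
inner_tables = [
    [(1, 0), (3, 1)],
    [(5, 2), (7, 3)],
]

# Outer index table: (first key covered by inner table, inner table number).
outer_table = [(1, 0), (5, 1)]


def floor_entry(table, k):
    """Pointer of the entry with the largest key <= k, or None."""
    keys = [key for key, _ in table]
    pos = bisect_right(keys, k) - 1
    return None if pos < 0 else table[pos][1]


def multilevel_lookup(k):
    """Return the number of the data block holding key k, or None."""
    inner_no = floor_entry(outer_table, k)             # level 1: outer index
    if inner_no is None:
        return None
    block_no = floor_entry(inner_tables[inner_no], k)  # level 2: inner index
    if block_no is None:
        return None
    return block_no if k in data_blocks[block_no] else None
```

Only the small outer table has to stay in memory; each lookup then reads one inner table and one data block.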
• Multilevel Index structure.
INDEX UPDATING: RECORD DELETION
Single-level index deletion:
• Dense indices – deletion of search-key: similar to file record
deletion.
• Sparse indices –
If the deleted key value exists in the index, the value is
replaced by the next search-key value in the file (in
search-key order).
If the next search-key value already has an index entry,
the entry is deleted instead of being replaced.
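The sparse-index deletion rule can be sketched as below, assuming unique search keys and plain sorted lists standing in for the data file and the index (`delete_record`, `file_keys` and `sparse_index` are hypothetical names):

```python
from bisect import bisect_left


def delete_record(file_keys, sparse_index, k):
    """Delete key k from the file and keep the sparse index consistent.

    file_keys: sorted list of the search keys stored in the file.
    sparse_index: sorted list of the keys that have an index entry.
    Returns False if k is not in the file.
    """
    pos = bisect_left(file_keys, k)
    if pos == len(file_keys) or file_keys[pos] != k:
        return False
    file_keys.pop(pos)
    # Next search-key value in the file, in search-key order (if any).
    next_key = file_keys[pos] if pos < len(file_keys) else None

    ipos = bisect_left(sparse_index, k)
    if ipos < len(sparse_index) and sparse_index[ipos] == k:
        if next_key is None or next_key in sparse_index:
            sparse_index.pop(ipos)           # entry is deleted, not replaced
        else:
            sparse_index[ipos] = next_key    # entry is replaced by next key
    return True
```

Dropping the entry when the deleted record was the last one in the file is an assumption of this sketch; the slide does not cover that case.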
INDEX UPDATING: RECORD INSERTION
Single-level index insertion:
• Perform a lookup using the key value from the inserted record
• Dense indices – if the search-key value does not appear in
the index, insert it.
• Sparse indices – if the index stores an entry for each block of
the file, no change needs to be made to the index unless a
new block is created.
If a new block is created, the first search-key value
appearing in the new block is inserted into the index.
• Multilevel insertion (as well as deletion) algorithms are simple
extensions of the single-level algorithms.
B TREE
• B-tree is a tree data structure that keeps data sorted and allows
searches, sequential access, insertions, and deletions
in logarithmic time. It is a generalization of a binary search
tree in that a node can have more than two children.
• In B-trees, internal nodes can have a variable number of child
nodes within some pre-defined range. When data are inserted
or removed from a node, its number of child nodes changes. In
order to maintain the pre-defined range, internal nodes may be
joined or split.
• Each internal node of a B-tree will contain a number of keys
which divide its subtrees. For example, if an internal node has 3
child nodes then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle
subtree will be between a1 and a2, and all values in the
rightmost subtree will be greater than a2.
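How the keys of an internal node divide its subtrees can be sketched with a binary search over the node's keys. The function name `child_index` is hypothetical, and the case where the value equals one of the keys (found in the node itself) is left out:

```python
from bisect import bisect_right


def child_index(keys, value):
    """Return the 0-based position of the child subtree that may hold value.

    keys is the sorted key list of an internal node; a node with
    len(keys) keys has len(keys) + 1 children.
    """
    return bisect_right(keys, value)


node_keys = [10, 20]  # a1 = 10, a2 = 20, so the node has 3 children
```

For this node, a value below 10 goes to child 0, a value between 10 and 20 to child 1, and a value above 20 to child 2.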
B+ TREE
A B+-tree is a tree satisfying the following properties:
• All paths from root to leaf are of the same length.
• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.
• A leaf node has between ⌈(n – 1)/2⌉ and n – 1 values.
• Special cases:
If the root is not a leaf, it has at least 2 children.
If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n – 1) values.
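The occupancy limits can be computed directly from n. This sketch assumes the usual rounding up of n/2 and (n – 1)/2; the function name `bplus_bounds` and the dictionary keys are hypothetical:

```python
from math import ceil


def bplus_bounds(n):
    """Occupancy limits of a B+-tree whose nodes hold up to n pointers."""
    return {
        "internal_children": (ceil(n / 2), n),       # non-root, non-leaf nodes
        "leaf_values": (ceil((n - 1) / 2), n - 1),   # non-root leaf nodes
        "root_min_children": 2,                      # root that is not a leaf
        "root_leaf_values": (0, n - 1),              # root that is a leaf
    }
```

For n = 4, an internal node has 2 to 4 children and a leaf holds 2 or 3 values; for n = 5, the ranges are 3 to 5 children and 2 to 4 values.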
• Typical node:
P1 | K1 | P2 | . . . | Pn–1 | Kn–1 | Pn
Ki are the search-key values
Pi are pointers to children (for non-leaf nodes) or pointers to
records or buckets of records (for leaf nodes).
• The search keys in a node are ordered:
K1 < K2 < K3 < . . . < Kn–1
LEAF NODES
Properties of a leaf node:
• For i = 1, 2, . . ., n – 1, pointer Pi either points to a file record
with search-key value Ki, or to a bucket of pointers to file
records, each record having search-key value Ki.
• If Li, Lj are leaf nodes and i < j, Li's search-key values are
less than Lj's search-key values.
• Pn points to the next leaf node in search-key order.
NON LEAF NODES
Non-leaf nodes form a multi-level sparse index on the leaf
nodes. For a non-leaf node with m pointers:
• All the search keys in the subtree to which P1 points are less
than K1.
• For 2 ≤ i ≤ m – 1, all the search keys in the subtree to which
Pi points have values greater than or equal to Ki–1 and less
than Ki.
• All the search keys in the subtree to which Pm points have values greater than or equal to Km–1.
QUERIES ON B+-TREES
Find all records with a search-key value of k.
• N = root
• Repeat
Examine N for the smallest search-key value > k.
If such a value exists, assume it is Ki. Then set N to the node pointed to by Pi.
Otherwise k ≥ Km–1, where m is the number of pointers in N. Set N to the node pointed to by Pm.
Until N is a leaf node
• If for some i, key Ki = k, follow pointer Pi to the desired record or
bucket.
• Else no record with search-key value k exists.
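The search procedure can be sketched as follows, assuming a small hand-built tree whose record pointers are plain strings (`Node`, `find` and the sample data are hypothetical). With 0-based lists, `keys[i]` plays the role of K(i+1) and `pointers[i]` the role of P(i+1):

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # sorted search-key values
        self.pointers = pointers  # children, or record pointers in a leaf
        self.is_leaf = is_leaf


def find(root, k):
    """Return the record pointer stored for key k, or None."""
    node = root
    while not node.is_leaf:
        # Position of the smallest key > k; if there is none this equals
        # len(keys), which selects the last pointer of the node.
        i = bisect_right(node.keys, k)
        node = node.pointers[i]
    # zip stops after the keys, so the trailing next-leaf pointer is skipped.
    for key, ptr in zip(node.keys, node.pointers):
        if key == k:
            return ptr
    return None


# Hand-built tree: three leaves chained left to right under one root.
leaf3 = Node([50, 60], ["r50", "r60", None], True)
leaf2 = Node([30, 40], ["r30", "r40", leaf3], True)
leaf1 = Node([10, 20], ["r10", "r20", leaf2], True)
root = Node([30, 50], [leaf1, leaf2, leaf3], False)
```

The loop visits one node per level, so the number of nodes read equals the height of the tree.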
B+-TREES UPDATING: INSERTION
• Find the leaf node in which the search-key value would appear
• If the search-key value is already present in the leaf node
Add the record to the file
• If the search-key value is not present, then
Add the record to the main file (and create a bucket if necessary)
If there is room in the leaf node, insert the (key-value, pointer) pair in
the leaf node
Otherwise, split the node (along with the new (key-value, pointer)
entry).
Splitting a leaf node:
• Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order.
Place the first ⌈n/2⌉ in the original node, and the rest
in a new node.
• Let the new node be p, and let k be the least key
value in p. Insert (k, p) in the parent of the node
being split.
• If the parent is full, split it and propagate the split further up.
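A leaf split can be sketched as below, assuming the first ⌈n/2⌉ pairs stay in the original node and leaving the update of the parent to the caller (`split_leaf` is a hypothetical name):

```python
from math import ceil


def split_leaf(pairs, new_pair, n):
    """Split a full leaf that must absorb one more (key, pointer) pair.

    pairs: the n - 1 (key, pointer) pairs already in the leaf.
    Returns (pairs kept in the original node, pairs for the new node,
    least key of the new node, to be inserted into the parent).
    """
    all_pairs = sorted(pairs + [new_pair], key=lambda kp: kp[0])  # n pairs
    cut = ceil(n / 2)
    left, right = all_pairs[:cut], all_pairs[cut:]
    return left, right, right[0][0]
```

For n = 4, a leaf holding keys 10, 20 and 30 that receives key 25 keeps 10 and 20, moves 25 and 30 to the new node, and passes 25 up to the parent.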
Splitting a non-leaf node: when inserting (k, p) into an
already full internal node N
• Copy N to an in-memory area M with space for n + 1 pointers
and n keys
• Insert (k, p) into M
• Copy P1, K1, …, K⌈n/2⌉–1, P⌈n/2⌉ from M back into node N
• Copy P⌈n/2⌉+1, K⌈n/2⌉+1, …, Kn, Pn+1 from M into a newly allocated
node N'
• Insert (K⌈n/2⌉, N') into the parent of N
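The same steps can be sketched with Python lists standing in for the node N and the in-memory area M. The name `split_internal` is hypothetical, and p is taken to be the pointer to the right of the new key k:

```python
from bisect import bisect_right
from math import ceil


def split_internal(keys, ptrs, k, p, n):
    """Split a full internal node (n pointers, n - 1 keys) on inserting (k, p).

    Returns ((keys, pointers) kept in N, key moved up to the parent,
    (keys, pointers) of the new node N').
    """
    m_keys, m_ptrs = list(keys), list(ptrs)  # copy N into the work area M
    i = bisect_right(m_keys, k)
    m_keys.insert(i, k)                      # M now holds n keys
    m_ptrs.insert(i + 1, p)                  # and n + 1 pointers
    c = ceil(n / 2)
    node_n = (m_keys[: c - 1], m_ptrs[:c])   # P1, K1, ..., K(c-1), Pc
    up_key = m_keys[c - 1]                   # Kc goes to the parent only
    node_new = (m_keys[c:], m_ptrs[c:])      # P(c+1), K(c+1), ..., Kn, P(n+1)
    return node_n, up_key, node_new
```

Unlike a leaf split, the middle key is moved to the parent rather than copied: it appears in neither N nor N' afterwards.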
B+-TREES UPDATING: DELETION
• Find the record to be deleted, and remove it from the main file
and from the bucket (if present)
• Remove (search-key value, pointer) from the leaf node if there is
no bucket or if the bucket has become empty
• If the node has too few entries due to the removal, and the
entries in the node and a sibling fit into a single node, then
merge the siblings.
• Otherwise, if the node has too few entries due to the removal,
but the entries in the node and a sibling do not fit into a single node, then redistribute the pointers:
Redistribute the pointers between the node and a sibling such that both have at least the minimum number of entries.
Update the corresponding search-key value in the parent of the node.
• The node deletions may cascade upwards until a node which has ⌈n/2⌉ or more pointers is found.
• If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.
HASHING
STATIC HASHING
• A bucket is a unit of storage containing one or more records.
• In a hash file organization we obtain the bucket of a record
directly from its search key value using a hash function.
• Hash function h is a function from the set of all search key
values K to the set of all bucket addresses B.
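A static hash file can be sketched as below, assuming string search keys, a fixed number of buckets, and a deliberately simple byte-sum hash (`NUM_BUCKETS`, `h`, `insert` and `lookup` are hypothetical names, and the hash is chosen for readability, not quality):

```python
NUM_BUCKETS = 4  # size of the fixed set B of bucket addresses (hypothetical)

# Each bucket is a unit of storage holding one or more records.
buckets = [[] for _ in range(NUM_BUCKETS)]


def h(key):
    """Map a search-key value to a bucket address in 0 .. NUM_BUCKETS - 1."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


def insert(key, record):
    buckets[h(key)].append((key, record))


def lookup(key):
    """Only the one bucket selected by h(key) is searched."""
    return [rec for k, rec in buckets[h(key)] if k == key]


insert("Perryridge", "A-102")
insert("Perryridge", "A-201")
insert("Mianus", "A-215")
```

Different keys may land in the same bucket, so the lookup still compares the key of every record it finds there.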
HASH FUNCTIONS
• The worst hash function maps all search-key values to the same
bucket; this makes access time proportional to the number of
search-key values in the file.
• An ideal hash function is uniform, i.e., each bucket is assigned
the same number of search-key values from the set of all
possible values.
• An ideal hash function is also random, so each bucket will have about the
same number of records assigned to it irrespective of the actual
distribution of search-key values in the file.
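The difference between the worst case and a uniform assignment can be illustrated by counting bucket loads. The integer keys and the two functions `worst_hash` and `modulo_hash` are hypothetical; the modulo function is uniform only because these particular keys are consecutive integers:

```python
from collections import Counter

NUM_BUCKETS = 4
keys = range(100)  # hypothetical integer search keys


def worst_hash(key):
    """Maps every search-key value to the same bucket."""
    return 0


def modulo_hash(key):
    """Spreads consecutive integer keys evenly over the buckets."""
    return key % NUM_BUCKETS


worst_load = Counter(worst_hash(k) for k in keys)
even_load = Counter(modulo_hash(k) for k in keys)
```

With `worst_hash` a lookup may have to scan all 100 keys in one bucket, while with `modulo_hash` each of the four buckets holds 25 keys.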
DEFICIENCIES OF STATIC HASHING
• In static hashing, function h maps search-key values to a fixed
set B of bucket addresses. Databases grow or shrink with
time. If the initial number of buckets is too small and the file grows,
performance will degrade.
• If space is allocated for anticipated growth, a significant amount
of space will be wasted initially.
• If database shrinks, again space will be wasted.
DYNAMIC HASHING
• Allows the hash function to be modified dynamically.
• Extendable hashing is one form of dynamic hashing. It splits
and coalesces buckets as the database size changes.
• We choose a hash function that generates values over a relatively large range.
• Buckets are created on demand. Not all bits of the hash value are
used initially; the number of bits used changes as the
database size changes.
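Extendable hashing can be sketched as below under several simplifying assumptions: integer keys, a 32-bit hash value, the low-order bits selecting the directory slot, and bucket splitting only (coalescing is omitted). All names (`Bucket`, `ExtendableHash`, `hash_bits`) are hypothetical:

```python
def hash_bits(key):
    """Hash value over a relatively large range (32 bits, integer keys)."""
    return key & 0xFFFFFFFF


class Bucket:
    def __init__(self, depth):
        self.depth = depth  # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendableHash:
    def __init__(self, capacity=2):
        self.capacity = capacity      # maximum entries per bucket
        self.global_depth = 0         # hash bits currently used
        self.directory = [Bucket(0)]  # 2 ** global_depth slots

    def _slot(self, key):
        return hash_bits(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.capacity:
                bucket.items[key] = value
                return
            self._split(bucket)       # bucket is full: split and retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Use one more bit: double the directory; both halves start
            # out pointing at the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        bit = 1 << (bucket.depth - 1)  # the newly significant bit
        new_bucket = Bucket(bucket.depth)
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            target = new_bucket if hash_bits(k) & bit else bucket
            target.items[k] = v
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = new_bucket
```

A bucket is created only when an existing one overflows, and the directory doubles only when the overflowing bucket already uses every bit that is currently in use.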