FILE ORGANISATION.pptx

7/27/2019 FILE ORGANISATION.pptx

http://slidepdf.com/reader/full/file-organisationpptx

FILE ORGANISATION



FILE ORGANIZATION

• The database is stored as a collection of files. Each file is a

sequence of records. A record is a sequence of fields.

• One approach is to assume:

record size is fixed

each file has records of one particular type only

different files are used for different relations



FIXED-LENGTH RECORDS

Simple approach:

• Store record i starting from byte n × (i – 1), where n is the size of each record.

• Deletion of record i:

move records i + 1, . . ., n to i, . . ., n – 1 (here n is the last record in the file) OR

move record n to i OR

do not move records, but link all free records on a free list
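
The free-list option can be illustrated with a minimal Python sketch. It assumes an in-memory list of slots stands in for the file; the class and method names (`FixedLengthFile`, `insert`, `delete`) are illustrative and do not come from the slides.

```python
class FixedLengthFile:
    """Fixed-length record slots with a free list of deleted slots."""

    def __init__(self):
        self.slots = []        # slots[i - 1] holds record i, or None if free
        self.free_list = []    # numbers of deleted (reusable) records

    def insert(self, record):
        """Reuse a free slot if one exists, else append. Returns the record number."""
        if self.free_list:
            i = self.free_list.pop()
            self.slots[i - 1] = record
        else:
            self.slots.append(record)
            i = len(self.slots)
        return i

    def delete(self, i):
        """Free slot i without moving any other record."""
        self.slots[i - 1] = None
        self.free_list.append(i)


f = FixedLengthFile()
for name in ("A", "B", "C"):
    f.insert(name)
f.delete(2)                 # slot 2 goes on the free list
reused = f.insert("D")      # the new record takes over slot 2
```

Deletion touches only one slot, at the cost of leaving holes that a later insertion has to fill.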



VARIABLE-LENGTH RECORDS

Variable-length records arise in database systems in several

ways:

• Storage of multiple record types in a file.

• Record types that allow variable lengths for one or more fields.

• Record types that allow repeating fields (used in some older

data models).



TYPES OF FILE ORGANIZATION

• There are mainly three types of file organizations:

Sequential

Relative

Indexed



SEQUENTIAL FILE ORGANIZATION

• The records in the file are ordered by a search key or primary

key.



• Deletion – use pointer chains

• Insertion – locate the position where the record is to be inserted

if there is free space insert there

if no free space, insert the record in an overflow block

In either case, pointer chain must be updated

• Need to reorganize the file

from time to time to restore sequential order



RELATIVE FILE ORGANIZATION

• Within a relative file are numbered positions, called cells. These

cells are of fixed equal length and are consecutively numbered

from 1 to n, where 1 is the first cell, and n is the last available

cell in the file.

• Records in a relative file are accessed according to cell number.

A cell number is a record's relative record number; its location

relative to the beginning of the file.

• By specifying relative record numbers, you can directly retrieve,

add, or delete records regardless of their locations.
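
Direct access by relative record number can be sketched as a seek to a computed byte offset. The example below is only a sketch: it assumes a cell length of 16 bytes, uses `io.BytesIO` in place of a real file, and the helper names are made up for illustration.

```python
import io

CELL_SIZE = 16  # assumed fixed cell length in bytes


def cell_offset(cell_number):
    """Cells are numbered from 1, so cell 1 starts at byte 0."""
    return (cell_number - 1) * CELL_SIZE


def write_cell(f, cell_number, data):
    """Store data in the given cell, padded with zero bytes to CELL_SIZE."""
    if len(data) > CELL_SIZE:
        raise ValueError("record does not fit in one cell")
    f.seek(cell_offset(cell_number))
    f.write(data.ljust(CELL_SIZE, b"\x00"))


def read_cell(f, cell_number):
    """Return the contents of a cell with the padding stripped."""
    f.seek(cell_offset(cell_number))
    return f.read(CELL_SIZE).rstrip(b"\x00")


f = io.BytesIO(bytes(CELL_SIZE * 4))   # a file of 4 empty cells
write_cell(f, 3, b"record-3")          # cells can be written in any order
write_cell(f, 1, b"record-1")
```

No other cell is read or moved when one cell is accessed, which is what makes the organization "relative".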



INDEXED FILE ORGANIZATION

• An indexed file contains records ordered by a record key. Each

record contains a field that contains the record key. The record

key uniquely identifies the record.

• Indexed organization is similar to relative organization. In place

of the relative position of the block, a unique key of the block is

used to find the block.



INDEXING




• Indexing mechanisms are used to optimize access to data

(records) managed in tables. For example, the author catalogue

in a library is a type of index.

• An Index File consists of records (called index entries) of the

form

<search-key value; pointer to block in data table>



DENSE INDEX FILES

• Index record appears for every search-key value in the file.



SPARSE INDEX FILES

• Contains index records for only some search-key values.

Applicable when records are sequentially ordered on search-key

• To locate a record with search-key value K we:

Find index record with largest search-key value ≤ K

Search file sequentially starting at the record to which the index record

points
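
The sparse-index lookup described above can be sketched in Python as follows. The sample blocks, the one-entry-per-block layout, and the function name `lookup` are assumptions made for the example.

```python
from bisect import bisect_right

# Data file: records sorted by search key, grouped into blocks.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: the first search key of each block; position = block number.
index_keys = [block[0][0] for block in blocks]    # [10, 40, 70]


def lookup(k):
    """Return the record value with search key k, or None if absent."""
    pos = bisect_right(index_keys, k) - 1   # largest index key <= k
    if pos < 0:
        return None                         # k is smaller than every key
    # Scan sequentially from the block the index entry points to.
    for block in blocks[pos:]:
        for key, value in block:
            if key == k:
                return value
            if key > k:
                return None                 # passed the place where k would be
    return None
```

Only three keys are indexed for nine records; the price is a short sequential scan after the index probe.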



SINGLE LEVEL INDEXING

• A single level index is an auxiliary file that makes it more

efficient to search for a record in the data file.

• The index is called an access path to the file.

• These are of two types:

Primary index (also called a clustering index)

Secondary index (also called a non-clustering index)



PRIMARY INDEX

• A primary index is an ordered file whose entries are of fixed

length with two fields:

<value of primary key; address of data block>

• The data file is ordered on the primary key field and requires

the primary key of each record to be unique.

• The index file includes one entry for each block.



PROBLEM WITH PRIMARY INDEXES

• Insertion or deletion of record in ordered data file involves:

Making space or deleting space in the data file

And changing the index entries to reflect the new

situation.



SECONDARY INDEX

• A secondary index is an ordered file whose entries are of fixed

length with two fields:

<value of key; address of data block or record pointer>

• A secondary index provides a secondary means of accessing a

file for which some primary access already exists.

• The secondary index may be on a field which is a candidate key

and has a unique value in every record, or a non-key with

duplicate values.

• This is used to find records which satisfy the given value for some specific column.



• Often one wants to find all records whose values in a certain

field (which is not the search key of the primary index) satisfy

some condition.

• One can specify a secondary index with an index entry for each

search key value; index entry points to a bucket, which contains

pointers to all the actual records with that particular search key.
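
A secondary index with buckets might look like the following sketch. The sample records, the field name `city`, and the use of list positions as record pointers are assumptions for illustration only.

```python
from collections import defaultdict

# Records stored in primary-key (id) order; list positions act as record pointers.
records = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 3, "city": "Pune"},
    {"id": 4, "city": "Agra"},
]

# Secondary index on the non-key field "city":
# one entry per distinct value, each pointing to a bucket of record pointers.
city_index = defaultdict(list)
for pointer, record in enumerate(records):
    city_index[record["city"]].append(pointer)


def find_by_city(city):
    """Follow the bucket for `city` to every matching record."""
    return [records[p] for p in city_index.get(city, [])]
```

Because the file is not ordered on `city`, the bucket is what lets one index entry reach several scattered records.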



MULTI-LEVEL INDEXING

• If primary index is too big to fit in memory, access to records

becomes expensive. In this case multi-level indexing can be

used.

• Multi-level Indexing consists of different levels of indices.

• An outer index table points to inner index tables.

• The inner tables point to the record files.
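
A two-level lookup can be sketched as two successive index probes. The tiny tables below and the helper names `find_entry` and `find_block` are hypothetical; a real system would read each table from disk.

```python
from bisect import bisect_right

# Data blocks, sorted by search key.
data_blocks = [[1, 2], [3, 4], [5, 6], [7, 8]]

# Inner index tables: (first key in block, data block number).
inner_tables = [
    [(1, 0), (3, 1)],
    [(5, 2), (7, 3)],
]
# Outer index: (first key covered by inner table, inner table number).
outer_index = [(1, 0), (5, 1)]


def find_entry(table, k):
    """Return the pointer of the entry with the largest key <= k, or None."""
    keys = [key for key, _ in table]
    pos = bisect_right(keys, k) - 1
    return table[pos][1] if pos >= 0 else None


def find_block(k):
    """Outer index -> inner table -> data block number (None if k is too small)."""
    inner_no = find_entry(outer_index, k)
    if inner_no is None:
        return None
    return find_entry(inner_tables[inner_no], k)
```

Only the small outer index needs to stay in memory; each lookup then touches one inner table and one data block.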



• Multilevel Index structure.



INDEX UPDATING

Record Deletion

Single-level index deletion:

• Dense indices – deletion of search-key: similar to file record

deletion.

• Sparse indices –

If deleted key value exists in the index, the value is

replaced by the next search-key value in the file (in

search-key order).

If the next search-key value already has an index entry,

the entry is deleted instead of being replaced.
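
The sparse-index deletion rule can be sketched as below. This is a simplified illustration: the file and the index are modelled as sorted lists of unique search keys (pointers are left out), and dropping the entry when no next key exists is an assumption the slides do not spell out.

```python
from bisect import bisect_right


def delete_record(file_keys, index_keys, k):
    """Delete key k from a sorted file and fix up its sparse index.

    Both arguments are sorted lists of unique search keys, modified in place.
    """
    file_keys.remove(k)
    if k not in index_keys:
        return                                   # index is unaffected
    pos = bisect_right(file_keys, k)             # position of the next key
    next_key = file_keys[pos] if pos < len(file_keys) else None
    if next_key is None or next_key in index_keys:
        # Assumption: with no next key there is nothing to replace with.
        index_keys.remove(k)                     # drop the entry
    else:
        index_keys[index_keys.index(k)] = next_key   # replace the entry


file_keys = [10, 20, 30, 40, 50, 60]
index_keys = [10, 30, 50]
delete_record(file_keys, index_keys, 30)   # index entry 30 is replaced by 40
```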



INDEX UPDATING

Record Insertion

Single-level index insertion:

• Perform a lookup using the key value from inserted record

• Dense indices – if the search-key value does not appear in

the index, insert it.

• Sparse indices – if index stores an entry for each block of

the file, no change needs to be made to the index unless a

new block is created.

If a new block is created, the first search-key value

appearing in the new block is inserted into the index.



• Multilevel insertion (as well as deletion) algorithms are simple

extensions of the single-level algorithms.



B TREE

• B-tree is a tree data structure that keeps data sorted and allows

searches, sequential access, insertions, and deletions

in logarithmic time. It is a generalization of a binary search

tree in that a node can have more than two children.

• In B-trees, internal nodes can have a variable number of child

nodes within some pre-defined range. When data are inserted

or removed from a node, its number of child nodes changes. In

order to maintain the pre-defined range, internal nodes may be

joined or split.



• Each internal node of a B-tree will contain a number of keys

which divide its subtrees. For example, if an internal node has 3

child nodes then it must have 2 keys: a1 and a2. All values in the

leftmost subtree will be less than a1, all values in the middle

subtree will be between a1 and a2, and all values in the

rightmost subtree will be greater than a2.



B+ TREE

A B+-tree is a tree satisfying the following properties:

• All paths from root to leaf are of the same length

• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.

• A leaf node has between ⌈(n – 1)/2⌉ and n – 1 values

• Special cases:

If the root is not a leaf, it has at least 2 children.

If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n – 1) values.
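
As a quick check of these occupancy rules, the limits can be computed for a given n. The sketch below assumes the lower bounds are rounded up (ceilings); the function name and dictionary keys are illustrative.

```python
from math import ceil


def bplus_bounds(n):
    """Occupancy limits (min, max) for a B+-tree whose nodes hold up to n pointers."""
    return {
        "internal_children": (ceil(n / 2), n),
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "root_children_if_not_leaf": (2, n),
        "root_values_if_leaf": (0, n - 1),
    }


bounds = bplus_bounds(4)
```

For n = 4, an internal node has 2 to 4 children and a leaf holds 2 to 3 values.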



• Typical node:

P1 | K1 | P2 | . . . | Pn–1 | Kn–1 | Pn

Ki are the search-key values

Pi are pointers to children (for non-leaf nodes) or pointers to

records or buckets of records (for leaf nodes).

• The search-keys in a node are ordered

K1 < K2 < K3 < . . . < Kn–1



LEAF NODES

Properties of a leaf node:

• For i = 1, 2, . . ., n – 1, pointer Pi either points to a file record

with search-key value Ki, or to a bucket of pointers to file

records, each record having search-key value Ki.

• If Li, Lj are leaf nodes and i < j, Li’s search-key values are

less than Lj’s search-key values

• Pn points to next leaf node in search-key order



NON LEAF NODES

Non leaf nodes form a multi-level sparse index on the leaf

nodes. For a non-leaf node with m pointers:

• All the search-keys in the subtree to which P1 points are less

than K1

• For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which

Pi points have values greater than or equal to Ki–1 and less

than Ki

• All the search-keys in the subtree to which Pm points have

values greater than or equal to Km–1



QUERIES ON B+-TREES

Find all records with a search-key value of k.

• N=root

• Repeat

Examine N for the smallest search-key value > k.

If such a value exists, assume it is Ki. Then set N = Pi

Otherwise k ≥ Km–1, where m is the number of pointers in N. Set N = Pm

Until N is a leaf node

• If for some i, key Ki = k, follow pointer Pi to the desired record or

bucket.

• Else no record with search-key value k exists.
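
The search procedure can be sketched in Python as follows. The `Node` class is a simplified assumption: leaves store one record per key and the next-leaf pointer is omitted, since a point lookup does not need it.

```python
class Node:
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys          # search keys in ascending order
        self.pointers = pointers  # children (non-leaf) or records (leaf)
        self.is_leaf = is_leaf


def find(root, k):
    """Return the record stored under search key k, or None."""
    node = root
    while not node.is_leaf:
        # The smallest key greater than k selects the pointer to follow;
        # if there is none, follow the last pointer.
        for i, key in enumerate(node.keys):
            if key > k:
                node = node.pointers[i]
                break
        else:
            node = node.pointers[-1]
    for i, key in enumerate(node.keys):
        if key == k:
            return node.pointers[i]
    return None


leaf1 = Node([5, 10], ["rec5", "rec10"], True)
leaf2 = Node([20, 30], ["rec20", "rec30"], True)
leaf3 = Node([40, 50], ["rec40", "rec50"], True)
root = Node([20, 40], [leaf1, leaf2, leaf3], False)
```

Each iteration of the loop descends one level, so the number of nodes visited equals the height of the tree.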



B+-TREES UPDATING

Insertion

• Find the leaf node in which the search-key value would appear

• If the search-key value is already present in the leaf node

Add record to the file

• If the search-key value is not present, then

Add the record to the main file (and create a bucket if necessary)

If there is room in the leaf node, insert (key-value, pointer) pair in

the leaf node

Otherwise, split the node (along with the new (key-value, pointer)

entry).



Splitting a Leaf node:

• Take the n (search-key value, pointer) pairs (including the

one being inserted) in sorted order.

Place the first ⌈n/2⌉ in the original node, and the rest

in a new node.

• Let the new node be p, and let k be the least key

value in p. Insert (k,p) in the parent of the node

being split.

• If the parent is full, split it and propagate the split

further up.
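
A leaf split can be sketched as a pure function on lists of pairs. The function name `split_leaf` and the list representation are assumptions; the sketch takes the first half to be ⌈n/2⌉ pairs and leaves the parent update to the caller.

```python
from math import ceil


def split_leaf(pairs, new_pair, n):
    """Split a full leaf that holds n - 1 (key, pointer) pairs.

    Returns (left_pairs, right_pairs, separator_key); the caller would
    insert (separator_key, new node) into the parent.
    """
    all_pairs = sorted(pairs + [new_pair], key=lambda pair: pair[0])  # n pairs
    cut = ceil(n / 2)
    left, right = all_pairs[:cut], all_pairs[cut:]
    return left, right, right[0][0]   # least key of the new node


left, right, sep = split_leaf(
    [(10, "r10"), (20, "r20"), (30, "r30")], (25, "r25"), n=4
)
```

Note that the separator key is copied up: it stays in the new leaf and also appears in the parent.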



Splitting a non leaf node: when inserting (k,p) into an

already full internal node N

• Copy N to an in-memory area M with space for n+1 pointers

and n keys

• Insert (k,p) into M

• Copy P1, K1, …, K⌈n/2⌉–1, P⌈n/2⌉ from M back into node N

• Copy P⌈n/2⌉+1, K⌈n/2⌉+1, …, Kn, Pn+1 from M into newly allocated

node N’

• Insert (K⌈n/2⌉, N’) into the parent of N
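
The non-leaf split can be sketched the same way. The sketch below assumes the split point is ⌈n/2⌉ and that the new pointer p belongs immediately to the right of its key k; the function name and the string pointers in the usage line are placeholders.

```python
from bisect import bisect_right
from math import ceil


def split_internal(keys, pointers, k, p, n):
    """Split a full internal node (n pointers, n - 1 keys) while inserting (k, p).

    Returns (left_keys, left_ptrs, up_key, right_keys, right_ptrs).
    """
    m_keys, m_ptrs = list(keys), list(pointers)   # the in-memory area M
    j = bisect_right(m_keys, k)
    m_keys.insert(j, k)
    m_ptrs.insert(j + 1, p)        # p sits immediately to the right of k
    h = ceil(n / 2)
    left_keys, left_ptrs = m_keys[:h - 1], m_ptrs[:h]     # stays in N
    up_key = m_keys[h - 1]                                # goes to the parent
    right_keys, right_ptrs = m_keys[h:], m_ptrs[h:]       # moves to N'
    return left_keys, left_ptrs, up_key, right_keys, right_ptrs


result = split_internal([20, 40, 60], ["A", "B", "C", "D"], 50, "X", n=4)
```

Unlike a leaf split, the middle key moves up to the parent and is kept in neither half.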



B+-TREES UPDATING

Deletion

• Find the record to be deleted, and remove it from the main file

and from the bucket (if present)

• Remove (search-key value, pointer) from the leaf node if there is

no bucket or if the bucket has become empty

• If the node has too few entries due to the removal, and the

entries in the node and a sibling fit into a single node, then

merge the siblings.



• Otherwise, if the node has too few entries due to the removal,

but the entries in the node and a sibling do not fit into a single

node, then redistribute the pointers:

Redistribute the pointers between the node and a sibling

such that both have at least the minimum number of entries.

Update the corresponding search-key value in the parent of the node.

• The node deletions may cascade upwards till a node which has

⌈n/2⌉ or more pointers is found.

• If the root node has only one pointer after deletion, it is deleted

and the sole child becomes the root.
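
The choice between merging and redistributing can be sketched for leaf nodes as below. This is a simplified illustration: leaves are plain sorted key lists, the sibling is assumed to be the right-hand neighbour, and the even split used for redistribution is one possible policy, not necessarily the one the slides intend.

```python
from math import ceil


def fix_underfull_leaf(node, sibling, n):
    """Rebalance a leaf after a deletion.

    node and sibling are sorted lists of search keys, with sibling being the
    right-hand neighbour. Returns (action, node, sibling); sibling is None
    after a merge.
    """
    min_keys = ceil((n - 1) / 2)
    if len(node) >= min_keys:
        return "ok", node, sibling
    if len(node) + len(sibling) <= n - 1:
        return "merge", node + sibling, None       # parent loses one pointer
    # Redistribute: share the keys of both leaves as evenly as possible.
    combined = node + sibling
    half = len(combined) // 2
    return "redistribute", combined[:half], combined[half:]   # parent key changes


merged = fix_underfull_leaf([10], [20, 30], n=4)
shared = fix_underfull_leaf([10], [20, 30, 40], n=4)
```

After a merge the parent itself may become underfull, which is how deletions cascade upwards.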



HASHING



STATIC HASHING

• A bucket is a unit of storage containing one or more records.

• In a hash file organization we obtain the bucket of a record

directly from its search key value using a hash function.

• Hash function h is a function from the set of all search key

values K to the set of all bucket addresses B.
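
A static hash file can be sketched with a fixed list of buckets. The toy hash function (sum of character codes modulo the bucket count), the bucket count of 4, and the sample keys are all assumptions chosen for the example.

```python
NUM_BUCKETS = 4   # fixed when the file is created


def h(key):
    """Toy hash function: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key, record):
    """Place the record in the bucket that h assigns to its search key."""
    buckets[h(key)].append((key, record))


def lookup(key):
    """Only the one bucket chosen by h is searched."""
    return [rec for k, rec in buckets[h(key)] if k == key]


insert("alice", "record-1")
insert("bob", "record-2")
insert("alice", "record-3")
```

The bucket address is computed from the key alone, so no index has to be consulted.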



HASH FUNCTIONS

• Worst hash function maps all search key values to the same

bucket; this makes access time proportional to the number of

search key values in the file.

• An ideal hash function is uniform, i.e., each bucket is assigned

the same number of search key values from the set of all

possible values.

• Ideal hash function is random, so each bucket will have the

same number of records assigned to it irrespective of the actual

distribution of search key values in the file.



DEFICIENCIES OF STATIC HASHING

• In static hashing, function h maps search key values to a fixed

set B of bucket addresses. Databases grow or shrink with

time. If initial number of buckets is too small, and file grows,

performance will degrade.

• If space is allocated for anticipated growth, a significant amount

of space will be wasted initially.

• If database shrinks, again space will be wasted.



DYNAMIC HASHING

• Allows the hash function to be modified dynamically.

• Extendable hashing is one form of dynamic hashing. It splits

and coalesces buckets as database size changes.

• We choose a hash function that generates values over a

relatively large range.

• Buckets are created on demand. Not all bits of the hash value are

used initially; the number of bits used changes as the database

size changes.
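
The idea of using more hash bits as the file grows can be sketched as below. Several choices here are assumptions rather than anything the slides specify: the low-order bits of Python's built-in `hash` select the directory slot, a bucket holds two items, coalescing on deletion is left out, and more than `bucket_size` keys with identical hash values are not handled.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth     # number of hash bits this bucket distinguishes
        self.items = {}


class ExtendableHash:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.global_depth = 0              # hash bits currently in use
        self.directory = [Bucket(0)]       # 2 ** global_depth entries

    def _slot(self, key):
        # Use only the low-order global_depth bits of the hash value.
        return hash(key) & ((1 << self.global_depth) - 1)

    def get(self, key):
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key, value):
        bucket = self.directory[self._slot(key)]
        if key in bucket.items or len(bucket.items) < self.bucket_size:
            bucket.items[key] = value
            return
        self._split(bucket)
        self.put(key, value)               # retry; may split again

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            self.directory = self.directory * 2   # double the directory
            self.global_depth += 1
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        high_bit = 1 << (bucket.depth - 1)
        # Directory slots whose newly used bit is 1 now point at the new bucket.
        for i, b in enumerate(self.directory):
            if b is bucket and i & high_bit:
                self.directory[i] = new_bucket
        # Rehash the old bucket's items between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendableHash(bucket_size=2)
for key in range(16):
    table.put(key, str(key))
```

Only the overflowing bucket is split; the directory doubles only when that bucket already uses every bit currently in play.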
