FILE ORGANISATION.pptx

FILE ORGANISATION

Transcript of FILE ORGANISATION.pptx

Page 1: FILE ORGANISATION.pptx

7/27/2019 FILE ORGANISATION.pptx

http://slidepdf.com/reader/full/file-organisationpptx 1/41

FILE ORGANISATION

FILE ORGANIZATION

• The database is stored as a collection of files. Each file is a

sequence of records.  A record is a sequence of fields.

• One approach is to assume that:

record size is fixed

each file has records of one particular type only

different files are used for different relations

FIXED-LENGTH RECORDS

Simple approach:

• Store record i starting from byte n × (i – 1), where n is the size of each record.

• Deletion of record i:

move records i + 1, . . ., n to i, . . ., n – 1 (here n is the number of the last record), OR

move record n to i, OR

do not move records, but link all free records on a free list
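
The offset formula and the free-list option can be sketched as follows. This is a minimal in-memory illustration, not a storage engine: the class `FixedLengthFile`, the helper `record_offset`, and the 40-byte record size are assumptions made for the example.

```python
from typing import Dict, List, Optional

RECORD_SIZE = 40  # assumed record size n, in bytes


def record_offset(i: int, n: int = RECORD_SIZE) -> int:
    """Byte offset of record i (1-based): n * (i - 1)."""
    return n * (i - 1)


class FixedLengthFile:
    """In-memory stand-in for a file of fixed-length record slots."""

    def __init__(self) -> None:
        self.slots: List[Optional[str]] = []  # slots[k] holds record k + 1
        self.next_free: Dict[int, Optional[int]] = {}  # freed slot -> next freed slot
        self.free_head: Optional[int] = None  # head of the free list

    def insert(self, record: str) -> int:
        """Store a record, reusing a freed slot if one exists; return its number."""
        if self.free_head is not None:
            i = self.free_head
            self.free_head = self.next_free.pop(i)
            self.slots[i - 1] = record
            return i
        self.slots.append(record)
        return len(self.slots)

    def delete(self, i: int) -> None:
        """Free record i without moving any other record."""
        self.slots[i - 1] = None
        self.next_free[i] = self.free_head  # link the slot into the free list
        self.free_head = i


f = FixedLengthFile()
for name in ("A", "B", "C"):
    f.insert(name)
f.delete(2)           # slot 2 goes on the free list
reused = f.insert("D")  # the freed slot is reused
```

No record moves on deletion; the cost is that the file is no longer densely packed until freed slots are reused.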

VARIABLE-LENGTH RECORDS

Variable-length records arise in database systems in several

ways:

• Storage of multiple record types in a file.

• Record types that allow variable lengths for one or more fields.

• Record types that allow repeating fields (used in some older 

data models).

TYPES OF FILE ORGANIZATION

• There are mainly three types of file organizations:

Sequential

Relative

Indexed

SEQUENTIAL FILE ORGANIZATION

• The records in the file are ordered by a search key or primary

key.

• Deletion – use pointer chains

• Insertion – locate the position where the record is to be inserted

if there is free space insert there

if no free space, insert the record in an overflow block

In either case, pointer chain must be updated

• Need to reorganize the file from time to time to restore sequential order

RELATIVE FILE ORGANIZATION

• Within a relative file are numbered positions, called cells. These

cells are of fixed equal length and are consecutively numbered

from 1 to n, where 1 is the first cell, and n is the last available

cell in the file.

• Records in a relative file are accessed according to cell number.

A cell number is a record's relative record number, that is, its location

relative to the beginning of the file.

By specifying relative record numbers, you can directly retrieve, add, or delete records regardless of their locations.
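
Direct access by cell number amounts to seeking to a computed offset. The sketch below assumes 16-byte cells, zero-byte padding, and an in-memory `io.BytesIO` object standing in for the file; `write_cell` and `read_cell` are hypothetical helper names.

```python
import io

CELL_SIZE = 16  # assumed fixed cell length, in bytes


def write_cell(f, cell_no: int, data: bytes) -> None:
    """Write data into cell cell_no (1-based), padded with zero bytes."""
    if len(data) > CELL_SIZE:
        raise ValueError("record does not fit in one cell")
    f.seek((cell_no - 1) * CELL_SIZE)
    f.write(data.ljust(CELL_SIZE, b"\x00"))


def read_cell(f, cell_no: int) -> bytes:
    """Read the record stored in cell cell_no, with padding removed."""
    f.seek((cell_no - 1) * CELL_SIZE)
    return f.read(CELL_SIZE).rstrip(b"\x00")


# A "file" pre-allocated with 4 empty cells.
f = io.BytesIO(bytes(CELL_SIZE * 4))
write_cell(f, 3, b"carol")  # cells can be filled in any order
write_cell(f, 1, b"alice")
```

Reading cell 3 does not require touching cells 1 or 2, which is the point of relative organization.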

INDEXED FILE ORGANIZATION

• An indexed file contains records ordered by a record key. Each record contains a field that holds the record key, which uniquely identifies the record.

• Indexed organization is similar to relative organization. In place of the relative position of the block, a unique key of the block is used to find the block.

INDEXING

• Indexing mechanisms are used to optimize access to data

(records) managed in tables. For example, the author catalogue

in a library is a type of index.

•  An Index File consists of records (called index entries) of the

form

<search-key value; pointer to block in data table>

DENSE INDEX FILES

• Index record appears for every search-key value in the file.

SPARSE INDEX FILES

• Contains index records for only some search-key values.

 Applicable when records are sequentially ordered on search-key

• To locate a record with search-key value K we:

Find the index record with the largest search-key value ≤ K

Search file sequentially starting at the record to which the index record

points
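
A sparse-index lookup can be sketched as below. The data (`BLOCKS`, `SPARSE_INDEX`) and the function name `sparse_lookup` are made up for illustration, and the index is assumed to hold one entry per block. The sketch takes the index entry with the largest key not greater than K, so that a key which itself appears in the index is found in its own block.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Data file: blocks of (search_key, payload) records, sorted on the search key.
BLOCKS: List[List[Tuple[int, str]]] = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]
# Sparse index: one (first key in block, block number) entry per block.
SPARSE_INDEX: List[Tuple[int, int]] = [(10, 0), (40, 1), (70, 2)]


def sparse_lookup(k: int) -> Optional[str]:
    """Return the payload stored under search key k, or None if absent."""
    index_keys = [key for key, _ in SPARSE_INDEX]
    pos = bisect_right(index_keys, k) - 1  # entry with the largest key <= k
    if pos < 0:
        return None  # k is smaller than every key in the file
    start_block = SPARSE_INDEX[pos][1]
    # Scan the file sequentially from the block the index entry points to.
    for block in BLOCKS[start_block:]:
        for key, payload in block:
            if key == k:
                return payload
            if key > k:
                return None  # passed the place where k would be
    return None
```

A dense index would skip the sequential scan, at the cost of one index entry per search-key value.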

SINGLE LEVEL INDEXING

•  A single level index is an auxiliary file that makes it more

efficient to search for a record in the data file.

• The index is called an access path to the file.

• These are of two types:

Primary index (also called clustering index)

Secondary index (also called non-clustering index)

PRIMARY INDEX

•  A primary index is an ordered file whose entries are of fixed

length with two fields:

<value of primary key; address of data block>

• The data file is ordered on the primary key field, and the primary key of each record must be unique.

• The index file includes one entry for each block.

PROBLEM WITH PRIMARY INDEXES

• Insertion or deletion of record in ordered data file involves:

Making space or deleting space in the data file

 And changing the index entries to reflect the new

situation.

SECONDARY INDEX

• A secondary index is an ordered file whose entries are of fixed length with two fields:

<value of key; address of data block or record pointer>

• A secondary index provides a secondary means of accessing a file for which some primary access already exists.

• The secondary index may be on a field which is a candidate key and has a unique value in every record, or on a non-key field with duplicate values.

• This is used to find records which satisfy the given value for some specific column.

• Often one wants to find all records whose values in a certain

field (which is not the search key of the primary index) satisfy

some condition.

• One can specify a secondary index with an index entry for each

search key value; index entry points to a bucket, which contains

pointers to all the actual records with that particular search key.

MULTI-LEVEL INDEXING

• If primary index is too big to fit in memory, access to records

becomes expensive. In this case multi-level indexing can be

used.

• Multi-level Indexing consists of different levels of indices.

•  An outer index table points to inner index tables.

• The inner tables point to the record files.
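
A two-level lookup might look like the sketch below. The layout is an assumption made for the example: `OUTER` holds the first key of each inner index block, each inner block holds (first key, data block number) entries, and `floor_entry` and `find_block` are hypothetical names.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Data blocks, sorted on the search key.
DATA_BLOCKS: Dict[int, List[int]] = {0: [1, 3], 1: [5, 7], 2: [9, 11], 3: [13, 15]}
# Inner index blocks: (first key in data block, data block number).
INNER: List[List[Tuple[int, int]]] = [[(1, 0), (5, 1)], [(9, 2), (13, 3)]]
# Outer index: (first key in inner block, inner block number).
OUTER: List[Tuple[int, int]] = [(1, 0), (9, 1)]


def floor_entry(entries: List[Tuple[int, int]], k: int) -> Optional[int]:
    """Pointer of the entry with the largest key <= k, or None."""
    pos = bisect_right([key for key, _ in entries], k) - 1
    return entries[pos][1] if pos >= 0 else None


def find_block(k: int) -> Optional[int]:
    """Number of the data block that would contain key k, or None."""
    inner_no = floor_entry(OUTER, k)  # outer index picks an inner block
    if inner_no is None:
        return None
    return floor_entry(INNER[inner_no], k)  # inner index picks a data block
```

Only the small outer index has to stay in memory; each lookup then reads one inner index block and one data block.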

• Multilevel Index structure.

INDEX UPDATING: RECORD DELETION

Single-level index deletion:

• Dense indices – deletion of search-key: similar to file record deletion.

• Sparse indices –

If the deleted key value exists in the index, the value is replaced by the next search-key value in the file (in search-key order).

If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

INDEX UPDATING: RECORD INSERTION

Single-level index insertion:

• Perform a lookup using the key value from the inserted record

• Dense indices – if the search-key value does not appear in the index, insert it.

• Sparse indices – if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created.

If a new block is created, the first search-key value appearing in the new block is inserted into the index.

• Multilevel insertion (as well as deletion) algorithms are simple

extensions of the single-level algorithms.

B TREE

• B-tree is a tree data structure that keeps data sorted and allows

searches, sequential access, insertions, and deletions

in logarithmic time. It is a generalization of a binary search

tree in that a node can have more than two children.

• In B-trees, internal nodes can have a variable number of child

nodes within some pre-defined range. When data are inserted

or removed from a node, its number of child nodes changes. In

order to maintain the pre-defined range, internal nodes may be

 joined or split.

• Each internal node of a B-tree will contain a number of keys

which divide its subtrees. For example, if an internal node has 3

child nodes then it must have 2 keys: a1 and a2. All values in the leftmost subtree will be less than a1, all values in the middle

subtree will be between a1 and a2, and all values in the

rightmost subtree will be greater than a2.
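
The way keys divide the subtrees can be illustrated with a small helper. The function name `locate` and the sample keys are assumptions for the example; it reports either that the value is one of the node's own keys or which child subtree to descend into.

```python
from bisect import bisect_left
from typing import List, Tuple


def locate(keys: List[int], value: int) -> Tuple[str, int]:
    """Decide where value lives relative to a node's sorted keys.

    Returns ("here", i) if value equals keys[i], otherwise ("child", i),
    where i is the 0-based index of the subtree to search next.
    """
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        return ("here", i)
    return ("child", i)


# A node with 2 keys (a1 = 10, a2 = 20) and therefore 3 child subtrees.
node_keys = [10, 20]
```

With these keys, 5 falls in the leftmost subtree, 15 in the middle one, and 25 in the rightmost one.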

B+ TREE

 A B+-tree is a tree satisfying the following properties:

•  All paths from root to leaf are of the same length

• Each node that is not a root or a leaf has between ⌈n/2⌉ and n children.

• A leaf node has between ⌈(n – 1)/2⌉ and n – 1 values.

• Special cases:

If the root is not a leaf, it has at least 2 children.

If the root is a leaf (that is, there are no other nodes in the tree), it can have between 0 and (n – 1) values.
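
As a quick check of these bounds, the sketch below computes them for a given n. It assumes the fractions are rounded up, i.e. ⌈n/2⌉ and ⌈(n – 1)/2⌉, as in the usual B+-tree definition; the function name `bplus_bounds` is made up for the example.

```python
from typing import Dict, Tuple


def bplus_bounds(n: int) -> Dict[str, Tuple[int, int]]:
    """Occupancy limits (min, max) for a B+-tree whose nodes hold n pointers."""
    return {
        # ceil(n / 2) written with integer arithmetic
        "internal_children": ((n + 1) // 2, n),
        # ceil((n - 1) / 2) equals n // 2 for integer n
        "leaf_values": (n // 2, n - 1),
        "root_children": (2, n),        # root that is not a leaf
        "root_leaf_values": (0, n - 1),  # root that is also the only leaf
    }


bounds = bplus_bounds(4)
```

For n = 4 this gives 2 to 4 children per internal node and 2 to 3 values per leaf.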

• Typical node:

K_i are the search-key values

P_i are pointers to children (for non-leaf nodes) or pointers to records or buckets of records (for leaf nodes).

• The search-keys in a node are ordered

K_1 < K_2 < K_3 < . . . < K_{n–1}

LEAF NODES

Properties of a leaf node:

• For i = 1, 2, . . ., n – 1, pointer P_i either points to a file record with search-key value K_i, or to a bucket of pointers to file records, each record having search-key value K_i.

• If L_i, L_j are leaf nodes and i < j, L_i's search-key values are less than L_j's search-key values

• P_n points to the next leaf node in search-key order

NON-LEAF NODES

Non-leaf nodes form a multi-level sparse index on the leaf nodes. For a non-leaf node with m pointers:

• All the search-keys in the subtree to which P_1 points are less than K_1

• For 2 ≤ i ≤ m – 1, all the search-keys in the subtree to which P_i points have values greater than or equal to K_{i–1} and less than K_i

• All the search-keys in the subtree to which P_m points have values greater than or equal to K_{m–1}

QUERIES ON B+-TREES

Find all records with a search-key value of k.

• N = root

• Repeat

Examine N for the smallest search-key value > k.

If such a value exists, assume it is K_i. Then set N = P_i

Otherwise k ≥ K_{m–1}, where m is the number of pointers in N. Set N = P_m

Until N is a leaf node

• If for some i, key K_i = k, follow pointer P_i to the desired record or bucket.

• Else no record with search-key value k exists.
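
The search procedure can be sketched as follows. The `Node` class, its field names, and the three-leaf sample tree are assumptions for the example; lists are 0-based, so `children[i]` plays the role of P_{i+1} and `keys[i]` the role of K_{i+1}.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # non-leaf pointers
    records: List[str] = field(default_factory=list)      # leaf: one per key

    @property
    def is_leaf(self) -> bool:
        return not self.children


def find(root: Node, k: int) -> Optional[str]:
    """Return the record with search key k, or None if there is none."""
    node = root
    while not node.is_leaf:
        # bisect_right returns the position of the smallest key > k;
        # if no such key exists it returns len(keys), i.e. the last pointer.
        node = node.children[bisect_right(node.keys, k)]
    for key, record in zip(node.keys, node.records):
        if key == k:
            return record
    return None


leaves = [
    Node(keys=[1, 3], records=["r1", "r3"]),
    Node(keys=[5, 7], records=["r5", "r7"]),
    Node(keys=[9, 11], records=["r9", "r11"]),
]
root = Node(keys=[5, 9], children=leaves)
```

Because every root-to-leaf path has the same length, each lookup visits the same number of nodes.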

B+-TREES UPDATING: INSERTION

• Find the leaf node in which the search-key value would appear

• If the search-key value is already present in the leaf node

Add the record to the file

• If the search-key value is not present, then

Add the record to the main file (and create a bucket if necessary)

If there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node

Otherwise, split the node (along with the new (key-value, pointer) entry).

Splitting a leaf node:

• Take the n (search-key value, pointer) pairs (including the one being inserted) in sorted order. Place the first ⌈n/2⌉ in the original node, and the rest in a new node.

• Let the new node be p, and let k be the least key value in p. Insert (k, p) in the parent of the node being split.

• If the parent is full, split it and propagate the split further up.
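
A leaf split can be sketched as below, assuming the first ⌈n/2⌉ pairs stay in the original node. The function name `split_leaf` and the string pointers are made up for the example, and inserting (k, p) into the parent is left to the caller.

```python
from typing import List, Tuple

Pair = Tuple[int, str]  # (search-key value, pointer)


def split_leaf(pairs: List[Pair], new_pair: Pair, n: int):
    """Split a full leaf (n - 1 pairs) that must absorb new_pair.

    Returns (left, right, k): the pairs kept in the original node, the pairs
    moved to the new node, and the least key of the new node, which the
    caller inserts into the parent together with a pointer to the new node.
    """
    if len(pairs) != n - 1:
        raise ValueError("leaf is not full")
    all_pairs = sorted(pairs + [new_pair], key=lambda pair: pair[0])  # n pairs
    cut = (n + 1) // 2  # ceil(n / 2) pairs stay in the original node
    left, right = all_pairs[:cut], all_pairs[cut:]
    return left, right, right[0][0]


left, right, k = split_leaf([(10, "a"), (20, "b"), (30, "c")], (25, "x"), n=4)
```

With n = 4 the four pairs are divided two and two, and 25 becomes the key inserted into the parent.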

Splitting a non-leaf node: when inserting (k, p) into an already full internal node N

• Copy N to an in-memory area M with space for n + 1 pointers and n keys

• Insert (k, p) into M

• Copy P_1, K_1, …, K_{⌈n/2⌉–1}, P_{⌈n/2⌉} from M back into node N

• Copy P_{⌈n/2⌉+1}, K_{⌈n/2⌉+1}, …, K_n, P_{n+1} from M into newly allocated node N'

• Insert (K_{⌈n/2⌉}, N') into the parent of N
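
The non-leaf split can be sketched in the same way, assuming the split point is ⌈n/2⌉. The function name `split_internal` and the string pointers are assumptions; Python lists are 0-based, so `keys[c - 1]` is the key K_c that moves up to the parent.

```python
from bisect import bisect_right
from typing import List, Tuple


def split_internal(keys: List[int], pointers: List[str],
                   k: int, p: str, n: int) -> Tuple[tuple, int, tuple]:
    """Split a full internal node (n pointers, n - 1 keys) when adding (k, p).

    Returns ((keys_N, ptrs_N), up_key, (keys_N2, ptrs_N2)); the caller inserts
    (up_key, pointer to the new node) into the parent.
    """
    if len(pointers) != n or len(keys) != n - 1:
        raise ValueError("node is not full")
    # Work area M: room for n keys and n + 1 pointers.
    m_keys, m_ptrs = list(keys), list(pointers)
    pos = bisect_right(m_keys, k)
    m_keys.insert(pos, k)
    m_ptrs.insert(pos + 1, p)  # p sits immediately to the right of k
    c = (n + 1) // 2           # ceil(n / 2)
    node = (m_keys[:c - 1], m_ptrs[:c])  # P_1, K_1, ..., K_{c-1}, P_c
    up_key = m_keys[c - 1]               # K_c moves up to the parent
    new_node = (m_keys[c:], m_ptrs[c:])  # P_{c+1}, K_{c+1}, ..., K_n, P_{n+1}
    return node, up_key, new_node


node, up_key, new_node = split_internal(
    [10, 20, 30], ["A", "B", "C", "D"], 25, "X", n=4
)
```

Unlike a leaf split, the middle key is moved up rather than copied: it appears in neither of the two resulting nodes.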

B+-TREES UPDATING: DELETION

• Find the record to be deleted, and remove it from the main file and from the bucket (if present)

• Remove (search-key value, pointer) from the leaf node if there is no bucket or if the bucket has become empty

• If the node has too few entries due to the removal, and the entries in the node and a sibling fit into a single node, then merge the siblings.

• Otherwise, if the node has too few entries due to the removal, but the entries in the node and a sibling do not fit into a single node, then redistribute the pointers:

Redistribute the pointers between the node and a sibling such that both have more than the minimum number of entries.

Update the corresponding search-key value in the parent of the node.

• The node deletions may cascade upwards till a node which has ⌈n/2⌉ or more pointers is found.

• If the root node has only one pointer after deletion, it is deleted and the sole child becomes the root.

HASHING

STATIC HASHING

•  A bucket is a unit of storage containing one or more records.

• In a hash file organization we obtain the bucket of a record

directly from its search key value using a hash function.

• Hash function h is a function from the set of all search key

values K to the set of all bucket addresses B.
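
A static hash file can be sketched as below. The number of buckets, the toy hash function `h` (sum of the key's bytes modulo the bucket count), and the sample key are assumptions chosen to keep the example deterministic; a real system would use a better-distributed function.

```python
from typing import List, Tuple

NUM_BUCKETS = 8  # assumed fixed number of bucket addresses


def h(key: str) -> int:
    """Toy hash function: sum of the key's bytes modulo the bucket count."""
    return sum(key.encode("utf-8")) % NUM_BUCKETS


buckets: List[List[Tuple[str, str]]] = [[] for _ in range(NUM_BUCKETS)]


def insert(key: str, record: str) -> None:
    """Place the record in the bucket chosen by the hash function."""
    buckets[h(key)].append((key, record))


def lookup(key: str) -> List[str]:
    """Search only the bucket the key hashes to."""
    return [record for k, record in buckets[h(key)] if k == key]


insert("Perryridge", "A-102")
insert("Perryridge", "A-201")
```

Different keys can hash to the same bucket, so the bucket still has to be searched for the exact key.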

HASH FUNCTIONS

• The worst hash function maps all search key values to the same

bucket; this makes access time proportional to the number of 

search key values in the file.

•  An ideal hash function is uniform, i.e., each bucket is assigned

the same number of search key values from the set of all

possible values.

• An ideal hash function is also random, so each bucket will have the

same number of records assigned to it irrespective of the actual

distribution of search key values in the file.

DEFICIENCIES OF STATIC HASHING

• In static hashing, function h maps search key values to a fixed

set B of bucket addresses. Databases grow or shrink with

time. If initial number of buckets is too small, and file grows,

performance will degrade.

• If space is allocated for anticipated growth, a significant amount

of space will be wasted initially.

• If database shrinks, again space will be wasted.

DYNAMIC HASHING

•  Allows the hash function to be modified dynamically.

• Extendable hashing is one form of dynamic hashing. It splits

and coalesces buckets as database size changes.

• We choose a hash function that generates values over a relatively large range.

• Buckets are created on demand. Not all bits of the hash value are used initially; the number of bits used changes as the database size changes.
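
The idea of using more hash bits as the file grows can be sketched as below. This is a simplified illustration rather than the scheme of any particular system: it uses `zlib.crc32` as the 32-bit hash, takes the low-order bits as the directory index (textbook presentations often use a bit prefix instead), assumes a bucket capacity of 2, and only splits buckets; coalescing on shrinkage is omitted. The class and method names are made up.

```python
import zlib
from typing import Dict, List, Optional

BUCKET_CAPACITY = 2  # assumed maximum number of keys per bucket
HASH_BITS = 32       # crc32 produces 32-bit values


class Bucket:
    def __init__(self, local_depth: int) -> None:
        self.local_depth = local_depth  # hash bits this bucket depends on
        self.items: Dict[str, str] = {}


class ExtendableHash:
    def __init__(self) -> None:
        self.global_depth = 0  # hash bits currently used by the directory
        self.directory: List[Bucket] = [Bucket(0)]

    def _slot(self, key: str) -> int:
        value = zlib.crc32(key.encode("utf-8"))
        return value & ((1 << self.global_depth) - 1)  # low global_depth bits

    def get(self, key: str) -> Optional[str]:
        return self.directory[self._slot(key)].items.get(key)

    def put(self, key: str, value: str) -> None:
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.items or len(bucket.items) < BUCKET_CAPACITY:
                bucket.items[key] = value
                return
            self._split(bucket)  # bucket is full: split it and retry

    def _split(self, bucket: Bucket) -> None:
        if bucket.local_depth >= HASH_BITS:
            raise RuntimeError("too many keys with the same hash value")
        if bucket.local_depth == self.global_depth:
            # Use one more hash bit: the directory doubles in size.
            self.directory += self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        new_bucket = Bucket(bucket.local_depth)
        high_bit = 1 << (bucket.local_depth - 1)
        # Directory slots whose newly used bit is 1 now point to the new bucket.
        for slot, b in enumerate(self.directory):
            if b is bucket and slot & high_bit:
                self.directory[slot] = new_bucket
        # Redistribute the old bucket's entries between the two buckets.
        old_items, bucket.items = bucket.items, {}
        for k, v in old_items.items():
            self.directory[self._slot(k)].items[k] = v


table = ExtendableHash()
for i in range(20):
    table.put(f"key{i}", f"value{i}")
```

Only the overflowing bucket is split; the rest of the file is untouched, which is what distinguishes this from rebuilding a static hash file with more buckets.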