
Summarization – CS 257

Chapter 13, Database Systems: The Complete Book

Submitted by: Nitin Mathur

Submitted to: Dr. T.Y. Lin

What is an Index?

[Figure: given a value, the index finds the matching records among the blocks holding records]

An index is any data structure that takes as input a property of records, typically the value of one or more fields, and finds the records with that property “quickly.”

Indexes on Sequential Files

[Figure: a sequential file holding records with keys 10, 20, 30, 40, 50, 60, 70, 80]

This is a sequential file in which tuples are sorted by their primary key.

Definitions to Study Index Structures

• Data file: a sorted sequential file.

• Index file: consists of key-pointer pairs.

• Search key: a search key K in the index file is associated with a pointer to a data-file record that has search key K.

Dense Indexes

[Figure: a dense index with one entry for each of the keys 10 through 80, each pointing to its record in the sorted data file]

There is an entry in the index file for every record in the data file.
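A minimal Python sketch of a dense-index lookup, assuming the index fits in memory as a sorted list of (key, pointer) pairs. The pointers are hypothetical (block number, slot) positions with two records per block. Because there is one entry per record, a binary search over the keys either finds the key or shows that no record has it.

```python
from bisect import bisect_left

# Dense index: one (key, pointer) entry per record, sorted by key.
# Pointers are hypothetical (block number, slot) positions in the data file.
dense_index = [(10, (0, 0)), (20, (0, 1)), (30, (1, 0)), (40, (1, 1)),
               (50, (2, 0)), (60, (2, 1)), (70, (3, 0)), (80, (3, 1))]
index_keys = [k for k, _ in dense_index]

def dense_lookup(key):
    """Return the pointer for the record with `key`, or None."""
    i = bisect_left(index_keys, key)      # binary search in the index
    if i < len(index_keys) and index_keys[i] == key:
        return dense_index[i][1]
    return None  # every key appears in a dense index, so this key has no record

print(dense_lookup(50))   # (2, 0)
print(dense_lookup(55))   # None
```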

Sparse Indexes

[Figure: a sparse index with keys 10, 30, 50, 70, 90, 110, 130, 150, each pointing to the data block whose first record has that key]

A sparse index holds only one key-pointer pair per data block; the key is that of the first record in the block.
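A sparse-index lookup can be sketched the same way, assuming a hypothetical in-memory data file whose blocks are sorted lists of keys. The index entry with the largest key not exceeding the search key identifies the only block that can hold the record; that block is then searched in main memory.

```python
from bisect import bisect_right

# Hypothetical sorted data file: each block is a list of keys, two per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
# Sparse index: one (first key of block, block number) entry per block.
sparse_index = [(blk[0], b) for b, blk in enumerate(blocks)]
first_keys = [k for k, _ in sparse_index]

def sparse_lookup(key):
    """Return (block number, slot) of the record with `key`, or None."""
    # The only block that can hold `key` is the one whose index entry has
    # the largest key <= `key`.
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None                 # key precedes every key in the file
    b = sparse_index[i][1]
    if key in blocks[b]:            # search that one block in main memory
        return b, blocks[b].index(key)
    return None

print(sparse_lookup(40))   # (1, 1)
print(sparse_lookup(45))   # None
```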

Multiple Levels of Indexes

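[Figure: the second-level sparse index (keys 10, 90, 170, 250, 330, 410, 490, 570) points to blocks of the first-level sparse index (keys 10, 30, 50, ..., 150, ...), which in turn points to blocks of the data file (keys 10, 20, 30, ...)]

Adding a second level of sparse index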
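A two-level lookup simply repeats the sparse-index step. This sketch assumes a hypothetical data file with keys 10 to 320, two records per data block, and four entries per first-level index block; with those parameters the second-level keys come out as 10, 90, 170, 250.

```python
from bisect import bisect_right

def chunk(items, size):
    """Split a list into consecutive blocks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical data file: keys 10, 20, ..., 320, two records per block.
data_blocks = chunk(list(range(10, 330, 10)), 2)
# First level: (first key, data block number), four entries per index block.
level1_blocks = chunk([(blk[0], b) for b, blk in enumerate(data_blocks)], 4)
# Second level: (first key, level-1 block number) for each level-1 block.
level2 = [(blk[0][0], b) for b, blk in enumerate(level1_blocks)]

def floor_pointer(entries, key):
    """Pointer of the last entry whose key is <= key, or None."""
    i = bisect_right([k for k, _ in entries], key) - 1
    return entries[i][1] if i >= 0 else None

def two_level_lookup(key):
    """Level 2 -> level-1 index block -> data block; (block, slot) or None."""
    ib = floor_pointer(level2, key)
    if ib is None:
        return None
    db = floor_pointer(level1_blocks[ib], key)
    if db is None or key not in data_blocks[db]:
        return None
    return db, data_blocks[db].index(key)

print([k for k, _ in level2])    # [10, 90, 170, 250]
print(two_level_lookup(260))     # (12, 1)
```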

Managing Indexes during Data Modification

• Create / Delete Overflow blocks

• Insert new blocks in the Sequential order

• Slide tuples to adjacent blocks.

Action                          Dense Index   Sparse Index
Create empty overflow block     None          None
Delete empty overflow block     None          None
Create empty sequential block   None          Insert
Delete empty sequential block   None          Delete
Insert record                   Insert        Update (?)
Delete record                   Delete        Update (?)
Slide record                    Update        Update (?)

How actions on the sequential file affect the index file

Secondary Indexes

• Why do we need secondary indexes? Consider the query:

SELECT name, address
FROM moviestar
WHERE birthday = DATE '2008-01-09';

We need a secondary index on birthday to help with such queries.

• Due to the advent of the WWW, keeping documents online and retrieving them has become one of the largest database problems.

• The easiest approach to document retrieval is to create a separate index for each word (problem: wasted storage space).

• The other approach is to use an inverted index.

Document Retrieval and Inverted Index

The records form a collection of documents.

• The inverted index itself consists of a set of word-pointer pairs.
• The inverted-index pointers refer to positions in the bucket file.
• Pointers in the bucket file can be:
– pointers to documents, or
– pointers to occurrences of the word (possibly a pair consisting of the first block of the document and an integer giving the position of the word in it).

When the pointers refer to occurrences of a word, we can include extra information in the bucket array. For example, for documents written in HTML or XML we can also include the markup associated with each word, so we can distinguish titles, headers, and so on.

Inverted Index

Stemming: remove suffixes to find the stem of each word (e.g., plurals can be treated as their singular versions).

Stop words: words such as “the” or “and” are excluded from the inverted index.

Example (with reference to the next figure): suppose we want to find documents about dogs that compare them with cats.
• This is difficult to solve without understanding the text.
• However, we could get a good hint by searching for documents that
1. mention dogs in the title, and
2. mention cats in an anchor (<a href>).

More Information Retrieval Techniques to Improve Effectiveness
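A minimal sketch of an inverted index in Python, assuming each document has already been split into (word, tag) tokens, where the tag records the markup around the word ('title', 'anchor', or 'text'). The stop list and the stemmer are deliberately crude, hypothetical stand-ins (the stemmer only strips a plural 's'); a real system would use a proper stemming algorithm. Each bucket entry is a (document, position, tag) occurrence, and the final query answers the dogs-and-cats example.

```python
from collections import defaultdict

STOP_WORDS = {"the", "and", "a", "of"}  # hypothetical, tiny stop list

def stem(word):
    """Crude stemmer (an assumption): treat a trailing 's' as a plural."""
    word = word.lower()
    return word[:-1] if len(word) > 3 and word.endswith("s") else word

def build_inverted_index(docs):
    """docs: {doc_id: [(word, tag), ...]}. Returns stem -> bucket list of
    (doc_id, position, tag) occurrences."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, (word, tag) in enumerate(tokens):
            if word.lower() in STOP_WORDS:
                continue  # stop words are not indexed
            index[stem(word)].append((doc_id, pos, tag))
    return index

def docs_with(index, word, tag):
    """Documents in which `word` occurs with markup `tag`."""
    return {d for d, _, t in index.get(stem(word), []) if t == tag}

docs = {
    1: [("Dogs", "title"), ("and", "text"), ("cats", "anchor")],
    2: [("Cats", "title"), ("chase", "text"), ("dogs", "text")],
}
idx = build_inverted_index(docs)
print(docs_with(idx, "dog", "title") & docs_with(idx, "cats", "anchor"))  # {1}
```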

B-trees are the most commonly used index structure in commercial systems.

Advantages:
• B-trees automatically maintain as many levels of index as is appropriate for the size of the file.
• B-trees manage the space on their blocks so that no overflow blocks are needed.

The structure of B-trees:
• The tree is balanced (all paths from the root to a leaf have the same length).
• Typically there are three layers: the root, an intermediate layer, and the leaves.

B-Trees

Keys are distributed among the leaves in sorted order, from left to right

• At the root, at least two pointers are used (unless the tree holds only a single record).
• At a leaf, the last pointer points to the next leaf block to the right.
• At an interior node, all n+1 pointers can be used, and at least ⌈(n+1)/2⌉ are actually used.

Important rules about what can appear in the blocks of a B-tree

B-trees allow lookup, insertion, and deletion of records using few disk I/Os.

1. If the number of keys per block is reasonably large, we rarely need to split or merge blocks, and even when we do, the operation is usually limited to the leaves and their parents.

2. The number of disk I/Os needed to read a record is normally the number of levels of the B-tree plus one (for lookup) or plus two (for insertion or deletion).

Example: suppose 340 key-pointer pairs fit in one block, and suppose an average block is occupied midway between the minimum and the maximum, i.e., a typical block has 255 pointers.
• With a root having 255 children, there are 255^2 = 65,025 leaves.
• Among these leaves we have 255^3, or about 16.6 million, pointers to records.
• That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
• The number of disk I/Os can be reduced further by keeping part of the B-tree (e.g., the root block) in main memory.
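The arithmetic of this example can be checked with a few lines of Python, assuming, as the example does, that every block uses about 255 pointers. The `levels_in_memory` parameter is a hypothetical knob for keeping the top of the tree (e.g., the root) in main memory.

```python
def leaves(fanout, levels):
    """Leaf blocks in a tree with `levels` levels whose blocks each have
    `fanout` children."""
    return fanout ** (levels - 1)

def records(fanout, levels):
    """Record pointers held by those leaves (about `fanout` per leaf)."""
    return fanout ** levels

def lookup_ios(levels, levels_in_memory=0):
    """One disk I/O per B-tree level not kept in main memory, plus one
    to read the record itself."""
    return (levels - levels_in_memory) + 1

print(leaves(255, 3), records(255, 3))    # 65025 16581375
print(lookup_ios(3), lookup_ios(3, 1))    # 4 3
```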

Efficiency of B-tree Index

Disk Failures

Intermittent Failures

Checksums

Stable Storage

Error-Handling Capabilities of Stable Storage

Types of Errors

Intermittent error: a read or write is unsuccessful, although repeated tries may succeed.

Media decay: a bit or bits become permanently corrupted.

Write failure: we can neither write the data successfully nor retrieve the previously written data.

Disk crash: the entire disk becomes unreadable.

Intermittent Failures

An intermittent failure occurs when we try to read a sector but the correct content of that sector is not delivered to the disk controller.

• We need to check whether a sector is good or bad.
• To check that a write was correct, a read is performed afterward; the read tells us whether the sector is good or bad.

Checksums

• Each sector has some additional bits, called the checksum.
• The checksum bits are set depending on the values of the data bits stored in that sector.
• With checksums, the probability of reading a bad sector without noticing is small.
• Parity: if the data bits have an odd number of 1's, add a parity bit of 1; if they have an even number of 1's, add a parity bit of 0. So the number of 1's, including the parity bit, is always even.

Example:
1. Sequence 01101000 has an odd number of 1's, so the parity bit is 1: 011010001.
2. Sequence 11101110 has an even number of 1's, so the parity bit is 0: 111011100.

Any one-bit error in reading or writing the bits and their parity bit results in a sequence of bits with odd parity, so the error can be detected.

Error detection can be improved by keeping several parity bits, for example eight: one for the first bit of every byte, one for the second bit of every byte, and so on.

If the sector contents are random (as when a bad sector returns garbage), the probability is 50% that any one parity bit will detect an error, and the chance that none of the eight do so is only one in 2^8, or 1/256.

In the same way, if n independent parity bits are used, the probability of missing an error is only 1/2^n.
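The parity rule above can be sketched in Python, representing a sector's bits as a string of '0'/'1' characters (an assumption made for readability). `lane_parity` is a hypothetical helper for the eight-parity-bit variant: XOR-ing all the bytes gives, in bit position j, the parity of bit j of every byte.

```python
def parity_bit(bits: str) -> str:
    """Parity bit that makes the total number of 1's even."""
    return "1" if bits.count("1") % 2 == 1 else "0"

def add_parity(bits: str) -> str:
    return bits + parity_bit(bits)

def parity_ok(bits_with_parity: str) -> bool:
    """False when the count of 1's is odd, e.g. after any one-bit error."""
    return bits_with_parity.count("1") % 2 == 0

def lane_parity(data: bytes) -> int:
    """Eight parity bits at once: bit j of the result is the parity of bit j
    of every byte, which is just the XOR of all the bytes."""
    result = 0
    for byte in data:
        result ^= byte
    return result

print(add_parity("01101000"))   # 011010001
print(add_parity("11101110"))   # 111011100
print(parity_ok("011010000"))   # False: the parity bit was flipped
```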

Stable Storage

Stable storage is a policy for recovering from disk failures such as media decay, and from write failures in which, if we overwrite a sector, the new data is not read correctly.

Sectors are paired: each logical sector X is stored as a left copy Xl and a right copy Xr. To write X, we write Xl and check its parity (checksum) bits, substituting a spare sector for Xl until a good value is returned; then we do the same for Xr. To read X, we read Xl, and if it returns a bad value we read Xr instead.

Error-Handling Capabilities of Stable Storage

Media failures: if one of Xl and Xr fails, X can be read from the other; only if both fail is X unreadable, and the probability of that is very small.

Write failure: suppose there is a power outage during a write.
1. If it occurs while Xl is being written, Xr remains good and X can be read from Xr.
2. If it occurs after Xl is written, X can be read from Xl, since Xr may or may not have the correct copy of X.

Recovery from Disk Crashes: Ways to Recover Data

The most serious mode of failure for disks is the “head crash,” in which data is permanently destroyed.

To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.


Each scheme starts with one or more disks that hold the data and adds one or more disks that hold information completely determined by the contents of the data disks. These added disks are called redundant disks.

Mirroring as a Redundancy Technique

The mirroring scheme is referred to as RAID level 1 protection against data loss.

In this scheme we mirror each disk.

One of the disks is called the data disk and the other the redundant disk.

In this case the only way data can be lost is if there is a second disk crash while the first crash is being repaired.

Parity Blocks

RAID level 4 scheme uses only one redundant disk no matter how many data disks there are.

In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks.

That is, the jth bits of all the ith blocks, on the data disks and the redundant disk together, must have an even number of 1's, and the redundant disk's bit is chosen to make this condition true.

Parity Blocks – Reading disk

Reading a block from a data disk is the same as reading a block from any disk.

Alternatively, we could read the corresponding block from each of the other disks and compute the block of the disk we want to read by taking the modulo-2 sum.

disk 2: 10101010
disk 3: 00111000
disk 4: 01100010

If we take the modulo-2 sum of the bits in each column, we get

disk 1: 11110000
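The modulo-2 sum is a bitwise XOR, so the computation can be sketched in Python with blocks written as bit strings (a readability assumption). Disks 1 to 3 hold data and disk 4 is the redundant disk, as in the example; the same sum rebuilds any single disk from the other three.

```python
def mod2_sum(*blocks: str) -> str:
    """Column-by-column modulo-2 sum (XOR) of equal-length bit strings."""
    width = len(blocks[0])
    if any(len(b) != width for b in blocks):
        raise ValueError("all blocks must have the same length")
    total = 0
    for b in blocks:
        total ^= int(b, 2)
    return format(total, f"0{width}b")

# Disk 1 computed from disks 2, 3 and the redundant disk 4.
print(mod2_sum("10101010", "00111000", "01100010"))   # 11110000
# The same sum rebuilds a crashed disk, e.g. disk 2 from disks 1, 3 and 4.
print(mod2_sum("11110000", "00111000", "01100010"))   # 10101010
```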

Parity Block - Writing

When we write a new block of a data disk, we need to change that block of the redundant disk as well.

One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum, and write it to the redundant disk.

But with n data disks, this approach requires n-1 reads of data blocks, one write of the data block, and one write of the redundant-disk block.

Total = n+1 disk I/Os

Parity Block - Writing

A better approach requires only four disk I/Os:

1. Read the old value of the data block being changed.

2. Read the corresponding block of the redundant disk.

3. Write the new data block.

4. Recalculate the redundant block as the modulo-2 sum of its old value, the old data block, and the new data block, and write it.
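The four-step write can be sketched with each disk's block modeled as a Python int in a hypothetical in-memory list. The point of step 4 is that the bits which changed in the data block are exactly the bits that flip in the redundant block, so no other data disk has to be read.

```python
def write_data_block(disks, d, new_block, redundant):
    """Write `new_block` to data disk `d` using four disk I/Os.
    `disks` is a hypothetical stand-in: one int block per disk."""
    old_block = disks[d]               # 1. read the old data block
    old_parity = disks[redundant]      # 2. read the old redundant block
    disks[d] = new_block               # 3. write the new data block
    # 4. bits that changed in the data block flip in the redundant block
    disks[redundant] = old_parity ^ old_block ^ new_block

disks = [0b11110000, 0b10101010, 0b00111000, 0b01100010]  # disk 4 redundant
write_data_block(disks, 1, 0b11001100, redundant=3)
print(format(disks[3], "08b"))   # 00000100
```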

Parity Blocks – Failure Recovery

If any one disk crashes, we recover it by computing, for each of its blocks, the modulo-2 sum of the corresponding blocks of all the other disks.

Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and the redundant disk, so the situation looks like this:

disk 1: 11110000
disk 2: ????????
disk 3: 00111000
disk 4: 01100010

If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.

An Improvement: RAID 5

RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.

Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk.

However, we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.


For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.

For example, if n = 3, there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.

The disk numbered 1 is redundant for cylinders numbered 1, 5, 9, and so on; disk 2 is redundant for cylinders 2, 6, 10, ..., and disk 3 is redundant for cylinders 3, 7, 11, ....
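The RAID 5 assignment rule is a single remainder computation. This short sketch lists, for the n = 3 case, which of cylinders 1 to 12 each disk holds the redundant copy for.

```python
def redundant_disk(i: int, n: int) -> int:
    """With n + 1 disks numbered 0..n, cylinder i's redundant information
    lives on disk i mod (n + 1)."""
    return i % (n + 1)

n = 3
for disk in range(n + 1):
    print(disk, [i for i in range(1, 13) if redundant_disk(i, n) == disk])
# 0 [4, 8, 12]
# 1 [1, 5, 9]
# 2 [2, 6, 10]
# 3 [3, 7, 11]
```

Rotating the redundant role this way spreads the extra writes evenly over all disks instead of concentrating them on one.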

Coping With Multiple Disk Crashes

• The theory of error-correcting codes known as Hamming codes leads to RAID level 6.

• With this strategy, two simultaneous disk crashes are correctable.

In the example, disks 1 through 4 are data disks and disks 5 through 7 are redundant disks.

The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.

The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.

The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.

Coping With Multiple Disk Crashes – Reading/Writing

We may read data from any data disk normally.

To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all the redundant disks whose sums include the written disk (in the code's matrix, the redundant disks that have a 1 in a row in which the written disk also has a 1).
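A sketch of this RAID 6 scheme, with blocks modeled as Python ints (an assumption) and the disk 5/6/7 definitions encoded in a hypothetical `COVERS` table. It computes the redundant disks from the data disks and applies the write rule; recovery from two simultaneous crashes (solving for two unknown disks) is not shown.

```python
# Redundant disk -> data disks whose modulo-2 sum it stores
# (disks 1-4 hold data, disks 5-7 are redundant).
COVERS = {5: (1, 2, 3), 6: (1, 2, 4), 7: (1, 3, 4)}

def redundant_blocks(disks):
    """Compute the blocks of disks 5-7 from data disks 1-4."""
    result = {}
    for r, covered in COVERS.items():
        block = 0
        for d in covered:
            block ^= disks[d]
        result[r] = block
    return result

def write_block(disks, d, new_block):
    """Write data disk d, then add (old XOR new) into each redundant disk
    whose sum includes disk d."""
    change = disks[d] ^ new_block
    disks[d] = new_block
    for r, covered in COVERS.items():
        if d in covered:
            disks[r] ^= change

disks = {1: 0b11110000, 2: 0b10101010, 3: 0b00111000, 4: 0b01000001}
disks.update(redundant_blocks(disks))
write_block(disks, 4, 0b00001111)   # disk 4 feeds only disks 6 and 7
print({r: disks[r] for r in COVERS} == redundant_blocks(disks))   # True
```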

Records consist of fields. Each record must have a schema, which is stored by the database system. The schema includes the names and data types of the fields and their offsets within the record.

RECORDS

Example:

CREATE TABLE Moviestar (name CHAR(30) PRIMARY KEY, address VARCHAR(255), gender CHAR(1), birthdate DATE);

Fixed Length Records

Field:               Name   Address   Gender   Birth date
End offset (bytes):  30     286       287      297

Each record starts at a byte within its block that is a multiple of 4.

All fields within the record start at a byte that is offset from the beginning of the record by a multiple of 4.


So the record should look like this.

Field:               Name   Address   Gender   Birth date
End offset (bytes):  32     288       292      304
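The aligned layout can be computed mechanically. This sketch assumes the field sizes implied by the unaligned table (name 30 bytes, address 256, gender 1, birthdate 10) and rounds every field start, and the record length, up to a multiple of 4. The optional `header` parameter is a hypothetical addition for reserving header bytes at the front of the record.

```python
def aligned_layout(field_sizes, align=4, header=0):
    """Return (start offsets, record length) when every field must start at
    a multiple of `align` bytes from the beginning of the record."""
    def round_up(x):
        return (x + align - 1) // align * align
    offsets, pos = [], round_up(header)
    for size in field_sizes:
        offsets.append(pos)
        pos = round_up(pos + size)   # next field starts at a multiple of 4
    return offsets, pos              # length is also a multiple of 4

print(aligned_layout([30, 256, 1, 10]))             # ([0, 32, 288, 292], 304)
print(aligned_layout([30, 256, 1, 10], header=12))  # ([12, 44, 300, 304], 316)
```

With a 12-byte header, the same rounding yields field starts of 12, 44, 300, and 304 and a record length of 316.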


The following information should also be kept in the record:
1. The record schema (or a pointer to it).
2. The length of the record.
3. Timestamps.

Many record layouts include a header of some small number of bytes to provide this additional information.

Record Headers

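[Figure: a record with a header. Header (pointer to schema, length, timestamp): bytes 0 to 12; name: 12 to 44; address: 44 to 300; gender: 300 to 304; birthdate: 304 to 316, the record length]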


Records representing tuples of a relation are stored in blocks of the disk and moved into main memory when we need to access or update them.

[Figure: a block consisting of a header followed by record 1, record 2, ..., record n]

Records into Blocks

The block header contains the following information:
• Links to one or more other blocks that are part of a network of blocks, such as those used for creating indexes to the tuples of a relation.
• Information about the role played by this block in such a network.
• Information about which relation the tuples of this block belong to.
• Timestamps indicating the time of the block's last modification or access.


A database system consists of a server process that provides data from secondary storage to one or more client processes, which are applications using the data.

The server and client processes may be on one machine, or the server and the various clients can be distributed over many machines.

Client - Server Systems

The client application uses a "virtual" address space.

The operating system or DBMS decides which parts of the address space are currently located in main memory, and hardware maps the virtual address space to physical locations in main memory.


The server's data lives in a database address space.

The addresses of this space refer to blocks, and possibly to offsets within the block.


Physical addresses are byte strings that let us determine the place within the secondary storage system where the block or record can be found.

The bytes of a physical address are used to indicate the following information:
• The host to which the storage is attached.
• An identifier for the disk or other device on which the block is located.
• The number of the cylinder of the disk.
• The number of the track within the cylinder.
• The number of the block within the track.
• The offset of the beginning of the record within the block.

Client - Server Systems – Physical Address

Each block or record has a "logical address," which is an arbitrary string of bytes of some fixed length.

A map table, stored on disk in a known location, relates logical to physical addresses.

Client - Server Systems Logical Address

[Figure: a map table whose entries relate each logical address to a physical address]

All the information needed for a physical address is found in the map table.

Many combinations of logical and physical addresses yield structured address schemes.

A very useful combination of physical and logical addresses is to keep in each block an offset table that holds the offsets of the records within the block, as suggested in the figure.

Logical and Structured Addresses

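[Figure: a block whose header holds an offset table; records 4, 3, 2, 1 are stored from the end of the block toward the front, leaving unused space between the offset table and the records]

A block with a table of offsets telling us the position of each record within the block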

The address of a record is now the physical address of its block plus the offset of the entry in the block's offset table for that record

ADVANTAGES
• We can move the record around within the block.
• We can even allow the record to move to another block.
• Finally, should the record be deleted, we have the option of leaving in its offset-table entry a tombstone, a special value that indicates the record has been deleted.
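A minimal sketch of a block with an offset table, assuming a hypothetical `Block` class in which a record's address is (block, entry number) and `None` plays the role of the tombstone. Header and offset-table space are ignored for simplicity. Compaction moves records inside the block, yet existing addresses stay valid because only the table changes.

```python
TOMBSTONE = None   # hypothetical marker left in a deleted record's entry

class Block:
    """Sketch of a block whose header holds an offset table."""

    def __init__(self, size=4096):
        self.data = bytearray(size)
        self.table = []          # entries: (offset, length) or TOMBSTONE
        self.free_end = size     # records are packed from the end backward

    def insert(self, record: bytes) -> int:
        if len(record) > self.free_end:
            raise ValueError("no room in block")
        self.free_end -= len(record)
        self.data[self.free_end:self.free_end + len(record)] = record
        self.table.append((self.free_end, len(record)))
        return len(self.table) - 1          # entry number used in addresses

    def read(self, entry: int):
        slot = self.table[entry]
        if slot is TOMBSTONE:
            return None                     # the record was deleted
        offset, length = slot
        return bytes(self.data[offset:offset + length])

    def delete(self, entry: int):
        self.table[entry] = TOMBSTONE       # entry stays, so old pointers see it

    def compact(self):
        """Slide live records together; only offsets change, so every
        (block, entry) address stays valid."""
        live = [(e, self.read(e)) for e, slot in enumerate(self.table)
                if slot is not TOMBSTONE]
        self.free_end = len(self.data)
        for e, record in live:
            self.free_end -= len(record)
            self.data[self.free_end:self.free_end + len(record)] = record
            self.table[e] = (self.free_end, len(record))

b = Block(64)
alice, carol = b.insert(b"alice"), b.insert(b"carol")
b.delete(alice)
b.compact()
print(b.read(carol), b.read(alice))   # b'carol' None
```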

• Relational systems need the ability to represent pointers in tuples.

• Index structures are composed of blocks that usually have pointers within them.

Thus, we need to study the management of pointers as blocks are moved between main and secondary memory.

Pointer Swizzling

Every block, record, object, or other referenceable data item has two forms of address: its database address and its memory address. When the item is in secondary storage, we must use its database address. However, when the item is in main memory, we can refer to it by either its database address or its memory address.

We need a table that translates all those database addresses that are currently in virtual memory to their current memory addresses. Such a translation table is suggested in the figure.

[Figure: a translation table whose entries pair a database address (DB-addr) with a memory address (mem-addr)]

The translation table turns database addresses into their equivalents in memory

To avoid the cost of translating repeatedly from database addresses to memory addresses, several techniques have been developed that are collectively known as pointer swizzling.

When we move a block from secondary to main memory, pointers within the block may be "swizzled," that is, translated from the database address space to the virtual address space.

A pointer actually consists of:

1. A bit indicating whether the pointer is currently a database address or a (swizzled) memory address.

2. The database or memory pointer, as appropriate.

As soon as a block is brought into memory, we locate all its pointers and addresses and enter them into the translation table if they are not already there.

However we need some mechanism to locate the pointers.

Automatic Swizzling

For example:

1. If the block holds records with a known schema, the schema will tell us where in the records the pointers are found.

2. If the block is used for one of the index structures then the block will hold pointers at known locations.

3. We may keep within the block header a list of where the pointers are.

[Figure: Block 1 is read from disk into memory; its pointers are shown either swizzled (pointing into memory) or unswizzled (still referring to Block 2 on disk)]

Structure of a pointer when swizzling is used

Another approach is to leave all pointers unswizzled when the block is first brought into memory.

We enter its address, and the addresses of its pointers, into the translation table, along with their memory equivalents.

If and when we follow a pointer P that is inside some block of memory, we swizzle it.

The difference between on-demand and automatic swizzling is that the latter tries to get all the pointers swizzled quickly and efficiently when the block is loaded into memory.

Swizzling on Demand

The possible time saved by swizzling all of a block's pointers at one time must be weighed against the possibility that some swizzled pointers will never be followed.

In that case, any time spent swizzling and unswizzling the pointer will be wasted.

Drawback

One option is to arrange that database pointers look like invalid memory addresses. Then we can allow the computer to follow any pointer as if it were in its memory form.

If the pointer happens to be unswizzled, the memory reference will cause a hardware trap. If the DBMS provides a function that is invoked by the trap and "swizzles" the pointer, then we can follow swizzled pointers in single instructions and need to do something more time-consuming only when the pointer is unswizzled.

Option

It is possible never to swizzle pointers.

We still need the translation table, so that pointers may be followed in their unswizzled form.

No Swizzling

The application programmer may know whether the pointers in a block are likely to be followed.

This programmer may be able to specify explicitly that a block loaded into memory is to have its pointers swizzled, or the programmer may call for the pointers to be swizzled only as needed.

Programmer Control of Swizzling

When a block is moved from memory back to disk, any pointers within that block must be "unswizzled."

The translation table can be used to associate addresses of the two types in either direction.

However, we do not want each unswizzling operation to require a search of the entire translation table.

Returning Blocks to Disk

If we think of the translation table as a relation, then the problem of finding the memory address associated with a database address x can be expressed as the query:

SELECT memAddr FROM TranslationTable

WHERE dbAddr = x;

If we want to support the reverse query, then we need to have an index on attribute memAddr as well.

SELECT dbAddr FROM TranslationTable WHERE memAddr = y;
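A minimal sketch of a translation table with a lookup in each direction (two dicts standing in for indexes on dbAddr and memAddr), together with swizzling on demand and unswizzling before a block returns to disk. `TranslationTable`, `Pointer`, `follow`, and `unswizzle` are hypothetical names, and the addresses are made up. For simplicity, `follow` raises an error instead of reading a missing block from disk.

```python
class TranslationTable:
    """Database address <-> memory address, with a lookup in each direction
    (like indexes on both dbAddr and memAddr)."""
    def __init__(self):
        self.db_to_mem = {}
        self.mem_to_db = {}

    def add(self, db_addr, mem_addr):
        self.db_to_mem[db_addr] = mem_addr
        self.mem_to_db[mem_addr] = db_addr

class Pointer:
    """A pointer: a 'swizzled' bit plus a database or memory address."""
    def __init__(self, db_addr):
        self.swizzled = False
        self.addr = db_addr

def follow(ptr, table):
    """Swizzling on demand: translate the pointer the first time it is
    followed, then reuse the memory address directly."""
    if not ptr.swizzled:
        mem_addr = table.db_to_mem.get(ptr.addr)
        if mem_addr is None:
            raise LookupError("target block is not in memory; read it first")
        ptr.addr, ptr.swizzled = mem_addr, True
    return ptr.addr

def unswizzle(ptr, table):
    """Before the block holding `ptr` returns to disk, restore its
    database address using the reverse (memAddr) lookup."""
    if ptr.swizzled:
        ptr.addr, ptr.swizzled = table.mem_to_db[ptr.addr], False

table = TranslationTable()
table.add("disk0/block7/rec3", 0x7F00)   # hypothetical addresses
p = Pointer("disk0/block7/rec3")
print(hex(follow(p, table)), p.swizzled)  # 0x7f00 True
unswizzle(p, table)
print(p.addr, p.swizzled)                 # disk0/block7/rec3 False
```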

A block in memory is said to be pinned if it cannot at the moment be written back to disk safely.

A bit telling whether or not a block is pinned can be located in the header of the block.

Pinned Records and Blocks

Suppose a block B1 has within it a swizzled pointer to some data item in block B2, and B2 is written back to disk while its buffer is reused for another block.

If we then follow the pointer in B1, it will lead us to the buffer, which no longer holds B2; in effect, the pointer has become dangling.

A block, like B2, that is referred to by a swizzled pointer from somewhere else is therefore pinned.

Reason for block to be pinned

If a block is pinned, we must either unpin it or let the block remain in memory, occupying space that could otherwise be used for some other block.

To unpin a block that is pinned because of swizzled pointers from outside, we must "unswizzle" any pointers to it.

Consequently, the translation table must record, for each database address whose data item is in memory, the places in memory where swizzled pointers to that item exist.

Two possible approaches are:

1. Keep the list of references to a memory address as a linked list attached to the entry for that address in the translation table.

2. If memory addresses are significantly shorter than database addresses, we can create the linked list in the space used for the pointers themselves.

That is, each space used for a database pointer is replaced by:

(a) the swizzled pointer, and
(b) another pointer that forms part of a linked list of all occurrences of this pointer.

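[Figure: the translation-table entry maps database address x to memory address y, and links to the first swizzled occurrence of y, which links to the next occurrence, and so on]

A linked list of occurrences of a swizzled pointer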

Records With Variable-Length Fields

A simple but effective scheme is to put all fixed length fields ahead of the variable-length fields. We then place in the record header:

1. The length of the record.

2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields. However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
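A sketch of this layout using Python's `struct` module. The header encoding (4-byte little-endian unsigned ints for the record length and the offsets) is an assumption, and `pack_record` and `unpack_var_fields` are hypothetical helpers. As described above, the header stores no offset for the first variable-length field; the reader finds it right after the fixed-length part, whose size comes from the schema.

```python
import struct

def pack_record(fixed: bytes, var_fields: list) -> bytes:
    """Header = record length, then offsets of the 2nd..kth variable-length
    fields (the 1st starts right after the fixed-length part)."""
    n_ptrs = max(len(var_fields) - 1, 0)
    header_len = 4 * (1 + n_ptrs)
    offsets, pos = [], header_len + len(fixed)
    for field in var_fields:
        offsets.append(pos)
        pos += len(field)
    header = struct.pack(f"<{1 + n_ptrs}I", pos, *offsets[1:])
    return header + fixed + b"".join(var_fields)

def unpack_var_fields(record: bytes, fixed_len: int, k: int) -> list:
    """Recover the k variable-length fields, given the schema's fixed-part
    length."""
    if k < 1:
        raise ValueError("need at least one variable-length field")
    n_ptrs = k - 1
    length, *ptrs = struct.unpack_from(f"<{1 + n_ptrs}I", record)
    starts = [4 * (1 + n_ptrs) + fixed_len] + ptrs
    ends = starts[1:] + [length]
    return [record[s:e] for s, e in zip(starts, ends)]

fixed = struct.pack("<i", 42)                 # e.g. a 4-byte integer field
rec = pack_record(fixed, [b"Jane Doe", b"12 Elm St."])
print(unpack_var_fields(rec, len(fixed), 2))  # [b'Jane Doe', b'12 Elm St.']
```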

Records With Repeating Fields

A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. We can locate all the occurrences of field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually, we reach the offset of the field following F, whereupon we stop.
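The procedure just described amounts to stepping through the offsets in increments of L. `occurrence_offsets` is a hypothetical helper, and the check that the occurrences exactly fill the gap is an added sanity test.

```python
def occurrence_offsets(first: int, next_field: int, L: int) -> list:
    """Offsets of all occurrences of a repeating field F of L bytes: add 0,
    L, 2L, ... to F's offset, stopping at the offset of the next field."""
    if L <= 0 or (next_field - first) % L != 0:
        raise ValueError("the occurrences of F must exactly fill the gap")
    return list(range(first, next_field, L))

print(occurrence_offsets(40, 100, 20))   # [40, 60, 80]
```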

An alternative representation is to keep the record of fixed length, and put the variable length portion - be it fields of variable length or fields that repeat an indefinite number of times - on a separate block. In the record itself we keep:

1. Pointers to the place where each repeating field begins, and

2. Either how many repetitions there are, or where the repetitions end.

Storing variable-length fields separately from the record

Variable-Format Records

The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:

1. Information about the role of this field, such as:
(a) the attribute or field name,
(b) the type of the field, if it is not apparent from the field name and some readily available schema information, and
(c) the length of the field, if it is not apparent from the type.

2. The value of the field.
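A sketch of a tagged-field record in Python. The encoding is an assumption: a one-byte name code, a one-byte type code, a two-byte length, then the value. `NAME_CODES` and `STRING_TYPE` are hypothetical stand-ins for the catalog information a real system would use. Fields whose values are unknown are simply left out of the record.

```python
import struct

# Hypothetical one-byte codes for attribute names and types.
NAME_CODES = {"name": 1, "address": 2, "birthdate": 3}
CODE_NAMES = {code: name for name, code in NAME_CODES.items()}
STRING_TYPE = 0

def encode_tagged(fields: dict) -> bytes:
    """Each field: name code (1 byte), type code (1 byte), value length
    (2 bytes), then the value. Unknown fields are simply left out."""
    out = bytearray()
    for name, value in fields.items():
        data = value.encode("utf-8")
        out += struct.pack("<BBH", NAME_CODES[name], STRING_TYPE, len(data))
        out += data
    return bytes(out)

def decode_tagged(record: bytes) -> dict:
    fields, pos = {}, 0
    while pos < len(record):
        code, _type, length = struct.unpack_from("<BBH", record, pos)
        pos += 4
        fields[CODE_NAMES[code]] = record[pos:pos + length].decode("utf-8")
        pos += length
    return fields

star = {"name": "Jane Doe", "birthdate": "1970-01-01"}   # no address known
print(decode_tagged(encode_tagged(star)) == star)        # True
```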

There are at least two reasons why tagged fields would make sense.

1. Information integration applications - Sometimes, a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie star information may have come from several sources, some of which record birthdates and some of which give addresses, while others do not. If there are not too many fields, we are probably best off leaving NULL those values we do not know.

2. Records with a very flexible schema - If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.

A record with tagged fields

Records That Do Not Fit in a Block

Some values, such as large images or audio clips, do not fit in a single block. These large values have a variable length, but even if the length is fixed for all values of the type, we need special techniques to represent them. In this section we consider a technique called "spanned records" that can be used to manage records that are larger than blocks.

Spanned records are also useful when records are smaller than blocks but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.

If records can be spanned, then every record and record fragment requires some extra header information:

1. Each record or fragment header must contain a bit telling whether or not it is a fragment.

2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record.

3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.

Storing spanned records across blocks
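A sketch of splitting a record into fragments that carry the three kinds of header information listed above. `Fragment`, `span`, and `reassemble` are hypothetical names, and the sketch assumes that the fragments go into consecutive blocks, so a "pointer" to a neighboring fragment is just a block number.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fragment:
    is_fragment: bool          # 1. fragment, or a whole record?
    is_first: bool             # 2. first and/or last fragment of its record
    is_last: bool
    prev_block: Optional[int]  # 3. where the neighboring fragments live
    next_block: Optional[int]
    data: bytes

def span(record: bytes, room: int, first_block: int) -> list:
    """Split a record into pieces of at most `room` bytes, placed (by
    assumption) in consecutive blocks first_block, first_block + 1, ..."""
    pieces = [record[i:i + room] for i in range(0, len(record), room)] or [b""]
    k = len(pieces)
    return [Fragment(is_fragment=k > 1, is_first=(j == 0), is_last=(j == k - 1),
                     prev_block=first_block + j - 1 if j > 0 else None,
                     next_block=first_block + j + 1 if j < k - 1 else None,
                     data=piece)
            for j, piece in enumerate(pieces)]

def reassemble(fragments: list) -> bytes:
    return b"".join(f.data for f in fragments)

frags = span(b"x" * 250, room=100, first_block=10)
print(len(frags), frags[1].prev_block, frags[1].next_block)   # 3 10 12
```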

BLOBS

Binary Large OBjects = BLOBs. BLOBs can be images, movies, audio files, and other very large values that can be stored in files.

Storing BLOBs
– BLOBs are stored in several blocks.
– It is preferable to store them consecutively on a cylinder, or across multiple disks, for efficient retrieval.

Retrieving BLOBs
– A client retrieving a 2-hour movie may not want it all at the same time.
– Retrieving a specific part of the large value efficiently requires an index structure (for example, an index by seconds on a movie BLOB).

Column Stores

An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.

Consider this relation
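The relation referred to here is not reproduced in these notes, so as a stand-in the sketch below uses a small hypothetical relation (id, name, score). It shows the two directions: splitting tuples into column records and rebuilding tuples by position. Lists in memory stand in for column records that could span many blocks.

```python
def to_columns(rows):
    """Store each column as one record (which may span many blocks)."""
    return [list(column) for column in zip(*rows)]

def to_rows(columns):
    """Values at the same position in every column form one tuple."""
    return [tuple(values) for values in zip(*columns)]

# A small hypothetical relation (id, name, score).
rows = [(1, "Ann", 50), (2, "Bob", 70), (3, "Cy", 60)]
columns = to_columns(rows)
print(columns[2])                 # [50, 70, 60]: one column read on its own
print(to_rows(columns) == rows)   # True
```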

INTRODUCTION

What is a record? A record is a single, implicitly structured data item in a database table. A record is also called a tuple.

What is record modification?

We say records are modified when a data-manipulation operation is performed.

STRUCTURE OF A RECORD

RECORD STRUCTURE FOR A PERSON TABLE

CREATE TABLE PERSON (NAME CHAR(30), ADDRESS CHAR(256), GENDER CHAR(1), BIRTHDATE CHAR(10));

TYPES OF RECORDS

FIXED LENGTH RECORDS

CREATE TABLE SJSUSTUDENT (STUDENT_ID INT(9) NOT NULL, PHONE_NO INT(10) NOT NULL);

VARIABLE LENGTH RECORDS

CREATE TABLE SJSUSTUDENT (STUDENT_ID INT(9) NOT NULL, NAME VARCHAR(100), ADDRESS VARCHAR(100), PHONE_NO INT(10) NOT NULL);

RECORD MODIFICATION

Modification of records: insert, update, delete
• Issues arise even with fixed-length records.
• There are more issues with variable-length records.

STRUCTURE OF A BLOCK & RECORDS

Records are grouped together and stored in blocks.

STRUCTURE OF BLOCK

BLOCKS & RECORDS

If records need not be in any particular order, then just find a block with enough empty space.

We keep track of all the records (tuples) in a relation (table) using index structures and file-organization concepts.

Inserting New Records

If records are not required to be in a particular order, just find a block with enough empty space and place the record in it (e.g., heap files).

What if the records are to be kept in a particular order (e.g., sorted by primary key)? Locate the appropriate block and check whether space is available in it; if so, place the record in the block.

INSERTING NEW RECORDS

We may have to slide the records in the block to place the new record at the appropriate place, and suitably edit the block header.

What If the Block Is Full?

We need to keep the record in a particular block, but the block is full. How do we deal with it?

We find room outside the block. There are two approaches to finding room for the record:
1. Find space on a nearby block.
2. Create an overflow block.

Approaches to finding room for record

Find space on a nearby block:
• Block b1 has no space.
• If space is available on a nearby block b2, move records of b1 to b2.
• If there are external pointers to the records moved from b1 to b2, leave a forwarding address in the offset table of b1.


Create an overflow block: each block b has in its header a pointer to an overflow block where additional records that belong in b can be placed.
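A sketch of the overflow-block approach for a sorted sequential file. `SeqBlock` and its capacity in records are hypothetical. While there is room, the new record is slid into sorted position; once the block is full, further records go to an overflow block created on demand, and the overflow block can chain to its own overflow block if it fills as well.

```python
import bisect

class SeqBlock:
    """A block of a sorted sequential file whose header points to an
    overflow block. Capacity (in records) is a hypothetical parameter."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None      # header pointer to an overflow block

    def insert(self, key):
        if len(self.records) < self.capacity:
            bisect.insort(self.records, key)     # slide records to keep order
        else:
            if self.overflow is None:
                self.overflow = SeqBlock(self.capacity)   # create it on demand
            self.overflow.insert(key)            # chain grows if it fills too

    def keys(self):
        """All records that belong in this block, in sorted order."""
        extra = self.overflow.keys() if self.overflow is not None else []
        return sorted(self.records + extra)

b1 = SeqBlock(capacity=2)
for k in (30, 10, 20, 40):
    b1.insert(k)
print(b1.records, b1.overflow.records, b1.keys())
# [10, 30] [20, 40] [10, 20, 30, 40]
```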

Deletion

Try to reclaim the space freed by the deletion of a record.

If an offset table is used to store information about the records in the block, rearrange (slide) the remaining records to consolidate the free space.

If sliding the records is not possible, maintain a SPACE-AVAILABLE LIST to keep track of the space available in the block.

Tombstone

What about pointers to deleted records? A tombstone is placed in place of each deleted record. A tombstone is a bit placed in the first byte of the deleted record to indicate that the record was deleted (0 = not deleted, 1 = deleted).

A tombstone is permanent.

Updating Records

For Fixed-Length Records, there is no effect on the storage system

For variable-length records:
• If the length increases, then, as with insertion, we "slide the records."
• If the length decreases, then, as with deletion, we update the space-available list, recovering the space or eliminating overflow blocks.

Secondary Storage Management

Database systems always involve secondary storage like the disks and other devices that store large amount of data that persists over time.

The Memory Hierarchy

A typical computer system has several different components in which data may be stored.

These components have data capacities ranging over at least seven orders of magnitude and also have access speeds ranging over seven or more orders of magnitude.

The memory hierarchy from the textbook is as follows:

Cache

The lowest level of the hierarchy is the cache. Cache is found on the same chip as the microprocessor itself, and an additional level-2 cache is found on another chip.

Data and instructions are moved to cache from main memory when they are needed by the processor.

Cache data can be accessed by the processor in a few nanoseconds.

Main Memory

At the center of the action is the computer's main memory. We may think of everything that happens in the computer (instruction executions and data manipulations) as working on information that is resident in main memory.

Typical times to access data from main memory to the processor or cache are in the 10-100 nanosecond range.

Secondary Storage

Essentially every computer has some sort of secondary storage, which is a form of storage that is both significantly slower and significantly more capacious than main memory.

The time to transfer a single byte between disk and main memory is around 10 milliseconds.

Tertiary Storage

As capacious as a collection of disk units can be, there are databases much larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines.

Tertiary storage devices have been developed to hold data volumes measured in terabytes.

Tertiary storage is characterized by significantly higher read/write times than secondary storage, but also by much larger capacities and smaller cost per byte than is available from magnetic disks.

Transfer of Data Between Levels

Normally, data moves between adjacent levels of the hierarchy.

At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a great deal of time, so each level is organized to transfer large amounts of data to or from the level below whenever any data at all is needed.

The disk is organized into disk blocks, and entire blocks are moved to or from a contiguous section of main memory called a buffer.
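Block-at-a-time transfer can be mimicked with an ordinary file standing in for the disk. This sketch assumes a block size of 4096 bytes and uses Python's standard file `seek` and `readinto` methods; `read_block` is a hypothetical helper that copies one whole block into a preallocated buffer, even when the caller needs only a few bytes of it.

```python
import tempfile

BLOCK_SIZE = 4096  # assumed block size in bytes; real disks and DBMSs vary


def read_block(f, block_no, buffer):
    """Copy block `block_no` of binary file `f` into the preallocated `buffer`.

    The whole block is transferred even if the caller needs one byte of it.
    Returns the number of bytes actually read.
    """
    f.seek(block_no * BLOCK_SIZE)
    return f.readinto(buffer)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as disk:          # stands in for the disk
        # three blocks: block i is filled with the byte value i
        disk.write(b"".join(bytes([i]) * BLOCK_SIZE for i in range(3)))
        buffer = bytearray(BLOCK_SIZE)              # buffer in main memory
        read_block(disk, 1, buffer)
        print(buffer[0], len(buffer))               # 1 4096
```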

Volatile and Nonvolatile Storage

A volatile device "forgets" what is stored in it when the power goes off.

A nonvolatile device, on the other hand, is expected to keep its contents intact even for long periods when the device is turned off or there is a power failure.

Magnetic and optical materials hold their data in the absence of power.

Thus, essentially all secondary and tertiary storage devices are nonvolatile. On the other hand, main memory is generally volatile.

Virtual Memory

When we write programs, the data we use (variables of the program, files read, and so on) occupies a virtual memory address space.

Many machines use a 32-bit address space; that is, there are $2^{32}$ bytes, or 4 gigabytes.

The Operating System manages virtual memory, keeping some of it in main memory and the rest on disk. Transfer between memory and disk is in units of disk blocks.
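The address-space arithmetic above can be checked directly. The sketch also assumes, purely for illustration, that virtual memory is divided into 4096-byte blocks; `split_address` is a hypothetical helper that turns a virtual address into a block number and an offset within that block, the unit in which the operating system moves data to and from disk.

```python
ADDRESS_BITS = 32
BLOCK_SIZE = 4096  # assumed block (page) size of 2**12 bytes; real systems vary

address_space = 2 ** ADDRESS_BITS          # number of addressable bytes
num_blocks = address_space // BLOCK_SIZE   # blocks in the address space


def split_address(addr):
    """Return (block number, offset within block) for a virtual address."""
    return divmod(addr, BLOCK_SIZE)


print(address_space // 2 ** 30)   # 4 (gigabytes of 2**30 bytes each)
print(num_blocks)                 # 1048576
print(split_address(10_000))      # (2, 1808)
```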

Disks

The use of secondary storage is one of the important characteristics of a DBMS, and secondary storage is almost exclusively based on magnetic disks.

Mechanics of Disks

The two principal moving pieces of a disk drive are a disk assembly and a head assembly.

The disk assembly consists of one or more circular platters that rotate around a central spindle.

The upper and lower surfaces of the platters are covered with a thin layer of magnetic material, on which bits are stored.

A typical disk from the textbook is shown below:

0’s and 1’s are represented by different patterns in the magnetic material. A common diameter for the disk platters is 3.5 inches. The disk is organized into tracks, which are concentric circles on a single platter. The tracks that are at a fixed radius from the center, among all the surfaces, form one cylinder.

A top view of a disk surface from the text is shown below:

Tracks are organized into sectors, which are segments of the circle separated by gaps that are not magnetized to represent either 0’s or 1’s. The second movable piece, the head assembly, holds the disk heads.
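Because a cylinder consists of the tracks at one radius on every surface, disk capacity is a simple product of the geometry. The numbers below are hypothetical and do not describe any particular drive; real disks also vary the number of sectors per track, which this sketch ignores.

```python
def disk_capacity(surfaces, tracks_per_surface, sectors_per_track, bytes_per_sector):
    """Capacity in bytes, assuming every track has the same number of sectors."""
    return surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector


# Hypothetical geometry: 4 platters (8 surfaces), 2**16 tracks per surface,
# 256 sectors per track, 4096 bytes per sector.
surfaces = 8
tracks_per_surface = 2 ** 16
cylinders = tracks_per_surface   # one track per surface at each radius
capacity = disk_capacity(surfaces, tracks_per_surface, 256, 4096)
print(cylinders, capacity == 2 ** 39)   # 65536 True
```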

The Disk Controller

One or more disk drives are controlled by a disk controller, which is a small processor capable of:

Controlling the mechanical actuator that moves the head assembly to position the heads at a particular radius.

Transferring bits between the desired sector and the main memory.

Selecting a surface from which to read or write, and selecting a sector from the track on that surface that is under the head. An example of a single-processor system is shown on the next slide.

A simple computer system from the text is shown below.

Disk Access Characteristics:

Seek Time: The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time to do so is the seek time.

Rotational Latency: The disk controller waits while the first sector of the block moves under the head. This time is called the rotational latency.

Transfer Time: All the sectors of the block, and the gaps between them, pass under the head while the disk controller reads or writes data in these sectors. This delay is called the transfer time. The sum of the seek time, rotational latency, and transfer time is the latency of the disk.
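The latency sum can be turned into a small calculator. This sketch makes assumptions the text does not state: the average rotational latency is taken as half a rotation, transfer time is the fraction of one rotation covered by the block's sectors, and gaps are ignored. The parameter values in the example are hypothetical.

```python
def block_access_ms(seek_ms, rpm, sectors_in_block, sectors_per_track):
    """Estimated latency (ms) to read one block: seek + rotation + transfer.

    Assumptions: average rotational latency is half a rotation, and transfer
    time is the fraction of one rotation spanned by the block's sectors
    (inter-sector gaps are ignored).
    """
    rotation_ms = 60_000 / rpm                 # time for one full rotation
    rotational_latency_ms = rotation_ms / 2
    transfer_ms = rotation_ms * sectors_in_block / sectors_per_track
    return seek_ms + rotational_latency_ms + transfer_ms


# Hypothetical disk: 6000 rpm (10 ms per rotation), 4 ms seek,
# a 16-sector block on a 256-sector track.
print(block_access_ms(4, 6000, 16, 256))       # 9.625
```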

B-Trees

B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.

Functionalities of B-Trees

• B-Trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.

• B-Trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.

Structure of B-Trees

Typically there are three layers in a B-tree: the root, an intermediate layer, and leaves.

In a B-Tree each block has space for n search-key values and n+1 pointers.


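B-Tree example, n = 3:

Root: [100]

Interior nodes: [30] (left subtree) and [120 150 180] (right subtree)

Leaves under [30]: [3 5 11] [30 35]

Leaves under [120 150 180]: [100 101 110] [120 130] [150 156 179] [180 200]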

Sample non-leaf node (keys 57, 81, 95): its four pointers lead to keys k < 57, 57 ≤ k < 81, 81 ≤ k < 95, and k ≥ 95.

Sample leaf node (keys 57, 81, 95), reached from a non-leaf node: its pointers lead to the records with keys 57, 81, and 95, and a final sequence pointer leads to the next leaf in sequence.

In the textbook’s notation, n = 3:

Leaf: [30 35]

Non-leaf: [30]

Size of nodes: n+1 pointers, n keys (fixed)

Don’t want nodes to be too empty. Use at least:

Non-leaf: ⌈(n+1)/2⌉ pointers

Leaf: ⌊(n+1)/2⌋ pointers to data

Full node vs. minimum node, n = 3:

Non-leaf: full [120 150 180]; minimum [30]

Leaf: full [3 5 11]; minimum [30 35]

Figure annotation: “counts even if null”

B-tree rules (tree of order n):

(1) All leaves are at the same lowest level (balanced tree).

(2) Pointers in leaves point to records, except for the “sequence pointer”.

Node type           | Max ptrs | Max keys | Min ptrs (to data, for leaves) | Min keys
Non-leaf (non-root) | n+1      | n        | ⌈(n+1)/2⌉                      | ⌈(n+1)/2⌉ - 1
Leaf (non-root)     | n+1      | n        | ⌊(n+1)/2⌋                      | ⌊(n+1)/2⌋
Root                | n+1      | n        | 1                              | 1
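The minimum-occupancy rules in the table are easy to encode. In this sketch `min_keys` and `is_valid_node` are hypothetical helper names, integer arithmetic replaces the floor and ceiling, and only key counts are checked (the sequence pointer is not counted).

```python
def min_keys(n, kind):
    """Minimum number of keys for a node of a B-tree with room for n keys."""
    if kind == "root":
        return 1
    if kind == "leaf":
        return (n + 1) // 2            # floor((n+1)/2) data pointers, one per key
    if kind == "non-leaf":
        return (n + 2) // 2 - 1        # ceil((n+1)/2) child pointers, one fewer key
    raise ValueError(f"unknown node kind: {kind}")


def is_valid_node(n, kind, keys):
    """True if the node's key count is within the table's limits."""
    return min_keys(n, kind) <= len(keys) <= n


# The full and minimum nodes for n = 3
print(is_valid_node(3, "non-leaf", [30]), is_valid_node(3, "leaf", [30, 35]))  # True True
print(is_valid_node(3, "leaf", [30]))                                          # False
```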

Applications of B-trees

1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key.

2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file.

3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.

Lookup in B-Trees

Suppose we want to find a record with search key 40. We start at the root, which holds the key 13; since 40 ≥ 13, the search goes to the right subtree. We keep searching the same way at each interior node until we reach a leaf.

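Looking for key 40 (not present):

Root: [13]

Interior nodes: [7] and [23 31 43]

Leaves: [2 3 5] [7 11] [13 17 19] [23 29] [31 37 41] [43 47]

Path: root [13] → [23 31 43] (since 31 ≤ 40 < 43) → leaf [31 37 41], which does not contain 40.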
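The descent can be sketched in Python on the tree from the figure. This is keys only: a real leaf would hold record pointers instead of the sketch returning True or False. `Node` and `lookup` are hypothetical names. At each interior node, `bisect_right` counts the keys that are at most the search key, which is exactly the index of the child to follow.

```python
from bisect import bisect_right


class Node:
    """Keys-only B-tree node; `children` is None for a leaf (hypothetical)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def lookup(root, key):
    """Follow one root-to-leaf path; report whether `key` is in that leaf."""
    node = root
    while node.children is not None:
        # keys[i-1] <= key < keys[i] selects child i
        node = node.children[bisect_right(node.keys, key)]
    return key in node.keys


root = Node([13], [
    Node([7], [Node([2, 3, 5]), Node([7, 11])]),
    Node([23, 31, 43], [Node([13, 17, 19]), Node([23, 29]),
                        Node([31, 37, 41]), Node([43, 47])]),
])

print(lookup(root, 40), lookup(root, 37))   # False True
```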

Range Queries

B-trees are also useful for queries in which a range of values is asked for, such as:

SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
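A range query descends once, to the leaf where the low end of the range would be, and then follows the sequence pointers from leaf to leaf until a key exceeds the high end. The sketch below uses hypothetical `Leaf`, `Interior`, `build_example`, and `range_query` names (keys only) and runs the query above, with 10 and 25 as bounds, on the lookup example tree.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys, self.next = keys, None   # `next` is the sequence pointer


class Interior:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children


def build_example():
    leaves = [Leaf(ks) for ks in ([2, 3, 5], [7, 11], [13, 17, 19],
                                  [23, 29], [31, 37, 41], [43, 47])]
    for a, b in zip(leaves, leaves[1:]):
        a.next = b                          # chain the leaves in key order
    return Interior([13], [Interior([7], leaves[:2]),
                           Interior([23, 31, 43], leaves[2:])])


def range_query(root, lo, hi):
    """Return all keys k with lo <= k <= hi, in sorted order."""
    node = root
    while isinstance(node, Interior):       # descend toward lo
        node = node.children[bisect_right(node.keys, lo)]
    result = []
    while node is not None:                 # then walk the sequence pointers
        for k in node.keys[bisect_left(node.keys, lo):]:
            if k > hi:
                return result
            result.append(k)
        node = node.next
    return result


print(range_query(build_example(), 10, 25))   # [11, 13, 17, 19, 23]
```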

Insert into B-tree

(a) Simple case: space available in leaf

(b) Leaf overflow

(c) Non-leaf overflow

(d) New root

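(a) Insert key = 32, n = 3

Before: non-leaf [30] (below the root [100]) with leaves [3 5 11] and [30 31]

The leaf [30 31] has room, so 32 is simply added, giving [30 31 32].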

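(b) Insert key = 7, n = 3

Before: non-leaf [30] (below the root [100]) with leaves [3 5 11] and [30 31]

The leaf [3 5 11] is full, so it splits into [3 5] and [7 11], and the new separator 7 is inserted into the parent, which becomes [7 30].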

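(c) Insert key = 160, n = 3

Before: root [100]; its right child [120 150 180] has, among others, the leaves [150 156 179] and [180 200]

Inserting 160 splits the leaf [150 156 179] into [150 156] and [160 179]. Adding 160 to the full parent [120 150 180] makes it overflow, so the parent splits into [120 150] and [180], and the middle key 160 moves up into the root, which becomes [100 160].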

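(d) New root, insert 45, n = 3

Before: root [10 20 30] with leaves [1 2 3] [10 12] [20 25] [30 32 40]

Inserting 45 splits the last leaf into [30 32] and [40 45]. Adding 40 to the root makes it overflow, so it splits into [10 20] and [40], and 30 moves up into a new root [30].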
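The four insertion cases fit in one recursive function. This is a keys-only sketch with hypothetical names (`Node`, `insert`): no record or sequence pointers, and no duplicate keys. In this sketch a full leaf keeps ⌈(n+1)/2⌉ keys and copies the first key of its new right half up; a full non-leaf keeps ⌈(n+2)/2⌉ pointers and moves its middle key up; a split that reaches the root creates a new root.

```python
from bisect import bisect_right, insort


class Node:
    """Keys-only node; `children` is None for a leaf (hypothetical layout)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children


def _insert(node, key, n):
    """Insert below `node`; return (key_for_parent, new_right_node) on a split."""
    if node.children is None:                      # leaf
        insort(node.keys, key)
        if len(node.keys) <= n:
            return None                            # (a) room in the leaf
        keep = (n + 2) // 2                        # (b) ceil((n+1)/2) keys stay
        right = Node(node.keys[keep:])
        node.keys = node.keys[:keep]
        return right.keys[0], right                # copy first key of new leaf up
    i = bisect_right(node.keys, key)
    split = _insert(node.children[i], key, n)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= n:
        return None
    keep = (n + 3) // 2                            # (c) ceil((n+2)/2) pointers stay
    up = node.keys[keep - 1]                       # middle key moves up
    right = Node(node.keys[keep:], node.children[keep:])
    node.keys, node.children = node.keys[:keep - 1], node.children[:keep]
    return up, right


def insert(root, key, n):
    """Insert `key` into the tree and return the (possibly new) root."""
    split = _insert(root, key, n)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])              # (d) new root


# Example (d): n = 3, insert 45
root = Node([10, 20, 30], [Node([1, 2, 3]), Node([10, 12]),
                           Node([20, 25]), Node([30, 32, 40])])
root = insert(root, 45, 3)
print(root.keys, [c.keys for c in root.children])   # [30] [[10, 20], [40]]
```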

CS 245 Notes 4

Deletion from B-tree

(a) Simple case (no example)

(b) Coalesce with neighbor (sibling)

(c) Re-distribute keys

(d) Cases (b) or (c) at non-leaf

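(b) Coalesce with sibling: Delete 50, n = 4

Before: parent [10 40 100]; leaves [10 20 30] and [40 50]

After deleting 50, the leaf [40] is below the minimum of ⌊(n+1)/2⌋ = 2 keys. It is coalesced with its sibling into [10 20 30 40], and the separator 40 is removed from the parent, which becomes [10 100].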

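(c) Redistribute keys: Delete 50, n = 4

Before: parent [10 40 100]; leaves [10 20 30 35] and [40 50]

After deleting 50, the leaf [40] again has too few keys, but its sibling [10 20 30 35] has a key to spare, so 35 moves over: the leaves become [10 20 30] and [35 40], and the separator in the parent changes to 35, giving [10 35 100].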

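(d) Non-leaf coalesce: Delete 37, n = 4

Before: root [25]; non-leaves [10 20] and [30 40]; leaves [1 3] [10 14] [20 22] [25 26] [30 37] [40 45]

Deleting 37 leaves [30] with too few keys, so it is coalesced with its sibling [25 26] into [25 26 30], and the separator 30 is removed from the parent, leaving [40] with too few pointers. Its sibling [10 20] has none to spare, so the two non-leaves coalesce, pulling down the root key 25: [10 20 25 40] becomes the new root.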
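The leaf-level choice between coalescing and redistributing can be sketched as a function on plain key lists. `fix_leaf_underflow` is a hypothetical helper: `sep_index` is the position in the parent of the key separating the two leaves. The policy of coalescing whenever the merged leaf fits, and otherwise borrowing one key from the left sibling, is an assumption that reproduces examples (b) and (c) above; an implementation could equally try redistribution first. Propagating an underflow to the non-leaf level, as in (d), is not shown.

```python
def fix_leaf_underflow(parent_keys, sep_index, left, right, n):
    """Repair an underfull leaf `right` using its left sibling `left`.

    parent_keys[sep_index] is the key separating the two leaves. Returns
    (new_parent_keys, new_left, new_right); new_right is None after a merge.
    """
    parent_keys = list(parent_keys)
    if len(left) + len(right) <= n:
        # (b) coalesce: everything fits in one leaf; drop the separator
        del parent_keys[sep_index]
        return parent_keys, left + right, None
    # (c) redistribute: move the largest key of the left sibling over
    right = [left[-1]] + right
    left = left[:-1]
    parent_keys[sep_index] = right[0]              # new separator in the parent
    return parent_keys, left, right


# Examples (b) and (c), n = 4, after deleting 50 from the leaf [40 50]
print(fix_leaf_underflow([10, 40, 100], 1, [10, 20, 30], [40], 4))
# ([10, 100], [10, 20, 30, 40], None)
print(fix_leaf_underflow([10, 40, 100], 1, [10, 20, 30, 35], [40], 4))
# ([10, 35, 100], [10, 20, 30], [35, 40])
```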

B-tree deletions in practice

Often, coalescing is not implemented: it is too hard and not worth it!

Why are three levels usually enough for a B-tree?

Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that $4n + 8(n + 1) \le 4096$. That value is n = 340, so 340 key-pointer pairs could fit in one block for our example data. Suppose that the average block has an occupancy midway between the minimum and maximum, i.e., a typical block has 255 pointers. With a root, 255 children, and 255 × 255 = 65,025 leaves, we shall have among those leaves 255³, or about 16.6 million, pointers to records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
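The arithmetic above can be reproduced in a few lines, using the same block, key, and pointer sizes as the text; the variable names are our own.

```python
BLOCK, KEY, PTR = 4096, 4, 8     # bytes per block, key, and pointer (from the text)

# Largest n with KEY*n + PTR*(n + 1) <= BLOCK, i.e. 12n + 8 <= 4096
n = (BLOCK - PTR) // (KEY + PTR)
print(n)                          # 340

fanout = 255                      # typical block: midway between min and max occupancy
print(fanout ** 2)                # 65025 leaves under the root's 255 children
print(fanout ** 3)                # 16581375 record pointers, about 16.6 million
```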
