
Relational Database Design

Unit-2

(Part-A)

A relational DBMS must use its relational facilities exclusively to manage and interact with the database.

The rules:

These rules were defined by Codd in a paper published in 1985. They specify what a relational database must support in order to be relational. These rules have since been considerably extended.

1. Information rule

Data are represented only one way: as values within columns within rows.

Simple, consistent and versatile.

The basic requirement of the relational model.

2. Guaranteed access rule

Every value can be accessed by providing table name, column name and key.

All data are uniquely identified and accessible via this identity.

3. Systematic treatment of null values

Separate handling of missing and/or non applicable data.

This is distinct from zero or empty strings.

Codd would further like several types of null to be handled.

4. Relational online catalog

Catalog (data dictionary) can be queried by authorized users as part of the database.

The catalog is part of the database.

5. Comprehensive data sublanguage

Used interactively and embedded within programs

Supports data definition, data manipulation, security, integrity constraints and transaction processing

Today means: must support SQL.

6. View updating rule

All theoretically possible view updates should be possible.

Views are virtual tables. They appear to behave as conventional tables except that they are built dynamically when the query is run.

This means that a view is always up to date. It is not always theoretically possible to update views. Codd himself did not completely understand this.

One problem exists when a view relates to part of a table not including a candidate key. This means that potential updates would violate the entity integrity rule.

7. High-level insert, update and delete

Must support set-at-a-time updates.

i.e. transactions

e.g.: UPDATE mytable SET mycol = value WHERE condition;

Many rows may be updated with this single statement.
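A set-at-a-time update can be tried out with Python's built-in sqlite3 module. This is only a sketch: the table mytable, the column mycol and the sample values are placeholders taken from or invented around the statement above, not part of Codd's rule.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate rule 7.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycol INTEGER)")
conn.executemany("INSERT INTO mytable (mycol) VALUES (?)", [(1,), (5,), (9,), (12,)])

# One statement changes every row that satisfies the condition.
cur = conn.execute("UPDATE mytable SET mycol = 0 WHERE mycol > 4")
rows_changed = cur.rowcount
remaining = [r[0] for r in conn.execute("SELECT mycol FROM mytable ORDER BY id")]
print(rows_changed, remaining)
```

No loop over individual records is written by the programmer; the DBMS applies the change to the whole set of qualifying rows.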

8. Physical data independence

Physical layer of the architecture is mapped onto the logical layer.

Users and programs are not dependent on the physical structure of the database.

(Physical layer implementation is dependent on the DBMS.)

9. Logical data independence

Users and programs are independent of the logical structure of the database.

i.e.: the logical structure of the data can evolve with minimal impact on the programs.

10. Integrity independence

Integrity constraints are to be stored in the catalog not the programs.

Alterations to integrity constraints should not affect application programs.

This simplifies the programs.

It is not always possible to do this.

11. Distribution independence

Applications should still work in a distributed database (DDB).

12. Nonsubversion rule

If there is a record-at-a-time interface (eg via 3GL), security and integrity of the database must not be violated.

There should be no backdoor to bypass the security imposed by the DBMS.

Rule Zero for RDBMS: Many new DBMSs claim to be relational while also supporting extended features, e.g. PostgreSQL is an RDBMS with extended object-oriented features. Codd's rule zero specifies a criterion for an RDBMS:

"For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities the system may support." (Codd, 1990)

In Codd 1990, Codd extended the 12 rules to 18 to include rules on catalog, data types (domains), authorisation etc.

Functional Dependencies

A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].


Inference rules for functional dependencies:

1. If X ⊇ Y (X is a superset of Y), then X → Y

2. X → Y ⊨ XZ → YZ

3. { X → Y , Y → Z } ⊨ X → Z

4. X → YZ ⊨ X → Y

5. { X → Y , X → Z } ⊨ X → YZ

6. { X → Y , WY → Z } ⊨ WX → Z


Inference rules 1 – 3 are known as Armstrong’s inference rules.

IR 1 – IR 3 are:

Sound: they derive only dependencies that are implied by F.

Complete: they derive every dependency in F+.


The closure of a set of functional dependencies F, denoted by F+, is the set of all functional dependencies that can be inferred from F.

For a set of attributes X, the set of all attributes that depend on X under a set of functional dependencies F is called the closure of X under F (denoted by X+).


A set of functional dependencies E is said to be covered by F if every FD in E is also in F+.

Two sets of FDs E and F are equivalent if E+ = F+.

If E and F are equivalent then

E covers F

F covers E


A set of FDs F is minimal if it satisfies following conditions.

1. Every dependency in F has a single attribute in its RHS.

2. We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.

3. We can not remove any dependency from F and still have a set of dependencies that is equivalent to F.


A minimal set of functional dependencies is a set of dependencies in standard form with no redundancies.

Minimal cover of set of functional dependencies is the minimal set of functional dependencies.


X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on the attribute A.

Say “X -> A holds in R.”

Convention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes.

Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.

Example

Drinkers(name, addr, beersLiked, manf, favBeer)

Reasonable FD’s to assert:

1. name -> addr

2. name -> favBeer

3. beersLiked -> manf

Example Data

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

The two Janeway rows agree on addr because name -> addr, and on favBeer because name -> favBeer.

The two Bud rows agree on manf because beersLiked -> manf.
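Whether an FD holds in one particular instance can be checked mechanically. The sketch below uses the Drinkers rows from the example; the function name fd_holds and the dict-per-row representation are assumptions made for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this relation instance (list of dict rows)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "Bud"},
]

print(fd_holds(drinkers, ["name"], ["addr"]))
print(fd_holds(drinkers, ["name"], ["beersLiked"]))
```

An instance can only refute an FD. A check that succeeds on sample data does not prove the FD is a valid constraint on the schema.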

FD’s With Multiple Attributes

No need for FD’s with > 1 attribute on right.

But sometimes convenient to combine FD’s as a shorthand.

Example: name -> addr and name -> favBeer become name -> addr favBeer

> 1 attribute on left may be essential.

Example: bar beer -> price

Keys of Relations

K is a superkey for relation R if K functionally determines all of R.

K is a key for R if K is a superkey, but no proper subset of K is a superkey.

Example

Drinkers(name, addr, beersLiked, manf,favBeer)

{name, beersLiked} is a superkey because together these attributes determine all the other attributes.

name -> addr favBeer

beersLiked -> manf

Example, Contd.

{name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey.

name does not determine manf; beersLiked does not determine addr.

There are no other keys, but lots of superkeys.

Any superset of {name, beersLiked}.
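The key of Drinkers can also be found by brute force: try attribute sets from smallest to largest and keep those whose closure (the set of attributes they determine) is the whole schema. This is a sketch for small schemas only; the function names are assumptions, and the search is exponential in the number of attributes.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys of schema, found by trying subsets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(k, fds) == set(schema):
                keys.append(k)
    return keys


schema = {"name", "addr", "beersLiked", "manf", "favBeer"}
fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]
print(candidate_keys(schema, fds))
```

For the two FD's of the example this reports {name, beersLiked} as the only key.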

E/R and Relational Keys

Keys in E/R concern entities.

Keys in relations concern tuples.

Usually, one tuple corresponds to one entity, so the ideas are the same.

But --- in poor relational designs, one entity can become several tuples, so E/R keys and Relational keys are different.

Example Data

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Relational key = {name, beersLiked}

But in E/R, name is a key for Drinkers, and beersLiked is a key for Beers.

Note: 2 tuples for Janeway entity and 2 tuples for Bud entity.

More FD’s From “Physics”

Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course.

Inferring FD’s

We are given FD’s X1 -> A1, X2 -> A2, …, Xn -> An, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.

Example: If A -> B and B -> C hold, surely A -> C holds, even if we don’t say so.

Important for design of good relation schemas.

Inference Test

To test whether Y -> B follows, start by assuming two tuples agree in all attributes of Y.

Use the given FD’s to infer that these tuples must also agree in certain other attributes.

If B is one of these attributes, then Y -> B is true.

Otherwise, the two tuples, with any forced equalities, form a two-tuple relation that proves Y -> B does not follow from the given FD’s.

Closure Test

An easier way to test is to compute the closure of Y, denoted Y+.

Basis: Y+ = Y.

Induction: Look for an FD’s left side X that is a subset of the current Y+. If the FD is X -> A, add A to Y+.
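The basis and induction steps translate almost directly into code. In this sketch attributes are single letters and each FD is a pair of strings; the names closure and follows are assumptions chosen for illustration, and the sample FD's are AB -> C, C -> D, D -> A.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)  # basis: Y+ = Y
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Induction: the left side is inside the closure, so add the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(lhs, rhs, fds):
    """Y -> B follows from the given FDs exactly when B is in the closure of Y."""
    return set(rhs) <= closure(lhs, fds)


fds = [("AB", "C"), ("C", "D"), ("D", "A")]
print(sorted(closure("C", fds)))
print(follows("C", "A", fds), follows("C", "B", fds))
```

The loop stops when a full pass adds nothing, so it always terminates: the closure can grow at most once per attribute.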

Finding All Implied FD’s

Motivation: “normalization,” the process where we break a relation schema into two or more schemas.

Example: ABCD with FD’s AB -> C, C -> D, and D -> A.

Decompose into ABC, AD. What FD’s hold in ABC?

Not only AB -> C, but also C -> A! (C -> A follows from C -> D and D -> A by transitivity.)

Basic Idea

1. Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s.

Nontrivial = left and right sides disjoint.

2. Restrict to those FD’s that involve only attributes of the projected schema.

Example

ABC with FD’s A -> B and B -> C. Project onto AC.

A+ = ABC; yields A -> B, A -> C.

We do not need to compute AB+ or AC+.

B+ = BC; yields B -> C.

C+ = C; yields nothing.

BC+ = BC; yields nothing.

Example --- Continued

Resulting FD’s: A -> B, A -> C, and B -> C.

Projection onto AC: A -> C.

Only FD that involves a subset of {A,C}.
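The projection can be sketched in code by computing the closure of every subset of the projected schema. The function name project_fds is an assumption; attributes are single letters, and the result may contain FD's with a redundant left side because no minimization is attempted.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, target):
    """Nontrivial FDs X -> A with X and A both inside target."""
    target = sorted(target)
    projected = set()
    for size in range(1, len(target) + 1):
        for combo in combinations(target, size):
            x_plus = closure(combo, fds)
            for a in target:
                if a in x_plus and a not in combo:
                    projected.add(("".join(combo), a))
    return projected


print(project_fds([("A", "B"), ("B", "C")], "AC"))
print(sorted(project_fds([("AB", "C"), ("C", "D"), ("D", "A")], "ABC")))
```

For the second call the result includes C -> A and AB -> C, and also BC -> A, which is implied by C -> A and could be dropped.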

A Geometric View of FD’s

Imagine the set of all instances of a particular relation.

That is, all finite sets of tuples that have the proper number of components.

Each instance is a point in this space.

Example: R(A,B)

{(1,2), (3,4)}

{}

{(1,2), (3,4), (1,3)}

{(5,1)}

An FD is a Subset of Instances

For each FD X -> A there is a subset of all instances that satisfy the FD.

We can represent an FD by a region in the space.

Trivial FD = an FD that is represented by the entire space.

Example: A -> A.

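Example: A -> B for R(A,B)

Inside the region for A -> B: {(1,2), (3,4)}, {}, {(5,1)}

Outside the region: {(1,2), (3,4), (1,3)}, because the tuples (1,2) and (1,3) agree on A but differ on B.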

Representing Sets of FD’s

If each FD is a set of relation instances, then a collection of FD’s corresponds to the intersection of those sets.

Intersection = all instances that satisfy all of the FD’s.

Example

[Figure: regions for A -> B, B -> C, and CD -> A; their intersection holds the instances satisfying A -> B, B -> C, and CD -> A.]

Implication of FD’s

If an FD Y -> B follows from FD’s X1 -> A1, …, Xn -> An, then the region in the space of instances for Y -> B must include the intersection of the regions for the FD’s Xi -> Ai. That is, every instance satisfying all the FD’s Xi -> Ai surely satisfies Y -> B.

But an instance could satisfy Y -> B, yet not be in this intersection.

Example

[Figure: the region for A -> C contains the intersection of the regions for A -> B and B -> C.]

Normalization

E.F. Codd defined well-structured “normal forms” of relations, “normalization”

Normalization

Reduce complex user views to a set of small, stable data structures

Eliminate errors and inconsistencies related to the adding, deleting or updating of record occurrences

What it’s all about

Given a relation, R, and a set of functional dependencies, F, on R.

Assume that R is not in a desirable form for enforcing F.

Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF.

While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.

Contents

The Good and the Bad

Bad database design
  redundancy of fact
  fact clutter
  information loss
  dependency loss

Good database design

How to compute with meaning
  functional dependencies - FDs
  Armstrong’s inference rules
  the meaning of a set of FDs
  minimal cover of a set of FDs

Normal Forms - overview
  1NF, 2NF, 3NF, BCNF

The Good

Primitive Domains

FLT-SCHEDULE

flt# weekday airline dtime from atime to

DL242 MO WE FR DELTA 10:40 ATL 12:30 BOS

SK912 SA SU SAS 12:00 CPH 15:30 JFK

AA242 MO FR AA 08:00 CHI 10:10 ATL

Attributes must be defined over domains with atomic values

FLT-SCHEDULE

flt# weekday airline dtime from atime to

DL242 MO DELTA 10:40 ATL 12:30 BOS

SK912 SA SAS 12:00 CPH 15:30 JFK

AA242 MO AA 08:00 CHI 10:10 ATL

DL242 WE DELTA 10:40 ATL 12:30 BOS

DL242 FR DELTA 10:40 ATL 12:30 BOS

SK912 SU SAS 12:00 CPH 15:30 JFK

AA242 FR AA 08:00 CHI 10:10 ATL

Bad Database Design- redundancy of fact

FLIGHTS

flt# date airline plane#

DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748

redundancy: airline name repeated for same flight

inconsistency: when airline name for a flight changes, it must be changed many places

Bad Database Design- fact clutter

insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?

deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.

update anomalies: if DL242 is flown by Sabena, we must change it everywhere.

FLIGHTS

flt# date airline plane#


DL242 10/23/00 Delta k-yo-33297

DL242 10/24/00 Delta t-up-73356

DL242 10/25/00 Delta o-ge-98722

AA121 10/24/00 American p-rw-84663

AA121 10/25/00 American q-yg-98237

AA411 10/22/00 American h-fe-65748

Bad Database Design- information loss

FLIGHTS

flt# date airline plane#


DL242 10/23/14 Delta k-yo-33297

DL242 10/24/14 Delta t-up-73356

DL242 10/25/14 Delta o-ge-98722

AA121 10/24/14 American p-rw-84663

AA121 10/25/14 American q-yg-98237

AA411 10/22/14 American h-fe-65748

FLIGHTS-AIRLINE

flt# airline

DL242 Delta

AA121 American

AA411 American

DATE-AIRLINE-PLANE

date airline plane#

10/23/14 Delta k-yo-33297

10/24/14 Delta t-up-73356

10/25/14 Delta o-ge-98722

10/24/14 American p-rw-84663

10/25/14 American q-yg-98237

10/22/14 American h-fe-65748

Bad Database Design- information loss

FLIGHTS (reconstructed by joining FLIGHTS-AIRLINE and DATE-AIRLINE-PLANE on airline)

flt# date airline plane#

DL242 10/23/14 Delta k-yo-33297

DL242 10/24/14 Delta t-up-73356

DL242 10/25/14 Delta o-ge-98722

AA121 10/24/14 American p-rw-84663

AA121 10/25/14 American q-yg-98237

AA121 10/22/14 American h-fe-65748

AA411 10/24/14 American p-rw-84663

AA411 10/25/14 American q-yg-98237

AA411 10/22/14 American h-fe-65748

DATE-AIRLINE-PLANE

date airline plane#

10/23/14 Delta k-yo-33297

10/24/14 Delta t-up-73356

10/25/14 Delta o-ge-98722

10/24/14 American p-rw-84663

10/25/14 American q-yg-98237

10/22/14 American h-fe-65748

FLIGHTS-AIRLINE

flt# airline

DL242 Delta

AA121 American

AA411 American

• information loss: we polluted the database with false facts; we can’t find the true facts.
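The spurious tuples can be reproduced by joining the two decomposed tables. The sketch below assumes a simple list-of-dicts representation and a helper named natural_join; the rows are the ones shown in the tables.

```python
def natural_join(r, s):
    """Natural join of two lists of dict rows on their shared attribute names."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


flights_airline = [
    {"flt#": "DL242", "airline": "Delta"},
    {"flt#": "AA121", "airline": "American"},
    {"flt#": "AA411", "airline": "American"},
]
date_airline_plane = [
    {"date": "10/23/14", "airline": "Delta", "plane#": "k-yo-33297"},
    {"date": "10/24/14", "airline": "Delta", "plane#": "t-up-73356"},
    {"date": "10/25/14", "airline": "Delta", "plane#": "o-ge-98722"},
    {"date": "10/24/14", "airline": "American", "plane#": "p-rw-84663"},
    {"date": "10/25/14", "airline": "American", "plane#": "q-yg-98237"},
    {"date": "10/22/14", "airline": "American", "plane#": "h-fe-65748"},
]

joined = natural_join(flights_airline, date_airline_plane)
print(len(joined))  # more rows than the 6 in the original FLIGHTS
```

The join cannot tell AA121 from AA411 because the only shared attribute is airline, so every American flight is paired with every American plane.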

Bad Database Design- dependency loss

DATE-AIRLINE-PLANE

date airline plane#

10/23/14 Delta k-yo-33297

10/24/14 Delta t-up-73356

10/25/14 Delta o-ge-98722

10/24/14 American p-rw-84663

10/25/14 American q-yg-98237

10/22/14 American h-fe-65748

FLIGHTS-AIRLINE

flt# airline

DL242 Delta

AA121 American

AA411 American

dependency loss: we lost the fact that (flt#, date) → plane#

Good Database Design

no redundancy of FACT (!)

no inconsistency

no insertion, deletion or update anomalies

no information loss

no dependency loss

FLIGHTS-DATE-PLANE

flt# date plane#

DL242 10/23/14 k-yo-33297

DL242 10/24/14 t-up-73356

DL242 10/25/14 o-ge-98722

AA121 10/24/14 p-rw-84663

AA121 10/25/14 q-yg-98237

AA411 10/22/14 h-fe-65748

FLIGHTS-AIRLINE

flt# airline

DL242 Delta

AA121 American

AA411 American

Let X and Y be sets of attributes in R

Y is functionally dependent on X in R iff for each x ∈ R.X there is precisely one y ∈ R.Y

Y is fully functionally dependent on X in R if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X

We use keys to enforce functional dependencies in relations:

X → Y

Functional Dependencies and Keys

FLIGHTS





plane# is not determined by flt# alone

airline is not determined by flt# and date

the FLIGHTS relation will not allow the FDs to be enforced by keys


[Figure: “Consider the meaning”: real world versus database, with cust#, name and address stored either combined in one relation or in separate relations.]

How to Compute Meaning - Armstrong’s inference rules

Rules of the computation:

Reflexivity: if Y ⊆ X, then X → Y

Augmentation: if X → Y, then WX → WY

Transitivity: if X → Y and Y → Z, then X → Z

Derived rules:

Union: if X → Y and X → Z, then X → YZ

Decomposition: if X → YZ, then X → Y and X → Z

Pseudotransitivity: if X → Y and WY → Z, then XW → Z

Armstrong’s Axioms are:

sound

complete

How to Compute Meaning - the meaning of a set of FDs, F+

umbrella: a collapsible shade consisting of fabric stretched over hinged ribs radiating from a central pole

Given the ribs of an umbrella, the FDs, what does the whole umbrella, F+, look like?

Determine each set of attributes, X, that appears on a left-hand side of a FD. Determine the set, X+, the closure of X under F.

How to Compute Meaning - when do sets of FDs mean the same?

F covers E if every FD in E is also in F+

F and E are equivalent if F covers E and E covers F.

We can determine whether F covers E by calculating X+ with respect to F for each FD, X → Y in E, and then checking whether this X+ includes the attributes in Y. If this is the case for every FD in E, then F covers E.
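The cover test reduces to one closure computation per FD in E. A minimal sketch, assuming single-letter attributes and hypothetical function names covers and equivalent:

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """F covers E if, for every X -> Y in E, Y is inside X+ computed under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(f, e):
    return covers(f, e) and covers(e, f)


F = [("A", "B"), ("B", "C")]
E = [("A", "BC"), ("B", "C")]
G = [("A", "B")]
print(equivalent(F, E), covers(F, G), covers(G, F))
```

The sets F, E and G are made-up examples: F and E imply the same dependencies, while G says strictly less than F.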


How to Compute Meaning - minimal cover of a set of FDs

• Is there a minimal set of ribs that will hold the umbrella open?

• F is minimal if:

• every dependency in F has a single attribute as right-hand side

• we can’t replace any dependency X → A in F with a dependency Y → A where Y ⊂ X and still have a set of dependencies equivalent with F

• we can’t remove any dependency from F and still have a set of dependencies equivalent with F
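The three conditions suggest a three-step procedure: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The sketch below is one common way to do this, not necessarily the procedure intended by the slides; the function names and the sample FD set are assumptions, and a set of FDs can have more than one minimal cover.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: one attribute on every right-hand side.
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))
    # Step 2: drop left-hand attributes that are not needed to derive the right side.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, g):
                g[i] = (smaller, a)
    # Step 3: drop FDs that the remaining ones already imply.
    for fd in list(g):
        rest = [x for x in g if x != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return {("".join(sorted(lhs)), a) for lhs, a in g}


print(minimal_cover([("B", "A"), ("D", "A"), ("AB", "D")]))
```

In the sample, A is unnecessary in AB -> D because B -> A already holds, and B -> A then becomes redundant given B -> D and D -> A.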

How to guarantee lossless joins

• Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated functional dependencies, F1 and F2.

• The decomposition is lossless iff:

• A1 ∩ A2 → A1 \ A2 is in F+, or

• A1 ∩ A2 → A2 \ A1 is in F+

In that case R1 ⋈ R2 = R.
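For a two-way decomposition the lossless test needs only the closure of the common attributes. The sketch below applies it to the two decompositions of FLIGHTS, assuming the FDs flt# → airline and (flt#, date) → plane#; the function name lossless_binary is an assumption.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(a1, a2, fds):
    """True if the common attributes determine all of A1 \\ A2 or all of A2 \\ A1."""
    a1, a2 = set(a1), set(a2)
    common_plus = closure(a1 & a2, fds)
    return (a1 - a2) <= common_plus or (a2 - a1) <= common_plus


fds = [({"flt#"}, {"airline"}), ({"flt#", "date"}, {"plane#"})]
bad = lossless_binary({"flt#", "airline"}, {"date", "airline", "plane#"}, fds)
good = lossless_binary({"flt#", "date", "plane#"}, {"flt#", "airline"}, fds)
print(bad, good)
```

Checking that A1 \ A2 lies inside the closure of A1 ∩ A2 is the same as checking that A1 ∩ A2 → A1 \ A2 is in F+.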

How to guarantee preservation of FDs

Decompose relation, R, with functional dependencies, F, into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk.

The decomposition is dependency preserving iff:

F+ = (F1 ∪ ... ∪ Fk)+
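Dependency preservation can be tested without computing F+ explicitly. The sketch below uses the standard textbook procedure: for each FD X → Y in F, grow X using only what each Ri can enforce locally, then check that Y was reached. The function name preserves and the choice of FDs for FLIGHTS are assumptions.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, schemas):
    """True if every FD in fds is implied by the FDs projected onto the schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for s in schemas:
                s = set(s)
                # What the attributes of z inside s determine, restricted to s.
                gained = (closure(z & s, fds) & s) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True


fds = [({"flt#"}, {"airline"}), ({"flt#", "date"}, {"plane#"})]
lossy = [{"flt#", "airline"}, {"date", "airline", "plane#"}]
good = [{"flt#", "date", "plane#"}, {"flt#", "airline"}]
print(preserves(fds, lossy), preserves(fds, good))
```

The first decomposition fails because no single table contains flt#, date and plane# together, so (flt#, date) → plane# cannot be checked locally.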

Normal Forms

A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints

Unnormalized relation

First Normal Form - removed repeating groups

Second Normal Form - removed partial dependencies

Third Normal Form - removed transitive dependencies

Normal Forms

Boyce-Codd Normal Form - removed remaining anomalies resulting from functional dependencies

Fourth Normal Form - removed multivalued dependencies

Fifth Normal Form - removed remaining anomalies

Domain-Key Normal Form - the upper bound of normal form

Overview of NFs

[Figure: nested regions labelled NF2, 1NF, 2NF, 3NF, BCNF.]

Normal Forms - definitions

NF2: non-first normal form

1NF: R is in 1NF iff all domain values are atomic

2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key

3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key

BCNF: R is in BCNF iff every determinant is a candidate key

Determinant: an attribute (or set of attributes) on which some other attribute is fully functionally dependent.

Normal Forms

Normal forms are based on functional dependencies.

Data normalized to,

Minimize redundancies

Minimize anomalies

Normal Forms

Relations are decomposed to normalize.

Concerns with decomposition;

Loss-less join property

Dependency preserving property

Normal Forms

Definitions

Superkey

Key – Minimal superkey

Candidatekey

Prime attributes

Example of Normalization

FLT-INSTANCE (flt#, date, plane#, airline, from, to, miles)

First Normal Form

A relation is in first normal form if it contains no repeating groups

Grade Report with repeating group of courses for each student:

(Student ID, Student Name, Campus Address, Major, Course ID, Course Title, Instructor Name, Instructor Location, Grade)

Remove repeating group:

(Student ID, Student Name, Campus Address, Major) (3NF)

(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)

Normal Forms

First Normal Form

Domain of an attribute must include only atomic (simple, indivisible) values

Value of any attribute in a tuple must be a single value from domain of that attribute.

Now considered part of definition of a relation in relational model.

Second Normal Form

A relation is in second normal form if it is already in first normal form and any partial functional dependencies on the primary key have been removed

(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)

Primary key is Student ID + Course ID

Student ID + Course ID --> Grade

Course ID --> Course Title (partial dependency)

Removing partial dependencies:

(Student ID, Course ID, Grade) (3NF)

(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)

Normal Forms

Full Functional Dependency

Second Normal Form

Relation schema should be in first normal form.

No non-prime attribute in the relation should be partially dependent on any key of the relation.

1NF: (flt#, date, plane#, airline, from, to, miles)

2NF: (flt#, date, plane#) and (flt#, airline, from, to, miles)

Third Normal Form

A relation is in third normal form if it is already in second normal form and contains no transitive dependencies

transitive dependency - One nonkey attribute is dependent on one or more nonkey attributes

(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)

Course ID --> Instructor Name --> Instructor Location

Instructor Name is nonkey

Instructor Location is dependent on Instructor Name

Remove transitive dependency:

(Course ID, Course Title, Instructor Name) (3NF)

(Instructor Name, Instructor Location) (3NF)

Normal Forms

Third Normal Form

Relation schema in second normal form.

No non-prime attribute is transitively dependent on the primary key.

Normal Forms

Third Normal Form general definition:

A relation schema R is in third normal form if,

R is in second normal form.

And whenever a nontrivial functional dependency X → A holds in R either

a) X is a superkey of R. or

b) A is a prime attribute of R.

Third Normal Form: “if it is in second normal form and has no transitive dependencies”

Figure 5-7 © 2000 Prentice Hall

Boyce-Codd Normal Form: “if every determinant is a candidate key”


Boyce-Codd Normal Form

A relation is in BCNF if and only if it is in 3NF and every determinant is a candidate key

A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent

Situation in which a 3NF relation may fail to be in BCNF:

1. Multiple candidate keys

2. Those candidate keys are composite

3. The candidate keys overlap

Normal Forms


Stricter than 3NF

A relation schema R is in Boyce-Codd normal form if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.

Every relation schema in BCNF is in 3 NF as well.


(Student, Major, Advisor) (3NF)

or (Student, Advisor, Major) (1NF)

Student may have more than one major with one advisor in each

Advisor <<--> Major, Student <-->> Major, Student <-->> Advisor

(Student, Advisor) (BCNF)

(Advisor, Major) (BCNF)


3NF & BCNF: (from, to, miles), (flt#, airline, from, to), (flt#, date, plane#)

3NF that is not BCNF

R(A, B, C)

Candidate keys: {A,B} and {A,C}

Determinants: {A,B} and {C}

A decomposition: R1(C, B) and R2(A, C)

Lossless, but not dependency preserving!

Major Results in Normalization Theory

Theorem:

There is an algorithm for testing a decomposition for lossless join wrt. a set of FDs

Theorem:

There is an algorithm for testing a decomposition for dependency preservation

Theorem:

There is an algorithm for lossless join decomposition into BCNF

Theorem:

There is an algorithm for dependency preserving decomposition into 3NF

Normal Forms

Transitive Dependency

An FD X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R, and both X → Z and Z → Y hold.

BCNF vs 3NF

BCNF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either:

Y is a subset of X or,

X is a superkey of R

3NF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either:

Y is a subset of X, or

X is a superkey of R, or

Y is a subset of K for some key K of R

N.b., no subset of a key is a key
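Both definitions can be checked mechanically for a small schema. The sketch below tests the Account, Client, Office relation with the FDs Client, Office -> Account and Account -> Office. The function names are assumptions, and the 3NF test uses the usual attribute-wise form of the third condition: every attribute of Y outside X must belong to some key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if not any(key <= k for key in keys) and closure(k, fds) == set(schema):
                keys.append(k)
    return keys


def is_bcnf(schema, fds):
    # Every FD is trivial or has a superkey on the left.
    return all(set(r) <= set(l) or closure(l, fds) >= set(schema) for l, r in fds)


def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    # Left side is a superkey, or every new right-side attribute is prime.
    return all(
        closure(l, fds) >= set(schema) or (set(r) - set(l)) <= prime
        for l, r in fds
    )


schema = {"Account", "Client", "Office"}
fds = [({"Client", "Office"}, {"Account"}), ({"Account"}, {"Office"})]
print(candidate_keys(schema, fds))
print(is_3nf(schema, fds), is_bcnf(schema, fds))
```

Account -> Office breaks BCNF because Account alone is not a superkey, but it is allowed in 3NF because Office is part of the key {Client, Office}.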

3NF Schema

Account Client Office

A Joe 1

B Mary 1

A John 1

C Joe 2


Client, Office -> Client, Office, Account

Account -> Office

BCNF vs 3NF

3NF has some redundancy

BCNF does not

Unfortunately, a BCNF decomposition is not always dependency preserving, but a 3NF decomposition can be

Client, Office -> Client, Office, Account

Account -> Office

Account Client Office

A Joe 1

B Mary 1

A John 1

C Joe 2

Account Office

A 1

B 1

C 2

Account Client

A Joe

B Mary

A John

C Joe

FDs on (Account, Office): Account -> Office

FDs on (Account, Client): no non-trivial FDs

Lossless decomposition

Closure

Want to find all attributes A such that X -> A is true, given a set of functional dependencies F

define closure of X as X+

Closure(X):
    c = X
    repeat
        old = c
        if there is an FD Z->V such that Z ⊆ c and V ⊄ c then
            c = c U V
    until old = c
    return c

BCNFify

BCNFify(schema R, functional dependency set F):
    D = {{R,F}}
    while there is a schema S with dependencies F' in D that is not in BCNF, do:
        given X->Y as a BCNF-violating FD in F such that XY is in S
        replace S in D with
            S1 = {XY, F1} and
            S2 = {(S-Y) U X, F2}
        where F1 and F2 are the FDs in F over S1 or S2
        (may need to split some FDs using decomposition)
    end
    return D
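A runnable variant of this procedure is sketched below. It differs from the pseudocode in one respect, stated here as an assumption: instead of picking a violating FD from F, it searches each schema S for an attribute set X whose closure inside S is larger than X but smaller than S, and splits on that. The function names are hypothetical and the search is exponential, so this suits small examples only.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(s, fds):
    """Return (X, X+ within S) for an X that determines more but is not a superkey of S."""
    for size in range(1, len(s)):
        for combo in combinations(sorted(s), size):
            x = set(combo)
            x_plus = closure(x, fds) & s
            if x_plus != x and x_plus != s:
                return x, x_plus
    return None


def bcnfify(schema, fds):
    todo, done = [set(schema)], []
    while todo:
        s = todo.pop()
        violation = bcnf_violation(s, fds)
        if violation is None:
            done.append(s)
        else:
            x, x_plus = violation
            todo.append(x_plus)            # S1 = X together with what it determines
            todo.append((s - x_plus) | x)  # S2 = the rest of S, plus X
    return done


fds = [({"Client", "Office"}, {"Account"}), ({"Account"}, {"Office"})]
result = bcnfify({"Account", "Client", "Office"}, fds)
print(sorted(sorted(s) for s in result))
```

Each split yields two strictly smaller schemas, so the loop terminates. For the Account, Client, Office example the result is (Account, Client) and (Account, Office).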







Theory of Multi-valued Dependency

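Let R be a relation scheme and let α ⊆ R and β ⊆ R.

The multi-valued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that

t1[α] = t2[α] = t3[α] = t4[α]

t3[β] = t1[β]

t3[R - β] = t2[R - β]

t4[β] = t2[β]

t4[R - β] = t1[R - β]

Example:

BC-Scheme = (loan-number, customer-name, street, customer-city)

customer-name →→ street, customer-city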


Inference rules for functional and Multi-valued dependencies are sound and complete.

Soundness means that the rules do not generate any dependencies that are not logically implied by D (dependencies)

Completeness means that the rules allow us to generate all dependencies in D+ (all functional and Multi-valued dependencies)


1. Reflexivity rule.

If α is a set of attributes and β ⊆ α, then α → β holds

2. Augmentation rule.

If α → β holds and γ is a set of attributes, then γα → γβ holds

3. Transitivity rule.

If α → β holds and β → γ holds, then α → γ holds

4. Complementation rule.

If α →→ β holds, then α →→ R - β - α holds

5. Multi-valued Augmentation rule.

If α →→ β holds and γ ⊆ R and δ ⊆ γ, then γα →→ δβ holds

6. Multi-valued Transitivity rule.

If α →→ β holds and β →→ γ holds, then α →→ γ - β holds

7. Replication rule

If α → β holds, then α →→ β.

8. Coalescence rule

If α →→ β holds and γ ⊆ β and there is a δ such that δ ⊆ R and δ ∩ β = ∅ and δ → γ, then α → γ holds.


Example:

Let R= (A,B,C,G,H,I)

Suppose A →→ BC holds. The definition of multi-valued dependency implies that if t1[A] = t2[A] then there exist tuples t3 and t4 such that:

t1[A] = t2[A] = t3[A] = t4[A]

t3[BC] = t1[BC]

t3[GHI] = t2[GHI]

t4[GHI] = t1[GHI]

t4[BC] = t2[BC]

The complementation rule states that if A →→ BC, then A →→ GHI.


Some more Rules

Multi-valued union rule

If α →→ β holds and α →→ γ holds, then α →→ βγ holds

Intersection rule

If α →→ β holds and α →→ γ holds, then α →→ β ∩ γ holds

Difference rule.

If α →→ β holds and α →→ γ holds, then α →→ β - γ holds and α →→ γ - β holds.

Example

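Let’s apply the rules to the following example. Let R = (A,B,C,G,H,I) with the following set of dependencies D given:

A →→ B

B →→ HI

CG → H

List some members of D+

D+ :

A →→ CGHI (complementation rule)

A →→ HI (multi-valued transitivity rule)

B → H (coalescence rule)

A →→ CG (difference rule)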

Fourth Normal Form

A relation is in fourth normal form if it is in BCNF and contains no multi-valued dependencies

Multi-valued Dependency

There are three attributes (e.g. A, B, C) in a relation.

For each value of A there is a well-defined set of values of B and a well-defined set of values of C.

The set of values of B is independent of the set of values of C, and vice versa.

Fourth Normal Form

(Course, Instructor, Textbook) (BCNF)

One course is taught by several instructors

One course uses the same set of textbooks by each instructor

(Course, Textbook) (4NF)

(Course, Instructor) (4NF)
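Whether a multi-valued dependency holds in a given instance can be checked by looking for the required "swapped" tuples. This is a sketch under stated assumptions: the function name mvd_holds is hypothetical, and the course, instructor and textbook values are invented for illustration.

```python
def mvd_holds(rows, schema, alpha, beta):
    """True if alpha ->> beta holds in this instance (rows are dicts over schema)."""
    present = {tuple(r[a] for a in schema) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # Required tuple: beta values from t1, everything else from t2.
                t3 = {**t2, **{b: t1[b] for b in beta}}
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True


schema = ["course", "instructor", "textbook"]
rows = [
    {"course": "DB", "instructor": i, "textbook": t}
    for i in ("Smith", "Jones")
    for t in ("Text1", "Text2")
]
print(mvd_holds(rows, schema, ["course"], ["instructor"]))
print(mvd_holds(rows[:-1], schema, ["course"], ["instructor"]))
```

Because the loops visit every ordered pair of tuples, the second required tuple of the definition is covered when the roles of the two tuples are exchanged. Dropping one of the four rows breaks the dependency, which is why the (Course, Instructor, Textbook) relation must store every instructor and textbook combination.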

Fourth Normal Form: “if in BCNF and has no multi-valued dependencies”



Fifth Normal Form

Every join dependency is a consequence of its relation keys

A non 5NF: Person-using-skills-on-jobs (Person, Skill, Job)

5NF:

Has-skill (Person, Skill)

Need-skill (Skill, Job)

Assigned-to-job (Person, Job)

Domain Key Normal Form

“if every constraint on the relation is a logical consequence of the definition of keys and domains”


Constraint “a rule governing static values of attributes”

Key “unique identifier of a tuple”

Domain “description of an attribute’s allowed values”

Example of non DK/NF

Enrollment (Student ID, Course ID, Grade)

Key constraint: Student ID + Course ID --> Grade

Domain constraint: Student ID: 7 digits, Course ID: 3 digits, Grade: A,B,C,D,F,P

General constraint: If Course ID < 900 then Grade in {A,B,C,D,F} else Grade in {P,F}

Since the general constraint cannot be inferred from the key constraint or the domain constraints, the relation is not in DK/NF.

B-tree Insertion

INSERTION OF KEY ’K’

find the correct leaf node ’L’;

if ( adding ’K’ makes ’L’ overflow ){
    split ’L’, by pushing the middle key upstairs to parent node ’P’;
    if ( ’P’ overflows ){
        repeat the split recursively;
    }
}
else{
    add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */
}

B-tree deletion - pseudocode

DELETION OF KEY ’K’

locate key ’K’, in node ’N’

if( ’N’ is a non-leaf node ){
    delete ’K’ from ’N’;
    find the immediately larger key ’K1’;
        /* which is guaranteed to be on a leaf node ’L’ */
    copy ’K1’ in the old position of ’K’;
    invoke this DELETION routine on ’K1’ from the leaf node ’L’;
}
else{ /* ’N’ is a leaf node */
    delete ’K’ from ’N’;
    if( ’N’ underflows ){
        let ’N1’ be the sibling of ’N’;
        if( ’N1’ is "rich" ){ /* i.e., N1 can lend us a key */
            borrow a key from ’N1’ THROUGH the parent node;
        }else{ /* N1 is 1 key away from underflowing */
            MERGE: pull the key from the parent ’P’,
            and merge it with the keys of ’N’ and ’N1’ into a new node;
            if( ’P’ underflows ){ repeat recursively }
        }
    }
}

Remarks on Normalization

The notions of dependency and normalization are semantic in nature

The normalization guidelines should be regarded primarily as a discipline to help the database design

Limitations of normalization

May not be natural, e.g. zip code, area code for phone #

May ignore operational considerations: need not change, may change over time, e.g. (order#, prod#, description, unit-price, quantity)

Difficult to enforce integrity control:

(Order#, Prod#, quantity)

(Prod#, Description, Unit-price)

Prod# may not be valid.

Now the integrity control is provided by relational DBMS

De-normalization

Normalization is only one of many database design goals.

Normalized (decomposed) tables require additional processing, reducing system speed.

Normalization purity is often difficult to sustain in the modern database environment. Conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that include denormalization.

Integrity Constraints

Unit-2 (Part A)


An important functionality of a DBMS is to enable the specification of integrity constraints and to enforce them.

Knowledge of integrity constraints is also useful for query

optimization.

Examples of constraints:

keys, superkeys

foreign keys

domain constraints, tuple constraints

functional dependencies, multivalued dependencies


1. Integrity constraints provide a way of ensuring that changes made to the database by authorized users do not result in a loss of data consistency.

2. We saw a form of integrity constraint with E-R models:

key declarations: stipulation that certain attributes form a candidate key for the entity set.

form of a relationship: mapping cardinalities 1-1, 1-many and many-many.

3. An integrity constraint can be any arbitrary predicate applied to the database.

4. They may be costly to evaluate, so we will only consider integrity constraints that can be tested with minimal overhead.

Domain Constraints

1. A domain of possible values should be associated with every attribute. These domain constraints are the most basic form of integrity constraint. They are easy to test for when data is entered.

2. Domain types

1. Attributes may have the same domain, e.g. cname and employee-name.

2. It is not as clear whether bname and cname domains ought to be distinct.

3. At the implementation level, they are both character strings.

4. At the conceptual level, we do not expect customers to have the same names as branches, in general.

5. Strong typing of domains allows us to test for values inserted, and whether queries make sense. Newer systems, particularly object-oriented database systems, offer a rich set of domain types that can be extended easily.

Domain Types in SQL

The SQL standard supports a restricted set of domain types:

Fixed length character string, with user specified length

Fixed point number, with user specified precision

Integer

Small Integer

Floating point number

Floating point and double precision floating point numbers with machine dependent precision

Null Values

Insertion of incomplete tuples can introduce null values into the database.

SQL allows the domain declaration of an attribute to include the specification not null.

This prohibits the insertion of a null value for the attribute.

Referential Integrity

Often we wish to ensure that a value appearing in a relation for a given set of attributes also appears for another set of attributes in another relation. This is called referential integrity.

Basic Concepts

1. Dangling tuples.

Consider a pair of relations r(R) and s(S), and the natural join $r \bowtie s$.

There may be a tuple $t_r$ in r that does not join with any tuple in s.

That is, there is no tuple $t_s$ in s such that $t_r[R \cap S] = t_s[R \cap S]$.

We call this a dangling tuple.

Dangling tuples may or may not be acceptable.

Basic Concepts

1. Suppose there is a tuple in the account relation with the branch name “Lunartown”, but no matching tuple in the branch relation for the Lunartown branch.

2. This is undesirable, as an account should refer to a branch that exists.

3. Now suppose there is a tuple in the branch relation with “Mokan”, but no matching tuple in the account relation for the Mokan branch.

4. This means that a branch exists for which no accounts exist. This is possible, for example, when a branch is being opened. We want to allow this situation.

Basic Concepts

1. Note the distinction between these two situations: bname is the primary key of branch, while it is not for account. In account, bname is a foreign key, being the primary key of another relation.

Let r1(R1) and r2(R2) be two relations with primary keys K1 and K2 respectively.

We say that a subset $\alpha$ of R2 is a foreign key referencing K1 in relation r1 if it is required that for every tuple t2 in r2 there must be a tuple t1 in r1 such that $t_1[K_1] = t_2[\alpha]$.

We call these requirements referential integrity constraints.

They are also known as subset dependencies, as we require $\Pi_{\alpha}(r_2) \subseteq \Pi_{K_1}(r_1)$.

Referential Integrity in the E-R Model

1. These constraints arise frequently. Every relation arising from a relationship set has referential integrity constraints.

The figure shows an n-ary relationship set R relating entity sets $E_1, E_2, \ldots, E_n$.

Let $K_i$ denote the primary key of $E_i$. Each $K_i$ in the scheme for R is a foreign key that leads to a referential integrity constraint.

Relation schemes for weak entity sets must include the primary key of the strong entity set on which they are existence dependent. This is a foreign key, which leads to another referential integrity constraint.

Referential Integrity in SQL

1. An addition to the original standard allows specification of primary and candidate keys and foreign keys as part of the create table command:

primary key clause includes a list of attributes forming the primary key.

unique key clause includes a list of attributes forming a candidate key.

foreign key clause includes a list of attributes forming the foreign key, and the name of the relation referenced by the foreign key.

Examples

create table branch
(bname char(15) not null,
bcity char(30),
assets integer,
primary key (bname),
check (assets >= 0))

create table account
(account# char(10) not null,
bname char(15),
balance integer,
primary key (account#),
foreign key (bname) references branch,
check (balance >= 0))

create table depositor
(cname char(20) not null,
account# char(10) not null,
primary key (cname, account#),
foreign key (cname) references customer,
foreign key (account#) references account)
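To see these constraints being enforced, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not part of the original slides: the column is named account_number rather than account# to keep the identifier portable, the sample rows are invented, and SQLite checks foreign keys only after the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE branch (
        bname  CHAR(15) NOT NULL,
        bcity  CHAR(30),
        assets INTEGER,
        PRIMARY KEY (bname),
        CHECK (assets >= 0)
    );
    CREATE TABLE account (
        account_number CHAR(10) NOT NULL,  -- hypothetical name for account#
        bname          CHAR(15),
        balance        INTEGER,
        PRIMARY KEY (account_number),
        FOREIGN KEY (bname) REFERENCES branch,
        CHECK (balance >= 0)
    );
""")

# A branch without accounts is allowed (a branch that is being opened).
conn.execute("INSERT INTO branch VALUES ('Mokan', NULL, 0)")
conn.execute("INSERT INTO account VALUES ('A-101', 'Mokan', 500)")


def try_insert(sql):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# An account that refers to a non-existent branch would be a dangling tuple.
dangling_ok = try_insert("INSERT INTO account VALUES ('A-102', 'Lunartown', 100)")
# A negative balance violates the check clause.
negative_ok = try_insert("INSERT INTO account VALUES ('A-103', 'Mokan', -5)")
```

Both rejected inserts leave the account relation unchanged, so only A-101 remains.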

Structured Query Language(SQL)

Unit-II

Part- B

SQL

Data Definition

Basic Query Structure

Set Operations

Aggregate Functions

Null Values

Nested Subqueries

Complex Queries

Views

Modification of the Database

History

IBM Sequel language developed as part of the System R project at the IBM San Jose Research Laboratory

Renamed Structured Query Language (SQL)

ANSI and ISO standard SQL:

- SQL-86

- SQL-89

- SQL-92

- SQL:1999 (language name became Y2K compliant!)

- SQL:2003

Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.

Data Definition Language

Allows the specification of not only a set of relations but also information about each relation, including:

The schema for each relation.

The domain of values associated with each attribute.

Integrity constraints.

The set of indices to be maintained for each relation.

Security and authorization information for each relation.

The physical storage structure of each relation on disk.

Basic Query Structure

SQL is based on set and relational operations with certain modifications and enhancements.

A typical SQL query has the form:

select A1, A2, ..., An
from r1, r2, ..., rm
where P

Ai represents an attribute.

ri represents a relation.

P is a predicate.

This query is equivalent to the relational algebra expression

$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$

The result of an SQL query is a relation.
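The equivalence between select-from-where and projection over selection over a Cartesian product can be checked on a small example. The sketch below is illustrative only: the rows are invented, the relation layouts follow the borrower and loan schemas of this unit, and SQLite (through Python's sqlite3 module) stands in for a generic SQL system.

```python
import sqlite3
from itertools import product

# Illustrative rows (not from the source).
borrower = [("Jones", "L-170"), ("Smith", "L-230")]   # (customer_name, loan_number)
loan = [("L-170", "Downtown", 3000),                  # (loan_number, branch_name, amount)
        ("L-230", "Perryridge", 4000)]


def algebra():
    """Evaluate projection(selection(borrower x loan)) by hand."""
    rows = []
    for b, l in product(borrower, loan):               # Cartesian product
        if b[1] == l[0] and l[1] == "Perryridge":      # selection predicate P
            rows.append((b[0], b[1], l[2]))            # projection on A1..An
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE borrower (customer_name TEXT, loan_number TEXT)")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO borrower VALUES (?, ?)", borrower)
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", loan)

sql_rows = conn.execute("""
    SELECT customer_name, borrower.loan_number, amount
    FROM borrower, loan
    WHERE borrower.loan_number = loan.loan_number
      AND branch_name = 'Perryridge'
""").fetchall()
```

Both evaluations yield the single tuple for Smith's Perryridge loan.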

The select Clause

The select clause lists the attributes desired in the result of a query.

It corresponds to the projection operation of the relational algebra.

Example: find the names of all branches in the loan relation:

select branch_name
from loan

In the relational algebra, the query would be:

$\Pi_{branch\_name}(loan)$

NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)

Some people use upper case wherever we use bold font.

The select Clause (Cont.)

SQL allows duplicates in relations as well as in query results.

To force the elimination of duplicates, insert the keyword distinct after select.

Find the names of all branches in the loan relation, and remove duplicates:

select distinct branch_name
from loan

The keyword all specifies that duplicates not be removed.

select all branch_name
from loan

The select Clause (Cont.)

An asterisk in the select clause denotes “all attributes”

select *
from loan

The select clause can contain arithmetic expressions involving the operations +, –, *, and /, operating on constants or attributes of tuples.

The query:

select loan_number, branch_name, amount * 100

from loan

would return a relation that is the same as the loan relation, except that the value of the attribute amount is multiplied by 100.

The where Clause

The where clause specifies conditions that the result must satisfy.

It corresponds to the selection predicate of the relational algebra.

To find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200:

select loan_number
from loan
where branch_name = 'Perryridge' and amount > 1200

Comparison results can be combined using the logical connectives and, or, and not.

Comparisons can be applied to results of arithmetic expressions.

The where Clause (Cont.)

SQL includes a between comparison operator

Example: Find the loan numbers of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000)

select loan_number

from loan

where amount between 90000 and 100000

The from Clause

The from clause lists the relations involved in the query.

It corresponds to the Cartesian product operation of the relational algebra.

Find the Cartesian product borrower × loan:

select *
from borrower, loan

Find the name, loan number and loan amount of all customers having a loan at the Perryridge branch.

select customer_name, borrower.loan_number, amount

from borrower, loan

where borrower.loan_number = loan.loan_number and

branch_name = ‘Perryridge’

The Rename Operation

SQL allows renaming relations and attributes using the as clause:

old-name as new-name

Find the name, loan number and loan amount of all customers; rename the column name loan_number as loan_id.

select customer_name, borrower.loan_number as loan_id, amount

from borrower, loan

where borrower.loan_number = loan.loan_number

Tuple Variables

Tuple variables are defined in the from clause via the use of the as clause.

Find the customer names and their loan numbers for all customers having a loan at some branch.

select customer_name, T.loan_number, S.amount
from borrower as T, loan as S
where T.loan_number = S.loan_number

Find the names of all branches that have greater assets than some branch located in Brooklyn.

select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and S.branch_city = 'Brooklyn'

String Operations

SQL includes a string-matching operator for comparisons on character strings. The operator “like” uses patterns that are described using two special characters:

percent (%). The % character matches any substring.

underscore (_). The _ character matches any single character.

Find the names of all customers whose street includes the substring “Main”.

select customer_name
from customer
where customer_street like '%Main%'

Match the name “Main%”:

like 'Main\%' escape '\'

SQL supports a variety of string operations such as

concatenation (using “||”)

converting from upper to lower case (and vice versa)

finding string length, extracting substrings, etc.

Ordering the Display of Tuples

List in alphabetic order the names of all customers having a loan at the Perryridge branch:

select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge'
order by customer_name

We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default. Example: order by customer_name desc

Duplicates

In relations with duplicates, SQL can define how many copies of tuples appear in the result.

Multiset versions of some of the relational algebra operators, given multiset relations r1 and r2:

1. $\sigma_\theta(r_1)$: If there are c1 copies of tuple t1 in r1, and t1 satisfies selection $\sigma_\theta$, then there are c1 copies of t1 in $\sigma_\theta(r_1)$.

2. $\Pi_A(r_1)$: For each copy of tuple t1 in r1, there is a copy of tuple $\Pi_A(t_1)$ in $\Pi_A(r_1)$, where $\Pi_A(t_1)$ denotes the projection of the single tuple t1.

3. $r_1 \times r_2$: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple $t_1 . t_2$ in $r_1 \times r_2$.

Duplicates (Cont.)

Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows:

r1 = {(1, a), (2, a)}   r2 = {(2), (3), (3)}

Then $\Pi_B(r_1)$ would be {(a), (a)}, while $\Pi_B(r_1) \times r_2$ would be

{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}

SQL duplicate semantics:

select A1, A2, ..., An
from r1, r2, ..., rm
where P

is equivalent to the multiset version of the expression:

$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$

Set Operations

The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations $\cup$, $\cap$, and $-$.

Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all.

Suppose a tuple occurs m times in r and n times in s, then, it occurs:

m + n times in r union all s

min(m,n) times in r intersect all s

max(0, m – n) times in r except all s
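The counting rules for the multiset versions can be mimicked with Python's collections.Counter, whose +, & and - operators happen to follow the same m + n, min(m, n) and max(0, m - n) arithmetic. This is only a sketch of the semantics with made-up tuples; it does not show how a database system evaluates these operators.

```python
from collections import Counter

# A multiset relation as a Counter: each tuple maps to its number of copies.
r = Counter({("a",): 3, ("b",): 1})   # ("a",) occurs m = 3 times in r
s = Counter({("a",): 2, ("c",): 1})   # ("a",) occurs n = 2 times in s

union_all = r + s        # m + n copies
intersect_all = r & s    # min(m, n) copies
except_all = r - s       # max(0, m - n) copies; zero counts are dropped

# The plain set operations eliminate duplicates.
union = set(r) | set(s)
intersect = set(r) & set(s)
except_ = set(r) - set(s)
```

For the tuple ("a",) this gives 5, 2 and 1 copies respectively, matching the three rules above.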

Set Operations

Find all customers who have a loan, an account, or both:

(select customer_name from depositor)
union
(select customer_name from borrower)

Find all customers who have both a loan and an account.

(select customer_name from depositor)
intersect
(select customer_name from borrower)

Find all customers who have an account but no loan.

(select customer_name from depositor)
except
(select customer_name from borrower)

Aggregate Functions

These functions operate on the multiset of values of a column of a relation, and return a value:

avg: average value

min: minimum value

max: maximum value

sum: sum of values

count: number of values

Aggregate Functions (Cont.)

Find the average account balance at the Perryridge branch.

select avg (balance)
from account
where branch_name = 'Perryridge'

Find the number of tuples in the customer relation.

select count (*)
from customer

Find the number of depositors in the bank.

select count (distinct customer_name)
from depositor

Aggregate Functions – Group By

Find the number of depositors for each branch.

select branch_name, count (distinct customer_name)
from depositor, account
where depositor.account_number = account.account_number
group by branch_name

Note: Attributes in the select clause outside of aggregate functions must appear in the group by list.

Aggregate Functions – Having Clause

Find the names of all branches where the average account balance is more than $1,200.

select branch_name, avg (balance)
from account
group by branch_name
having avg (balance) > 1200

Note: predicates in the having clause are applied after the formation of groups, whereas predicates in the where clause are applied before forming groups.
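The difference between filtering groups (having) and filtering tuples (where) shows up clearly on a small table. The following sketch uses Python's sqlite3 module with invented balances; the branch name Brighton and all amounts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-1", "Perryridge", 2000),
    ("A-2", "Perryridge", 1000),   # Perryridge average: 1500
    ("A-3", "Brighton", 900),
    ("A-4", "Brighton", 1300),     # Brighton average: 1100
])

# having: groups are formed first, then whole groups are filtered.
having_rows = conn.execute("""
    SELECT branch_name, AVG(balance)
    FROM account
    GROUP BY branch_name
    HAVING AVG(balance) > 1200
""").fetchall()

# where: individual tuples are filtered first, then groups are formed.
where_rows = conn.execute("""
    SELECT branch_name, AVG(balance)
    FROM account
    WHERE balance > 1200
    GROUP BY branch_name
    ORDER BY branch_name
""").fetchall()
```

With having, only Perryridge qualifies. With where, both branches appear, because each has one account above 1200 and the averages are computed over those accounts alone.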

Null Values

It is possible for tuples to have a null value, denoted by null, for some of their attributes.

null signifies an unknown value or that a value does not exist.

The predicate is null can be used to check for null values.

Example: Find all loan numbers which appear in the loan relation with null values for amount.

select loan_number
from loan
where amount is null

The result of any arithmetic expression involving null is null.

Example: 5 + null returns null

However, aggregate functions simply ignore nulls.

Null Values and Three Valued Logic

Any comparison with null returns unknown

Example: 5 < null or null <> null or null = null

Three-valued logic using the truth value unknown:

OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) = unknown

AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

NOT: (not unknown) = unknown

“P is unknown” evaluates to true if predicate P evaluates to unknown.

The result of a where clause predicate is treated as false if it evaluates to unknown.
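The three-valued truth tables can be written out as a few small functions. In this sketch Python's None stands for the truth value unknown (and for null in comparisons); that encoding and the helper names are assumptions made for illustration.

```python
def and3(a, b):
    """Three-valued AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else not a


def less_than(x, y):
    """A comparison that returns unknown when either operand is null."""
    if x is None or y is None:
        return None
    return x < y


def where(rows, predicate):
    """Keep rows whose predicate is true; unknown is treated like false."""
    return [row for row in rows if predicate(row) is True]


# Amounts 1200, null and 900 filtered by "1000 < amount".
kept = where([1200, None, 900], lambda amount: less_than(1000, amount))
```

Only the amount 1200 is kept: the null amount makes the comparison unknown, and the where filter drops it just as it drops rows where the predicate is false.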

Null Values and Aggregates

Total all loan amounts

select sum (amount)
from loan

Above statement ignores null amounts

Result is null if there is no non-null amount

All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
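A short check of these rules, sketched with Python's sqlite3 module on invented loan rows (SQL null appears as None in Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO loan VALUES (?, ?, ?)", [
    ("L-1", "Perryridge", 1000),
    ("L-2", "Perryridge", None),   # null amount
    ("L-3", "Downtown", 500),
])

# sum and count(amount) skip the null; count(*) counts every tuple.
total, n_amounts, n_tuples = conn.execute(
    "SELECT SUM(amount), COUNT(amount), COUNT(*) FROM loan").fetchone()

# When there is no non-null amount at all, sum yields null.
(empty_sum,) = conn.execute(
    "SELECT SUM(amount) FROM loan WHERE amount IS NULL").fetchone()
```

Here the total is 1500 over two non-null amounts, while count(*) still reports three tuples.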

Nested Subqueries

SQL provides a mechanism for the nesting of subqueries.

A subquery is a select-from-where expression that is nested within another query.

A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.

Example Query

Find all customers who have both an account and a loan at the bank.

select distinct customer_name
from borrower
where customer_name in (select customer_name
from depositor)

Find all customers who have a loan at the bank but do not have an account at the bank.

select distinct customer_name
from borrower
where customer_name not in (select customer_name
from depositor)

Example Query

Find all customers who have both an account and a loan at the Perryridge branch.

select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge' and
(branch_name, customer_name) in
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number)

Note: The above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features.

Set Comparison

Find all branches that have greater assets than some branch located in Brooklyn.

select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and
S.branch_city = 'Brooklyn'

Same query using the > some clause:

select branch_name
from branch
where assets > some
(select assets
from branch
where branch_city = 'Brooklyn')

Definition of Some Clause

F <comp> some r $\Leftrightarrow$ $\exists\, t \in r$ such that (F <comp> t), where <comp> can be: <, ≤, >, ≥, =, ≠

(5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)

(5 < some {0, 5}) = false

(5 = some {0, 5}) = true

(5 ≠ some {0, 5}) = true (since 0 ≠ 5)

(= some) ≡ in

However, (≠ some) ≢ not in

Example Query

Find the names of all branches that have greater assets than all branches located in Brooklyn.

select branch_name
from branch
where assets > all
(select assets
from branch
where branch_city = 'Brooklyn')
Definition of all Clause

F <comp> all r $\Leftrightarrow$ $\forall\, t \in r$ (F <comp> t)

(5 < all {0, 5, 6}) = false

(5 < all {6, 10}) = true

(5 = all {4, 5}) = false

(5 ≠ all {4, 6}) = true (since 5 ≠ 4 and 5 ≠ 6)

(≠ all) ≡ not in

However, (= all) ≢ in
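The some and all comparisons map directly onto Python's any() and all() over the tuples of a one-column relation. The helper names below are hypothetical, relations are plain lists, and null values are left out of this sketch.

```python
import operator


def some(value, comp, relation):
    """F <comp> some r: the comparison holds for at least one t in r."""
    return any(comp(value, t) for t in relation)


def all_(value, comp, relation):
    """F <comp> all r: the comparison holds for every t in r."""
    return all(comp(value, t) for t in relation)


# (= some) behaves like in, but (not-equal some) is not the same as not in:
ne_some = some(5, operator.ne, [0, 5])   # true, because 0 differs from 5
not_in = 5 not in [0, 5]                 # false, because 5 is in the relation

# (not-equal all) behaves like not in, but (= all) is not the same as in:
eq_all = all_(5, operator.eq, [4, 5])    # false, because 4 differs from 5
is_in = 5 in [4, 5]                      # true
```

The two pairs at the end reproduce the slides' remark that only (= some) matches in and only (≠ all) matches not in.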

Test for Empty Relations

The exists construct returns the value true if the argument subquery is nonempty.

exists r $\Leftrightarrow$ r ≠ Ø

not exists r $\Leftrightarrow$ r = Ø

Example Query

Find all customers who have an account at all branches located in Brooklyn.

select distinct S.customer_name
from depositor as S
where not exists (
(select branch_name
from branch
where branch_city = 'Brooklyn')
except
(select R.branch_name
from depositor as T, account as R
where T.account_number = R.account_number and
S.customer_name = T.customer_name))

Note that X – Y = Ø $\Leftrightarrow$ X ⊆ Y

Note: Cannot write this query using = all and its variants
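The not exists ... except pattern can be tried out with Python's sqlite3 module. This is a sketch under assumptions: all rows are invented, and the parentheses around the two operands of except are dropped because SQLite does not accept them in a compound select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE branch (branch_name TEXT, branch_city TEXT, assets INTEGER);
    CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);

    INSERT INTO branch VALUES ('Downtown', 'Brooklyn', 900);
    INSERT INTO branch VALUES ('Brighton', 'Brooklyn', 700);
    INSERT INTO branch VALUES ('Perryridge', 'Horseneck', 1700);

    INSERT INTO account VALUES ('A-1', 'Downtown', 500);
    INSERT INTO account VALUES ('A-2', 'Brighton', 400);
    INSERT INTO account VALUES ('A-3', 'Downtown', 300);
    INSERT INTO account VALUES ('A-4', 'Perryridge', 200);

    -- Hayes banks at both Brooklyn branches, Jones at only one of them.
    INSERT INTO depositor VALUES ('Hayes', 'A-1');
    INSERT INTO depositor VALUES ('Hayes', 'A-2');
    INSERT INTO depositor VALUES ('Jones', 'A-3');
    INSERT INTO depositor VALUES ('Jones', 'A-4');
""")

# X = Brooklyn branches, Y = branches where customer S has an account.
# X - Y is empty exactly when X is a subset of Y.
rows = conn.execute("""
    SELECT DISTINCT S.customer_name
    FROM depositor AS S
    WHERE NOT EXISTS (
        SELECT branch_name FROM branch WHERE branch_city = 'Brooklyn'
        EXCEPT
        SELECT R.branch_name
        FROM depositor AS T, account AS R
        WHERE T.account_number = R.account_number
          AND S.customer_name = T.customer_name)
""").fetchall()
```

Only Hayes is returned; for Jones the difference still contains Brighton, so the not exists test fails.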

Test for Absence of Duplicate Tuples

The unique construct tests whether a subquery has any duplicate tuples in its result.

Find all customers who have at most one account at the Perryridge branch.

select T.customer_name
from depositor as T
where unique (
select R.customer_name
from account, depositor as R
where T.customer_name = R.customer_name and
R.account_number = account.account_number and
account.branch_name = 'Perryridge')

Example Query

Find all customers who have at least two accounts at the Perryridge branch.

select distinct T.customer_name

from depositor as T

where not unique (

select R.customer_name

from account, depositor as R

where T.customer_name = R.customer_name and

R.account_number = account.account_number and

account.branch_name = ‘Perryridge’)

Derived Relations

SQL allows a subquery expression to be used in the from clause

Find the average account balance of those branches where the average account balance is greater than $1200.

select branch_name, avg_balance
from (select branch_name, avg (balance)
from account
group by branch_name)
as branch_avg (branch_name, avg_balance)
where avg_balance > 1200

Note that we do not need to use the having clause, since we compute the temporary (view) relation branch_avg in the from clause, and the attributes of branch_avg can be used directly in the where clause.

With Clause

The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.

Find all accounts with the maximum balance:

with max_balance (value) as
(select max (balance)
from account)
select account_number
from account, max_balance
where account.balance = max_balance.value

Complex Query using With Clause

Find all branches where the total account deposit is greater than the average of the total account deposits at all branches.

with branch_total (branch_name, value) as
(select branch_name, sum (balance)
from account
group by branch_name),
branch_total_avg (value) as
(select avg (value)
from branch_total)
select branch_name
from branch_total, branch_total_avg
where branch_total.value >= branch_total_avg.value

Views

In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database).

Consider a person who needs to know a customer's loan number but has no need to see the loan amount. This person should see a relation described, in SQL, by

(select customer_name, borrower.loan_number
from borrower, loan
where borrower.loan_number = loan.loan_number)

A view provides a mechanism to hide certain data from the view of certain users.

Any relation that is not part of the conceptual model but is made visible to a user as a “virtual relation” is called a view.

View Definition

A view is defined using the create view statement, which has the form

create view v as <query expression>

where <query expression> is any legal SQL expression. The view name is represented by v.

Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.

A view definition is not the same as creating a new relation by evaluating the query expression. Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view.

Example Queries

A view consisting of branches and their customers:

create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number)
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number)

Find all customers of the Perryridge branch:

select customer_name
from all_customer
where branch_name = 'Perryridge'

Views Defined Using Other Views

One view may be used in the expression defining another view

A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1.

A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2.

A view relation v is said to be recursive if it depends on itself.

View Expansion

A way to define the meaning of views defined in terms of other views.

Let view v1 be defined by an expression e1 that may itself contain uses of view relations.

View expansion of an expression repeats the following replacement step:

repeat
Find any view relation vi in e1
Replace the view relation vi by the expression defining vi
until no more view relations are present in e1

As long as the view definitions are not recursive, this loop will terminate.
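The replacement loop can be sketched as plain text substitution. This is a deliberately naive illustration, not how a real SQL system expands views: the function name expand, the views dictionary and the two sample definitions are all hypothetical, view names are matched as whole words, and string literals inside a query are not treated specially.

```python
import re


def expand(expression, views):
    """Replace view names in `expression` by their definitions until none remain.

    `views` maps each view name to its defining expression. The definitions
    are assumed to be non-recursive; with a recursive view the loop would
    never terminate.
    """
    if not views:
        return expression
    names = "|".join(re.escape(name) for name in views)
    pattern = re.compile(r"\b(" + names + r")\b")
    while True:
        match = pattern.search(expression)
        if match is None:              # no more view relations are present
            return expression
        definition = views[match.group(1)]
        expression = (expression[:match.start()]
                      + "(" + definition + ")"
                      + expression[match.end():])


# v1 depends directly on v2, as in the definitions above.
views = {
    "v2": "select branch_name, customer_name from depositor, account",
    "v1": "select customer_name from v2 where branch_name = 'Perryridge'",
}
expanded = expand("select * from v1", views)
```

After two replacement steps the expression mentions only the stored relations depositor and account.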

Modification of the Database – Deletion

Delete all account tuples at the Perryridge branch:

delete from account
where branch_name = 'Perryridge'

Delete all accounts at every branch located in the city 'Needham':

delete from account
where branch_name in (select branch_name
from branch
where branch_city = 'Needham')

Example Query

Delete the record of all accounts with balances below the average at the bank.

delete from account
where balance < (select avg (balance)
from account)

Problem: as we delete tuples from account, the average balance changes.

Solution used in SQL:

1. First, compute avg balance and find all tuples to delete.

2. Next, delete all tuples found above (without recomputing avg or retesting the tuples).
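The effect of computing the average once, before any tuple is removed, can be observed on a small table. The sketch below uses Python's sqlite3 module with invented balances, chosen so that recomputing the average after each deletion would give a different answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-1", "Perryridge", 100),
    ("A-2", "Perryridge", 500),
    ("A-3", "Downtown", 600),
    ("A-4", "Downtown", 1000),
])   # average balance before the delete: 550

conn.execute(
    "DELETE FROM account WHERE balance < (SELECT AVG(balance) FROM account)")

remaining = [row[0] for row in
             conn.execute("SELECT balance FROM account ORDER BY balance")]
```

The balances 100 and 500 are below 550 and are deleted. Had the average been recomputed while deleting, it would have risen to 800 and the 600 account would have been removed as well.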

Modification of the Database –Insertion

Add a new tuple to account

insert into account
values ('A-9732', 'Perryridge', 1200)

or equivalently

insert into account (branch_name, balance, account_number)
values ('Perryridge', 1200, 'A-9732')

Add a new tuple to account with balance set to null:

insert into account
values ('A-777', 'Perryridge', null)

Modification of the Database – Insertion

Provide as a gift for all loan customers of the Perryridge branch a $200 savings account. Let the loan number serve as the account number for the new savings account.

insert into account
select loan_number, branch_name, 200
from loan
where branch_name = 'Perryridge'

insert into depositor
select customer_name, loan.loan_number
from loan, borrower
where branch_name = 'Perryridge'
and loan.loan_number = borrower.loan_number

The select from where statement is evaluated fully before any of its results are inserted into the relation (otherwise queries like

insert into table1 select * from table1

would cause problems).

Modification of the Database –Updates

Increase all accounts with balances over $10,000 by 6%; all other accounts receive 5%. Write two update statements:

update account
set balance = balance * 1.06
where balance > 10000

update account
set balance = balance * 1.05
where balance <= 10000

The order is important: if the 5% update ran first, an account just under $10,000 could rise above $10,000 and then receive the 6% increase as well.

Can be done better using the case statement.

Case Statement for Conditional Updates

Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.

update account
set balance = case
when balance <= 10000 then balance * 1.05
else balance * 1.06
end
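To see why the order of the two updates matters and why the case form avoids the issue, the sketch below runs the three variants on a fresh copy of a two-account table using Python's sqlite3 module. The account numbers, balances and the helper run are made up for illustration.

```python
import sqlite3


def run(statements):
    """Apply the given updates to a fresh two-account table; return the balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance REAL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A-1", 9800.0), ("A-2", 12000.0)])
    for sql in statements:
        conn.execute(sql)
    return dict(conn.execute("SELECT account_number, balance FROM account"))


six = "UPDATE account SET balance = balance * 1.06 WHERE balance > 10000"
five = "UPDATE account SET balance = balance * 1.05 WHERE balance <= 10000"
case = """UPDATE account SET balance = CASE
              WHEN balance <= 10000 THEN balance * 1.05
              ELSE balance * 1.06
          END"""

right_order = run([six, five])   # A-1 receives 5% only
wrong_order = run([five, six])   # A-1 is raised to 10290, then receives 6% too
with_case = run([case])          # a single pass, so no ordering issue
```

In the wrong order account A-1 ends near 10907.40 instead of 10290, while the case version matches the correctly ordered pair of updates.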

Update of a View

Create a view of all loan data in the loan relation, hiding the amount attribute:

create view branch_loan as
select branch_name, loan_number
from loan

Add a new tuple to branch_loan:

insert into branch_loan
values ('Perryridge', 'L-307')

This insertion must be represented by the insertion of the tuple

('L-307', 'Perryridge', null)

into the loan relation.

Updates Through Views (Cont.)

Some updates through views are impossible to translate into updates on the database relations:

create view v as
select branch_name from account

insert into v values ('L-99', 'Downtown', '23')

Others cannot be translated uniquely:

insert into all_customer values ('Perryridge', 'John')

Have to choose loan or account, and create a new loan/account number!

Most SQL implementations allow updates only on simple views (without aggregates) defined on a single relation

Joined Relations

Join operations take two relations and return as a result another relation.

These additional operations are typically used as subquery expressions in the from clause.

Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.

Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

Joined Relations – Datasets for Examples

Relation loan

Relation borrower

Note: borrower information is missing for L-260 and loan information is missing for L-155.

Joined Relations – Examples

loan inner join borrower on
loan.loan_number = borrower.loan_number

loan left outer join borrower on
loan.loan_number = borrower.loan_number


loan natural inner join borrower

loan natural right outer join borrower


loan full outer join borrower using (loan_number)

Find all customers who have either an account or a loan (but not both) at the bank.

select customer_name

from (depositor natural full outer join borrower )

where account_number is null or loan_number is null
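The difference between join types can be sketched with Python's sqlite3 module. The rows below are assumptions chosen to match the note that L-260 has no borrower and L-155 has no loan; the original dataset tables are not reproduced in this transcript. Only inner and left outer joins are shown, because full outer join was added to SQLite in version 3.39 and may be missing from older installations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (loan_number TEXT, branch_name TEXT, amount INTEGER);
    CREATE TABLE borrower (customer_name TEXT, loan_number TEXT);

    INSERT INTO loan VALUES ('L-170', 'Downtown', 3000);
    INSERT INTO loan VALUES ('L-260', 'Perryridge', 1700);  -- no borrower

    INSERT INTO borrower VALUES ('Jones', 'L-170');
    INSERT INTO borrower VALUES ('Hayes', 'L-155');         -- no loan
""")

# Inner join: only tuples that match on the join condition.
inner = conn.execute("""
    SELECT loan.loan_number, customer_name
    FROM loan INNER JOIN borrower
         ON loan.loan_number = borrower.loan_number
    ORDER BY loan.loan_number
""").fetchall()

# Left outer join: unmatched loan tuples are kept and padded with nulls.
left = conn.execute("""
    SELECT loan.loan_number, customer_name
    FROM loan LEFT OUTER JOIN borrower
         ON loan.loan_number = borrower.loan_number
    ORDER BY loan.loan_number
""").fetchall()
```

The inner join returns only L-170 with Jones, while the left outer join also returns L-260 with a null (None) customer name.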

Database Schema

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

loan (loan_number, branch_name, amount)

borrower (customer_name, loan_number)

account (account_number, branch_name, balance)

depositor (customer_name, account_number)

Tuples inserted into loan and borrower

The loan and borrower relations
