Relational Database Designrbvrrwomenscollege.net/wp-content/uploads/2018/05/RDBMS-Unit-II.pdf · A...
Relational Database Design
Unit-2
(Part-A)
A relational DBMS must use its relational facilities exclusively to manage and interact with the database.
The rules:
These rules were defined by Codd in a paper published in 1985. They specify what a relational database must support in order to be relational. These rules have since been considerably extended.
1. Information rule
Data are represented only one way: as values within columns within rows.
Simple, consistent and versatile.
The basic requirement of the relational model.
2. Guaranteed access rule
Every value can be accessed by providing table name, column name and key.
All data are uniquely identified and accessible via this identity.
3. Systematic treatment of null values
Separate handling of missing and/or non-applicable data.
This is distinct from zero or empty strings.
Codd would further like several types of null to be handled.
4. Relational online catalog
Catalog (data dictionary) can be queried by authorized users as part of the database.
The catalog is part of the database.
5. Comprehensive data sublanguage
Used interactively and embedded within programs
Supports data definition, data manipulation, security, integrity constraints and transaction processing
Today means: must support SQL.
6. View updating rule
All theoretically possible view updates should be possible.
Views are virtual tables. They appear to behave as conventional tables except that they are built dynamically when the query is run.
This means that a view is always up to date. It is not always theoretically possible to update views. Codd himself did not completely understand this.
One problem exists when a view relates to part of a table not including a candidate key. This means that potential updates would violate the entity integrity rule.
7. High-level insert, update and delete
Must support set-at-a-time updates.
i.e. transactions
e.g.: UPDATE mytable SET mycol = value WHERE condition; Many rows may be updated with this single statement.
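A minimal sketch of a set-at-a-time update, using Python's built-in sqlite3 module. The table name, column names and values are hypothetical and chosen only to mirror the statement above.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycol INTEGER)")
conn.executemany("INSERT INTO mytable (id, mycol) VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30)])

# One statement, no row-by-row loop: every row matching the condition changes.
cursor = conn.execute("UPDATE mytable SET mycol = 0 WHERE mycol >= 20")
rows_changed = cursor.rowcount
remaining = conn.execute("SELECT id, mycol FROM mytable ORDER BY id").fetchall()
```

The application never iterates over records; it states the condition and the DBMS applies the change to the whole matching set.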
8. Physical data independence
Physical layer of the architecture is mapped onto the logical layer.
Users and programs are not dependent on the physical structure of the database.
(Physical layer implementation is dependent on the DBMS.)
9. Logical data independence
Users and programs are independent of the logical structure of the database.
i.e.: the logical structure of the data can evolve with minimal impact on the programs.
10. Integrity independence
Integrity constraints are to be stored in the catalog not the programs.
Alterations to integrity constraints should not affect application programs.
This simplifies the programs.
It is not always possible to do this.
11. Distribution independence
Applications should still work in a distributed database (DDB).
12. Nonsubversion rule
If there is a record-at-a-time interface (eg via 3GL), security and integrity of the database must not be violated.
There should be no backdoor to bypass the security imposed by the DBMS.
Rule Zero for RDBMS:
Many new DBMSs claim to be relational while also supporting extended features, e.g. PostgreSQL is an RDBMS with extended object-oriented features. Codd's rule zero specifies a criterion for an RDBMS:
"For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities, no matter what additional capabilities the system may support." (Codd, 1990)
In Codd 1990, Codd extended the 12 rules to 18 to include rules on catalog, data types (domains), authorisation etc.
Functional Dependencies
A functional dependency, denoted by X → Y, between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that for any two tuples t1 and t2 that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].
Inference rules for functional dependencies:
1. If X ⊇ Y (X is a superset of Y), then X → Y
2. If X → Y, then XZ → YZ
3. If { X → Y , Y → Z }, then X → Z
4. If X → YZ, then X → Y
5. If { X → Y , X → Z }, then X → YZ
6. If { X → Y , WY → Z }, then WX → Z
Inference rules 1 – 3 are known as Armstrong’s inference rules.
IR 1 – IR 3 are sound and complete.
The closure of a set of functional dependencies F, denoted by F+, is the set of all functional dependencies that can be inferred from F.
For a set of attributes X, the set of all attributes that depend on X under the set of functional dependencies F is called the closure of X under F (denoted by X+).
A set of functional dependencies E is said to be covered by F if every FD in E is also in F+.
Two sets of FDs E and F are equivalent if E+ = F+.
If E and F are equivalent then E covers F and F covers E.
A set of FDs F is minimal if it satisfies the following conditions.
1. Every dependency in F has a single attribute in its RHS.
2. We cannot replace any dependency X → A in F with a dependency Y → A, where Y is a proper subset of X, and still have a set of dependencies that is equivalent to F.
3. We can not remove any dependency from F and still have a set of dependencies that is equivalent to F.
A minimal set of functional dependencies is a set of dependencies in standard form with no redundancies.
A minimal cover of a set of functional dependencies F is a minimal set of functional dependencies that is equivalent to F.
X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on the attribute A.
Say “X -> A holds in R.”
Convention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes.
Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.
Example
Drinkers(name, addr, beersLiked, manf, favBeer)
Reasonable FD’s to assert:
1. name -> addr
2. name -> favBeer
3. beersLiked -> manf
Example Data
name addr beersLiked manf favBeer
Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle
Spock Enterprise Bud A.B. Bud
The two Janeway rows agree on addr because name -> addr, and on favBeer because name -> favBeer.
The two Bud rows agree on manf because beersLiked -> manf.
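The check "tuples that agree on X must agree on A" can be sketched directly against the example data. The function name `satisfies` and the dictionary layout are assumptions made for this illustration, not part of the slides.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this relation instance."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True

drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle",
     "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud",
     "manf": "A.B.", "favBeer": "Bud"},
]
```

Note that an instance can only refute an FD. A passing check shows the data is consistent with the FD, not that the FD is a rule of the application.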
FD’s With Multiple Attributes
No need for FD’s with > 1 attribute on right.
But sometimes convenient to combine FD’s as a shorthand.
Example: name -> addr and name -> favBeer become name -> addr favBeer
> 1 attribute on left may be essential.
Example: bar beer -> price
Keys of Relations
K is a superkey for relation R if K functionally determines all of R.
K is a key for R if K is a superkey, but no proper subset of K is a superkey.
Example
Drinkers(name, addr, beersLiked, manf,favBeer)
{name, beersLiked} is a superkey because together these attributes determine all the other attributes.
name -> addr favBeer
beersLiked -> manf
Example, Contd.
{name, beersLiked} is a key because neither {name} nor {beersLiked} is a superkey.
name doesn’t -> manf; beersLiked doesn’t -> addr.
There are no other keys, but lots of superkeys.
Any superset of {name, beersLiked}.
E/R and Relational Keys
Keys in E/R concern entities.
Keys in relations concern tuples.
Usually, one tuple corresponds to one entity, so the ideas are the same.
But --- in poor relational designs, one entity can become several tuples, so E/R keys and Relational keys are different.
Example Data
name addr beersLiked manf favBeer
Janeway Voyager Bud A.B. WickedAle
Janeway Voyager WickedAle Pete’s WickedAle
Spock Enterprise Bud A.B. Bud
Relational key = {name, beersLiked}
But in E/R, name is a key for Drinkers, and beersLiked is a key for Beers.
Note: 2 tuples for Janeway entity and 2 tuples for Bud entity.
More FD’s From “Physics”
Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course.
Inferring FD’s
We are given FD’s X1 -> A1, X2 -> A2,…, Xn -> An, and we want to know whether an FD Y -> B must hold in any relation that satisfies the given FD’s.
Example: If A -> B and B -> C hold, surely A -> C holds, even if we don’t say so.
Important for design of good relation schemas.
Inference Test
To test whether Y -> B holds, start by assuming two tuples agree in all attributes of Y.
Use the given FD’s to infer that these tuples must also agree in certain other attributes.
If B is one of these attributes, then Y -> B is true.
Otherwise, the two tuples, with any forced equalities, form a two-tuple relation that proves Y -> B does not follow from the given FD’s.
Closure Test
An easier way to test is to compute the closure of Y, denoted Y+.
Basis: Y+ = Y.
Induction: Look for an FD’s left side X that is a subset of the current Y+. If the FD is X -> A, add A to Y+. Repeat until Y+ stops changing.
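The basis and induction steps translate almost line for line into code. This is a sketch under one simplifying assumption: every attribute name is a single character, so an FD can be written as a pair of strings such as ("AB", "C").

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under the FDs in `fds`.

    Each FD is an (lhs, rhs) pair; each side is a collection of
    single-character attribute names (an assumption of this sketch).
    """
    result = set(attrs)              # Basis: Y+ = Y
    changed = True
    while changed:                   # Induction: repeat until nothing is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [("A", "B"), ("B", "C")]
a_plus = closure("A", fds)
```

To test whether Y -> B follows from the given FD’s, compute the closure of Y and check whether B is in it.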
Finding All Implied FD’s
Motivation: “normalization,” the process where we break a relation schema into two or more schemas.
Example: ABCD with FD’s AB -> C, C -> D, and D -> A.
Decompose into ABC, AD. What FD’s hold in ABC?
Not only AB -> C, but also C -> A!
Basic Idea
1. Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s.
Nontrivial = left and right sides disjoint.
2. Restrict to those FD’s that involve only attributes of the projected schema.
Example
ABC with FD’s A -> B and B -> C. Project onto AC.
A+ = ABC; yields A -> B, A -> C.
We do not need to compute AB+ or AC+.
B+ = BC; yields B -> C.
C+ = C; yields nothing.
BC+ = BC; yields nothing.
Example --- Continued
Resulting FD’s: A -> B, A -> C, and B -> C.
Projection onto AC: A -> C.
Only FD that involves a subset of {A,C}.
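The projection can be sketched in code. This version takes a common shortcut rather than the exact two-step procedure above: it computes closures only for subsets of the projected schema and keeps the attributes that fall inside that schema. Attribute names are assumed to be single characters, and the result is not reduced to a minimal set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def project(fds, schema):
    """Nontrivial FDs X -> A implied by fds with X and A both inside schema."""
    schema = sorted(schema)
    found = []
    for size in range(1, len(schema) + 1):
        for lhs in combinations(schema, size):
            for a in sorted(closure(lhs, fds)):
                if a in schema and a not in lhs:
                    found.append((frozenset(lhs), a))
    return found

onto_ac = project([("A", "B"), ("B", "C")], "AC")
onto_abc = project([("AB", "C"), ("C", "D"), ("D", "A")], "ABC")
```

The second call reproduces the earlier ABCD example: besides AB -> C, the projection onto ABC also contains C -> A.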
A Geometric View of FD’s
Imagine the set of all instances of a particular relation.
That is, all finite sets of tuples that have the proper number of components.
Each instance is a point in this space.
Example: R(A,B)
{(1,2), (3,4)}
{}
{(1,2), (3,4), (1,3)}
{(5,1)}
An FD is a Subset of Instances
For each FD X -> A there is a subset of all instances that satisfy the FD.
We can represent an FD by a region in the space.
Trivial FD = an FD that is represented by the entire space.
Example: A -> A.
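Example: A -> B for R(A,B)
{(1,2), (3,4)} is inside the region for A -> B
{} is inside the region for A -> B
{(1,2), (3,4), (1,3)} is outside the region: (1,2) and (1,3) agree on A but not on B
{(5,1)} is inside the region for A -> B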
Representing Sets of FD’s
If each FD is a set of relation instances, then a collection of FD’s corresponds to the intersection of those sets.
Intersection = all instances that satisfy all of the FD’s.
Example
(Figure: regions for A -> B, B -> C, and CD -> A; their common intersection holds the instances satisfying A -> B, B -> C, and CD -> A.)
Implication of FD’s
If an FD Y -> B follows from FD’s X1 -> A1,…, Xn -> An, then the region in the space of instances for Y -> B must include the intersection of the regions for the FD’s Xi -> Ai. That is, every instance satisfying all the FD’s Xi -> Ai surely satisfies Y -> B.
But an instance could satisfy Y -> B, yet not be in this intersection.
Example
(Figure: the intersection of the regions for A -> B and B -> C lies inside the region for A -> C.)
Normalization
E.F. Codd defined well-structured “normal forms” of relations; the process of bringing relations into these forms is called “normalization”. Its goals:
Reduce complex user views to a set of small, stable data structures
Eliminate errors and inconsistencies related to the adding, deleting or updating of record occurrences
What it’s all about
Given a relation, R, and a set of functional dependencies, F, on R.
Assume that R is not in a desirable form for enforcing F.
Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF.
While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.
Contents
The Good and the Bad
  Bad database design
    redundancy of fact
    fact clutter
    information loss
    dependency loss
  Good database design
How to compute with meaning
  functional dependencies - FDs
  Armstrong’s inference rules
  the meaning of a set of FDs
  minimal cover of a set of FDs
Normal Forms - overview
  1NF, 2NF, 3NF, BCNF
The Good
Primitive Domains
FLT-SCHEDULE
flt# weekday airline dtime from atime to
DL242 MO WE FR DELTA 10:40 ATL 12:30 BOS
SK912 SA SU SAS 12:00 CPH 15:30 JFK
AA242 MO FR AA 08:00 CHI 10:10 ATL
Attributes must be defined over domains with atomic values, so each set of weekdays above becomes one row per weekday:
FLT-SCHEDULE
flt# weekday airline dtime from atime to
DL242 MO DELTA 10:40 ATL 12:30 BOS
SK912 SA SAS 12:00 CPH 15:30 JFK
AA242 MO AA 08:00 CHI 10:10 ATL
DL242 WE DELTA 10:40 ATL 12:30 BOS
DL242 FR DELTA 10:40 ATL 12:30 BOS
SK912 SU SAS 12:00 CPH 15:30 JFK
AA242 FR AA 08:00 CHI 10:10 ATL
Bad Database Design- redundancy of fact
FLIGHTS
flt# date airline plane#
DL242 10/23/00 Delta k-yo-33297
DL242 10/24/00 Delta t-up-73356
DL242 10/25/00 Delta o-ge-98722
AA121 10/24/00 American p-rw-84663
AA121 10/25/00 American q-yg-98237
AA411 10/22/00 American h-fe-65748
redundancy: airline name repeated for same flight
inconsistency: when airline name for a flight changes, it must be changed many places
Bad Database Design- fact clutter
insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned?
deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American.
update anomalies: if DL242 is flown by Sabena, we must change it everywhere.
FLIGHTS
flt# date airline plane#
DL242 10/23/00 Delta k-yo-33297
DL242 10/24/00 Delta t-up-73356
DL242 10/25/00 Delta o-ge-98722
AA121 10/24/00 American p-rw-84663
AA121 10/25/00 American q-yg-98237
AA411 10/22/00 American h-fe-65748
Bad Database Design- information loss
FLIGHTS
flt# date airline plane#
DL242 10/23/14 Delta k-yo-33297
DL242 10/24/14 Delta t-up-73356
DL242 10/25/14 Delta o-ge-98722
AA121 10/24/14 American p-rw-84663
AA121 10/25/14 American q-yg-98237
AA411 10/22/14 American h-fe-65748
FLIGHTS-AIRLINE
flt# airline
DL242 Delta
AA121 American
AA411 American
DATE-AIRLINE-PLANE
date airline plane#
10/23/14 Delta k-yo-33297
10/24/14 Delta t-up-73356
10/25/14 Delta o-ge-98722
10/24/14 American p-rw-84663
10/25/14 American q-yg-98237
10/22/14 American h-fe-65748
Bad Database Design- information loss
FLIGHTS
flt# date airline plane#
DL242 10/23/14 Delta k-yo-33297
DL242 10/24/14 Delta t-up-73356
DL242 10/25/14 Delta o-ge-98722
AA121 10/24/14 American p-rw-84663
AA121 10/25/14 American q-yg-98237
AA121 10/22/14 American h-fe-65748
AA411 10/24/14 American p-rw-84663
AA411 10/25/14 American q-yg-98237
AA411 10/22/14 American h-fe-65748
DATE-AIRLINE-PLANE
date airline plane#
10/23/14 Delta k-yo-33297
10/24/14 Delta t-up-73356
10/25/14 Delta o-ge-98722
10/24/14 American p-rw-84663
10/25/14 American q-yg-98237
10/22/14 American h-fe-65748
FLIGHTS-AIRLINE
flt# airline
DL242 Delta
AA121 American
AA411 American
• information loss: we polluted the database with false facts; we can’t find the true facts.
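The false facts can be reproduced by projecting FLIGHTS onto the two smaller relations and joining them back on airline. A minimal sketch using plain Python sets; the variable names are chosen for this illustration.

```python
flights = {
    ("DL242", "10/23/14", "Delta", "k-yo-33297"),
    ("DL242", "10/24/14", "Delta", "t-up-73356"),
    ("DL242", "10/25/14", "Delta", "o-ge-98722"),
    ("AA121", "10/24/14", "American", "p-rw-84663"),
    ("AA121", "10/25/14", "American", "q-yg-98237"),
    ("AA411", "10/22/14", "American", "h-fe-65748"),
}

# Project onto (flt#, airline) and (date, airline, plane#).
flights_airline = {(f, a) for f, d, a, p in flights}
date_airline_plane = {(d, a, p) for f, d, a, p in flights}

# Natural join on the only shared attribute, airline.
rejoined = {(f, d, a, p)
            for f, a in flights_airline
            for d, a2, p in date_airline_plane
            if a == a2}

# Tuples that the join produces but that were never in FLIGHTS.
spurious = rejoined - flights
```

Every original tuple survives the round trip, but the join cannot tell which American flight flew on which date, so it pairs each American flight with every American date and plane.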
Bad Database Design- dependency loss
DATE-AIRLINE-PLANE
date airline plane#
10/23/14 Delta k-yo-33297
10/24/14 Delta t-up-73356
10/25/14 Delta o-ge-98722
10/24/14 American p-rw-84663
10/25/14 American q-yg-98237
10/22/14 American h-fe-65748
FLIGHTS-AIRLINE
flt# airline
DL242 Delta
AA121 American
AA411 American
dependency loss: we lost the fact that (flt#, date) → plane#
Good Database Design
no redundancy of FACT (!)
no inconsistency
no insertion, deletion or update anomalies
no information loss
no dependency loss
FLIGHTS-DATE-PLANE
flt# date plane#
DL242 10/23/14 k-yo-33297
DL242 10/24/14 t-up-73356
DL242 10/25/14 o-ge-98722
AA121 10/24/14 p-rw-84663
AA121 10/25/14 q-yg-98237
AA411 10/22/14 h-fe-65748
FLIGHTS-AIRLINE
flt# airline
DL242 Delta
AA121 American
AA411 American
Let X and Y be sets of attributes in R
Y is functionally dependent on X in R iff for each x ∈ R.X there is precisely one y ∈ R.Y
Y is fully functionally dependent on X in R if Y is functionally dependent on X and Y is not functionally dependent on any proper subset of X
We use keys to enforce functional dependencies in relations:
(Figure: X → Y enforced by making X the key of a relation over X and Y.)
Functional Dependencies and Keys
(Figure: FLIGHTS(flt#, date, airline, plane#) shown three times with candidate keys and FDs marked.)
plane# is not determined by flt# alone
airline is determined by flt# alone, so it is not fully dependent on flt# and date
the FLIGHTS relation will not allow the FDs to be enforced by keys
Functional Dependencies and Keys
(Figure: “real world” versus “database”. The customer attributes name and address are shown stored combined, in one relation (cust#, name, address), and separate. Consider the meaning.)
How to Compute Meaning - Armstrong’s inference rules
Rules of the computation:
reflexivity: if Y ⊆ X, then X → Y
augmentation: if X → Y, then WX → WY
transitivity: if X → Y and Y → Z, then X → Z
Derived rules:
union: if X → Y and X → Z, then X → YZ
decomposition: if X → YZ, then X → Y and X → Z
pseudotransitivity: if X → Y and WY → Z, then XW → Z
Armstrong’s Axioms are sound and complete.
How to Compute Meaning - the meaning of a set of FDs, F+
umbrella: a collapsible shade consisting of fabric stretched over hinged ribs radiating from a central pole
Given the ribs of an umbrella, the FDs, what does the whole umbrella, F+, look like?
Determine each set of attributes, X, that appears on a left-hand side of a FD. Determine the set, X+, the closure of X under F.
How to Compute Meaning - when do sets of FDs mean the same?
F covers E if every FD in E is also in F+
F and E are equivalent if F covers E and E covers F.
We can determine whether F covers E by calculating X+ with respect to F for each FD, X → Y in E, and then checking whether this X+ includes the attributes in Y. If this is the case for every FD in E, then F covers E.
(Figure: F and E shown inside F+.)
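The cover test can be sketched with an attribute-closure helper, so F+ never has to be enumerated. Attribute names are assumed to be single characters, and the sets F, E and G below are made-up examples.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, e):
    """True if every FD X -> Y in e follows from f, i.e. Y is inside X+ under f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)

def equivalent(f, e):
    """True if f covers e and e covers f."""
    return covers(f, e) and covers(e, f)

F = [("A", "B"), ("B", "C")]
E = [("A", "BC"), ("B", "C")]
G = [("A", "C")]
```

Here F and E say the same thing in different words, while G is covered by F but is too weak to cover it.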
How to Compute Meaning - minimal cover of a set of FDs
• Is there a minimal set of ribs that will hold the umbrella open?
• F is minimal if:
• every dependency in F has a single attribute as right-hand side
• we can’t replace any dependency X → A in F with a dependency Y → A where Y ⊂ X and still have a set of dependencies equivalent with F
• we can’t remove any dependency from F and still have a set of dependencies equivalent with F
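The three conditions suggest a three-step procedure. The following is one possible sketch, not the only way to compute a minimal cover, and the result depends on the order in which FDs are examined. It assumes single-character attribute names.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute collections."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: a single attribute on every right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    # Step 2: drop left-hand attributes that are not needed.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            smaller = g[i][0] - {b}
            if smaller and a in closure(smaller, g):
                g[i] = (smaller, a)
    # Step 3: drop duplicate FDs, then FDs that follow from the others.
    result = list(dict.fromkeys(g))
    for fd in list(result):
        rest = [other for other in result if other != fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return result

cover = minimal_cover([("A", "BC"), ("B", "C"), ("AB", "C")])
```

For this made-up input, A -> C and AB -> C both turn out to be redundant, leaving only A -> B and B -> C.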
How to guarantee lossless joins
• Decompose relation, R, with functional dependencies, F, into relations, R1 and R2, with attributes, A1 and A2, and associated functional dependencies, F1 and F2.
• The decomposition is lossless iff:
• A1 ∩ A2 → A1 \ A2 is in F+, or
• A1 ∩ A2 → A2 \ A1 is in F+
Lossless means R1 ⋈ R2 = R.
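The two-relation test reduces to one closure computation over the shared attributes. In this sketch the FLIGHTS attributes are abbreviated to single letters (F = flt#, D = date, A = airline, P = plane#), and the FD set F → A, FD → P is an assumption consistent with the flight examples in these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(a1, a2, fds):
    """True if splitting into attribute sets a1 and a2 is a lossless join."""
    a1, a2 = set(a1), set(a2)
    common_closure = closure(a1 & a2, fds)
    return (a1 - a2) <= common_closure or (a2 - a1) <= common_closure

# Assumed FDs: flt# -> airline, (flt#, date) -> plane#
fds = [("F", "A"), ("FD", "P")]
bad = lossless_binary("FA", "DAP", fds)    # shared attribute: airline
good = lossless_binary("FDP", "FA", fds)   # shared attribute: flt#
```

Splitting on airline is lossy because airline determines neither side; splitting on flt# is lossless because flt# determines everything in (flt#, airline).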
How to guarantee preservation of FDs
Decompose relation, R, with functional dependencies, F, into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk.
The decomposition is dependency preserving iff:
F+ = (F1 ∪ ... ∪ Fk)+
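Checking the condition literally would require computing projected FD sets. A common alternative, sketched here, tests each FD X → Y in F by growing X using only closures restricted to one sub-schema at a time. Attribute names are single letters; the FD sets in the usage lines are assumptions drawn from the examples in these notes.

```python
def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserves(fds, schemas):
    """True if every FD in fds follows from the FDs projected onto schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for r in schemas:
                r = set(r)
                # Attributes derivable while staying inside sub-schema r.
                gained = (closure(z & r, fds) & r) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True

# F = flt#, D = date, A = airline, P = plane# (assumed FDs)
flight_fds = [("F", "A"), ("FD", "P")]
good = preserves(flight_fds, ["FDP", "FA"])
bad = preserves(flight_fds, ["FA", "DAP"])
```

The second decomposition fails on (flt#, date) → plane#, which is the dependency loss described earlier.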
Normal Forms
A relation is said to be in a particular normal form if it satisfies a certain specified set of constraints
Unnormalized relation
First Normal Form - removed repeating groups
Second Normal Form - removed partial dependencies
Third Normal Form - removed transitive dependencies
Normal Forms
Boyce-Codd Normal Form - removed remaining anomalies resulting from functional dependencies
Fourth Normal Form - removed multivalued dependencies
Fifth Normal Form - removed remaining anomalies
Domain-Key Normal Form - the upper bound of normal form
Overview of NFs
(Figure: the normal forms NF2, 1NF, 2NF, 3NF and BCNF, each more restrictive than the one before.)
Normal Forms - definitions
NF2: non-first normal form
1NF: R is in 1NF iff all domain values are atomic
2NF: R is in 2NF iff R is in 1NF and every nonkey attribute is fully dependent on the key
3NF: R is in 3NF iff R is in 2NF and every nonkey attribute is non-transitively dependent on the key
BCNF: R is in BCNF iff every determinant is a candidate key
Determinant: an attribute on which some other attribute is fully functionally dependent.
Normal Forms
Normal forms are based on functional dependencies.
Data normalized to,
Minimize redundancies
Minimize anomalies
Normal Forms
Relations are decomposed to normalize.
Concerns with decomposition;
Loss-less join property
Dependency preserving property
Normal Forms
Definitions
Superkey
Key – Minimal superkey
Candidate key
Prime attributes
Example of Normalization
FLT-INSTANCE(flt#, date, plane#, airline, from, to, miles)
(Figure: dependency diagram over flt#, date, plane#, airline, from, to, miles.)
First Normal Form
A relation is in first normal form if it contains no repeating groups
Grade Report with repeating group of courses for each student:
(Student ID, Student Name, Campus Address, Major, Course ID, Course Title, Instructor Name, Instructor Location, Grade)
Remove repeating group:
(Student ID, Student Name, Campus Address, Major) (3NF)
(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)
Normal Forms
First Normal Form
Domain of an attribute must include only atomic (simple, indivisible) values
Value of any attribute in a tuple must be a single value from domain of that attribute.
Now considered part of definition of a relation in relational model.
Second Normal Form
A relation is in second normal form if it is already in first normal form and any partial functional dependencies on the primary key have been removed
(Student ID, Course ID, Course Title, Instructor Name, Instructor Location, Grade) (1NF)
Primary key is Student ID + Course ID
Student ID + Course ID --> Grade
Course ID --> Course Title (partial dependency)
Removing partial dependencies:
(Student ID, Course ID, Grade) (3NF)
(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)
Normal Forms
Full Functional Dependency
Second Normal Form
Relation schema should be in first normal form.
No nonprime attribute in the relation may be partially dependent on any key of the relation.
Example of Normalization
1NF: (flt#, date, plane#, airline, from, to, miles)
2NF: (flt#, date, plane#) and (flt#, airline, from, to, miles)
Third Normal Form
A relation is in third normal form if it is already in second normal form and contains no transitive dependencies
transitive dependency - a nonkey attribute is dependent on one or more other nonkey attributes
(Course ID, Course Title, Instructor Name, Instructor Location) (2NF)
Course ID --> Instructor Name --> Instructor Location
Instructor Name is nonkey
Instructor Location is dependent on Instructor Name
Remove transitive dependency:
(Course ID, Course Title, Instructor Name) (3NF)
(Instructor Name, Instructor Location) (3NF)
Normal Forms
Third Normal Form
Relation schema in second normal form.
No nonprime attribute is transitively dependent on the primary key.
Third Normal Form general definition:
A relation schema R is in third normal form if R is in second normal form and, whenever a nontrivial functional dependency X → A holds in R, either
a) X is a superkey of R, or
b) A is a prime attribute of R.
Third Normal Form: “if it is in second normal form and has no transitive dependencies” (Figure 5-7 © 2000 Prentice Hall)
Boyce-Codd Normal Form: “if every determinant is a candidate key” (Figure 5-8 © 2000 Prentice Hall)
Boyce-Codd Normal Form
A relation is in BCNF if and only if it is in 3NF and every determinant is a candidate key
A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent
Situation:
1. Multiple candidate keys
2. Those candidate keys are composite
3. The candidate keys overlap
Normal Forms
Boyce-Codd Normal Form
Stricter than 3NF.
A relation schema R is in BCNF if, whenever a nontrivial functional dependency X → A holds in R, X is a superkey of R.
Every relation schema in BCNF is in 3NF as well.
Boyce-Codd Normal Form
(Student, Major, Advisor) (3NF)
or (Student, Advisor, Major) (1NF)
Student may have more than one major with one advisor in each
Advisor <<--> Major, Student <-->> Major, Student <-->> Advisor
(Student, Advisor) (BCNF)
(Advisor, Major) (BCNF)
Example of Normalization
3NF & BCNF: (from, to, miles), (flt#, airline, from, to) and (flt#, date, plane#)
3NF that is not BCNF
R(A, B, C)
Candidate keys: {A,B} and {A,C}
Determinants: {A,B} and {C}
A decomposition: R1(C, B) and R2(A, C)
Lossless, but not dependency preserving!
Major Results in Normalization Theory
Theorem:
There is an algorithm for testing a decomposition for lossless join wrt. a set of FDs
Theorem:
There is an algorithm for testing a decomposition for dependency preservation
Theorem:
There is an algorithm for lossless join decomposition into BCNF
Theorem:
There is an algorithm for dependency preserving decomposition into 3NF
Normal Forms
Transitive Dependency
A FD X → Y in a relation schema R is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R and both X → Z and Z → Y hold.
BCNF vs 3NF
BCNF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either:
Y is a subset of X or,
X is a superkey of R
3NF: For every functional dependency X->Y in a set F of functional dependencies over relation R, either:
Y is a subset of X or,
X is a superkey of R, or
Y is a subset of K for some key K of R
N.b., no subset of a key is a key
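Both definitions can be checked mechanically once candidate keys are known. The sketch below abbreviates the attributes of the Account/Client/Office schema used in the next example to single letters (A, C, O) and tests only the given FDs, which is enough to classify the relation itself. The 3NF test uses the usual attribute-wise reading: every attribute of Y outside X must belong to some key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs, keys = set(attrs), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue                      # superset of a key: not minimal
            if closure(k, fds) == attrs:
                keys.append(k)
    return keys

def is_bcnf(attrs, fds):
    attrs = set(attrs)
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) == attrs
               for lhs, rhs in fds)

def is_3nf(attrs, fds):
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) == attrs or (set(rhs) - set(lhs)) <= prime
               for lhs, rhs in fds)

# A = Account, C = Client, O = Office
fds = [("CO", "A"), ("A", "O")]
keys = candidate_keys("ACO", fds)
```

Account → Office violates BCNF because Account is not a superkey, but it is allowed in 3NF because Office is part of the key {Client, Office}.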
3NF Schema
Account Client Office
A Joe 1
B Mary 1
A John 1
C Joe 2
For every functional dependency X->Y in a set F of functional dependencies over relation R, either:
Y is a subset of X or,
X is a superkey of R, or
Y is a subset of K for some key K of R
Client, Office -> Client, Office, Account
Account -> Office
BCNF vs 3NF
For every functional dependency X->Y in a set F of functional dependencies over relation R, either:
Y is a subset of X or,
X is a superkey of R, or
(3NF only) Y is a subset of K for some key K of R
3NF has some redundancy; BCNF does not.
Unfortunately, a BCNF decomposition is not always dependency preserving, whereas a dependency-preserving 3NF decomposition always exists.
Client, Office -> Client, Office, Account
Account -> Office
Account Client Office
A Joe 1
B Mary 1
A John 1
C Joe 2
Account Office
A 1
B 1
C 2
Account Client
A Joe
B Mary
A John
C Joe
(Account, Office): Account -> Office
(Account, Client): no non-trivial FDs
Lossless decomposition
Closure
Want to find all attributes A such that X -> A is true, given a set of functional dependencies F
define closure of X as X* (written X+ elsewhere in these notes)
Closure(X):
    c = X
    repeat
        old = c
        if there is an FD Z->V such that
                Z ⊆ c and
                V ⊄ c then
            c = c U V
    until old = c
    return c
BCNFify
BCNFify(schema R, functional dependency set F):
    D = {{R,F}}
    while there is a schema S with dependencies F' in D that is not in BCNF, do:
        given X->Y as a BCNF-violating FD in F'
        such that XY is in S
        replace S in D with
            S1 = {XY, F1} and
            S2 = {(S-Y) U X, F2}
        where F1 and F2 are the FDs in F' over S1 or S2
        (may need to split some FDs using decomposition)
    end
    return D
BCNF test used in the loop: for every functional dependency X->Y in a set F of functional dependencies over relation R, either:
Y is a subset of X or,
X is a superkey of R
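A runnable sketch of the decomposition follows. It is a variant of the pseudocode above, not a transcription: instead of carrying projected FD sets F1 and F2 along, it finds violations in each sub-schema by computing closures under the original F, which also catches implied FDs. It tries every subset of a schema, so it is only practical for small examples, and attribute names are assumed to be single letters.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of attrs under fds; FDs are (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violation(schema, fds):
    """Return (X, Y) where X -> Y violates BCNF inside schema, or None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # X determines something new, yet is not a superkey of schema.
            if determined != x and determined != schema:
                return x, determined - x
    return None

def bcnfify(schema, fds):
    todo, done = [frozenset(schema)], []
    while todo:
        s = todo.pop()
        violation = bcnf_violation(s, fds)
        if violation is None:
            done.append(s)
        else:
            x, y = violation
            todo.append(frozenset(x | y))   # S1 = XY
            todo.append(frozenset(s - y))   # S2 = (S - Y) U X
    return done

# A = Account, C = Client, O = Office
result = bcnfify("ACO", [("CO", "A"), ("A", "O")])
```

For the Account/Client/Office schema this yields (Account, Office) and (Account, Client), the same split shown in the BCNF vs 3NF example.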
Theory of Multi-valued Dependency
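Let R be a relation scheme and let α ⊆ R and β ⊆ R.
The multi-valued dependency α →→ β holds on R if, in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that t1[α] = t2[α] = t3[α] = t4[α], t3[β] = t1[β], t3[R - β] = t2[R - β], t4[β] = t2[β] and t4[R - β] = t1[R - β].
Example:
BC-Scheme = (loan-number, customer-name, street, customer-city)
customer-name →→ street, customer-city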
Theory of Multi-valued Dependency
Inference rules for functional and Multi-valued dependencies are sound and complete.
Soundness means that the rules do not generate any dependencies that are not logically implied by D (the given dependencies)
Completeness means that the rules allow us to generate all dependencies in D+ (all functional and Multi-valued dependencies)
Theory of Multi-valued Dependency
1. Reflexivity rule. If α is a set of attributes and β ⊆ α, then α → β holds.
2. Augmentation rule. If α → β holds and γ is a set of attributes, then γα → γβ holds.
3. Transitivity rule. If α → β holds and β → γ holds, then α → γ holds.
4. Complementation rule. If α →→ β holds, then α →→ R - β - α holds.
5. Multi-valued Augmentation rule. If α →→ β holds and γ ⊆ R and δ ⊆ γ, then γα →→ δβ holds.
6. Multi-valued Transitivity rule. If α →→ β holds and β →→ γ holds, then α →→ γ - β holds.
7. Replication rule. If α → β holds, then α →→ β.
8. Coalescence rule. If α →→ β holds and γ ⊆ β and there is a δ such that δ ⊆ R and δ ∩ β = ∅ and δ → γ, then α → γ holds.
Theory of Multi-valued Dependency
Example:
Let R= (A,B,C,G,H,I)
Suppose A →→ BC holds. The definition of multi-valued dependency implies that if t1[A] = t2[A] then there exist tuples t3 and t4 such that:
t1[A] = t2[A] = t3[A] = t4[A]
t3[BC] = t1[BC]
t3[GHI] = t2[GHI]
t4[GHI] = t1[GHI]
t4[BC] = t2[BC]
The complementation rule states that if A →→ BC, then A →→ GHI.
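The tuple-swapping definition can be checked on a concrete relation instance. This is a sketch with made-up data over three attributes A, B, C; the function name `mvd_holds` is an assumption of this illustration. Because the loops visit every ordered pair (t1, t2), the tuple t4 is covered when the roles of t1 and t2 are swapped.

```python
def mvd_holds(rows, attrs, lhs, rhs):
    """Check the multi-valued dependency lhs ->> rhs on a list of dict rows."""
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in lhs):
                # t3 takes the rhs values from t1 and everything else from t2.
                t3 = dict(t2)
                t3.update({a: t1[a] for a in rhs})
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True

attrs = ["A", "B", "C"]
full = [
    {"A": 1, "B": "b1", "C": "c1"},
    {"A": 1, "B": "b2", "C": "c2"},
    {"A": 1, "B": "b1", "C": "c2"},
    {"A": 1, "B": "b2", "C": "c1"},
]
partial = full[:2]   # the two "swapped" tuples are missing
```

On `full`, A →→ B holds and so does its complement A →→ C; on `partial` the dependency fails because the swapped tuples are absent.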
Theory of Multi-valued Dependency
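Some more Rules
Multi-valued union rule.
If α →→ β holds and α →→ γ holds, then α →→ βγ holds.
Intersection rule.
If α →→ β holds and α →→ γ holds, then α →→ β ∩ γ holds.
Difference rule.
If α →→ β holds and α →→ γ holds, then α →→ β - γ holds and α →→ γ - β holds.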
Example
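Let’s apply the rules to the following example. Let R = (A,B,C,G,H,I) with the following set of dependencies D given:
A →→ B
B →→ HI
CG → H
List some members of D+
D+ :
A →→ CGHI (complementation rule applied to A →→ B)
A →→ HI (multi-valued transitivity rule applied to A →→ B and B →→ HI)
B → H (coalescence rule applied to B →→ HI and CG → H)
A →→ CG (difference rule applied to A →→ CGHI and A →→ HI)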
Fourth Normal Form
A relation is in fourth normal form if it is in BCNF and contains no multi-valued dependencies
Multi-valued Dependency
There are three attributes (e.g. A, B, C) in a relation.
For each value of A there is a well-defined set of values of B and a well-defined set of values of C.
The set of values of B is independent of the set of values of C, and vice versa.
Fourth Normal Form
(Course, Instructor, Textbook) (BCNF)
One course is taught by several instructors
One course uses the same set of textbooks by each instructor
(Course, Textbook) (4NF)
(Course, Instructor) (4NF)
Fourth Normal Form: “if in BCNF and has no multi-value dependencies” (Figure 5-11 © 2000 Prentice Hall)
Fifth Normal Form
Every join dependency is a consequence of its relation keys
A non 5NF: Person-using-skills-on-jobs (Person, Skill, Job)
5NF:
Has-skill (Person, Skill)
Need-skill (Skill, Job)
Assigned-to-job (Person, Job)
Domain Key Normal Form
“if every constraint on the relation is a logical consequence of the definition of keys and domains”
Constraint “a rule governing static values of attributes”
Key “unique identifier of a tuple”
Domain “description of an attribute’s allowed values”
Example of non DK/NF
Enrollment (Student ID, Course ID, Grade)
Key constraint: Student ID + Course ID --> Grade
Domain constraint: Student ID: 7 digits, Course ID: 3 digits, Grade: A,B,C,D,F,P
General constraint: if Course ID < 900 then Grade in {A,B,C,D,F} else Grade in {P,F}
Since the general constraint cannot be inferred from the key constraint or the domain constraint, the relation is not in DK/NF.
B-tree Insertion
INSERTION OF KEY ’K’
find the correct leaf node ’L’;
if ( adding ’K’ makes ’L’ overflow ){
    split ’L’ (with ’K’ included), by pushing the middle key upstairs to parent node ’P’;
    if (’P’ overflows){
        repeat the split recursively;
    }
}
else{
    add the key ’K’ in node ’L’; /* maintaining the key order in ’L’ */
}
B-tree deletion - pseudocode
DELETION OF KEY ’K’
locate key ’K’, in node ’N’
if( ’N’ is a non-leaf node) {
    delete ’K’ from ’N’;
    find the immediately larger key ’K1’;
        /* which is guaranteed to be on a leaf node ’L’ */
    copy ’K1’ in the old position of ’K’;
    invoke this DELETION routine on ’K1’ from the leaf node ’L’;
}
else { /* ’N’ is a leaf node */
    delete ’K’ from ’N’;
    if( ’N’ underflows ){
        let ’N1’ be the sibling of ’N’;
        if( ’N1’ is "rich"){ /* ie., N1 can lend us a key */
            borrow a key from ’N1’ THROUGH the parent node;
        }else{ /* N1 is 1 key away from underflowing */
            MERGE: pull the key from the parent ’P’,
            and merge it with the keys of ’N’ and ’N1’ into a new node;
            if( ’P’ underflows){ repeat recursively }
        }
    }
}
Remarks on Normalization
The notions of dependency and normalization are semantic in nature
The normalization guidelines should be regarded primarily as a discipline to help the database design
Limitations of normalization
May not be natural, e.g. zip code, area code for phone #
May ignore operational considerations: need not change, may change over time, e.g. (order#, prod#, description, unit-price, quantity)
Difficult to enforce integrity control: with (Order#, Prod#, quantity) and (Prod#, Description, Unit-price), Prod# may not be valid.
Now the integrity control is provided by the relational DBMS
De-normalization
Normalization is only one of many database design goals.
Normalized (decomposed) tables require additional processing, reducing system speed.
Normalization purity is often difficult to sustain in the modern database environment. The conflicts between design efficiency, information requirements, and processing speed are often resolved through compromises that include denormalization.
Integrity Constraints
Unit-2 (Part A)
Integrity Constraints
An important functionality of a DBMS is to enable the specification of integrity constraints and to enforce them.
Knowledge of integrity constraints is also useful for query optimization.
Examples of constraints: keys and superkeys; foreign keys; domain constraints and tuple constraints; functional dependencies and multivalued dependencies.
Integrity Constraints
1. Integrity constraints provide a way of ensuring that changes made to the database by authorized users do not result in a loss of data consistency.
2. We saw a form of integrity constraint with E-R models:
key declarations: stipulation that certain attributes form a candidate key for the entity set.
form of a relationship: mapping cardinalities 1-1, 1-many and many-many.
3. An integrity constraint can be any arbitrary predicate applied to the database.
4. They may be costly to evaluate, so we will only consider integrity constraints that can be tested with minimal overhead.
Domain Constraints
1. A domain of possible values should be associated with every attribute. These domain constraints are the most basic form of integrity constraint. They are easy to test for when data is entered.
2. Domain types
1. Attributes may have the same domain, e.g. cname and employee-name.
2. It is not as clear whether bname and cname domains ought to be distinct.
3. At the implementation level, they are both character strings.
4. At the conceptual level, we do not expect customers to have the same names as branches, in general.
5. Strong typing of domains allows us to test for values inserted, and whether queries make sense. Newer systems, particularly object-oriented database systems, offer a rich set of domain types that can be extended easily.
Domain Types in SQL
The SQL standard supports a restricted set of domain types:
Fixed length character string, with user specified length
Fixed point number, with user specified precision
Integer
Small Integer
Floating point number
Floating point and double precision floating point numbers with machine dependent precision
Null Values
Insertion of incomplete tuples can introduce null values into the database.
SQL allows the domain declaration of an attribute to include the specification not null.
This prohibits the insertion of a null value for the attribute.
Referential Integrity
Often we wish to ensure that a value appearing in a relation for a given set of attributes also appears for another set of attributes in another relation. This is called referential integrity.
Basic Concepts
1. Dangling tuples.
Consider a pair of relations r(R) and s(S), and the natural join $r \bowtie s$.
There may be a tuple $t_r$ in r that does not join with any tuple in s.
That is, there is no tuple $t_s$ in s such that $t_r[R \cap S] = t_s[R \cap S]$.
We call this a dangling tuple.
Dangling tuples may or may not be acceptable.
Basic Concepts
1. Suppose there is a tuple in the account relation with the value "Lunartown", but no matching tuple in the branch relation for the Lunartown branch.
2. This is undesirable, as an account should refer to a branch that exists.
3. Now suppose there is a tuple in the branch relation with “Mokan”, but no matching tuple in the account relation for the Mokan branch.
4. This means that a branch exists for which no accounts exist. This is possible, for example, when a branch is being opened. We want to allow this situation.
Basic Concepts
1. Note the distinction between these two situations: bname is the primary key of branch, while it is not for account. In account, bname is a foreign key, being the primary key of another relation.
Let r1(R1) and r2(R2) be two relations with primary keys K1 and K2 respectively.
We say that a subset $\alpha$ of R2 is a foreign key referencing K1 in relation r1 if it is required that for every tuple t2 in r2 there must be a tuple t1 in r1 such that $t_1[K_1] = t_2[\alpha]$.
We call these requirements referential integrity constraints.
Also known as subset dependencies, as we require $\Pi_{\alpha}(r_2) \subseteq \Pi_{K_1}(r_1)$.
Referential Integrity in the E-R Model
1. These constraints arise frequently. Every relation arising from a relationship set has referential integrity constraints.
Referential Integrity in the E-R Model
The figure shows an n-ary relationship set R relating entity sets $E_1, E_2, \ldots, E_n$.
Let $K_i$ denote the primary key of $E_i$. Each $K_i$ in the scheme for R is a foreign key that leads to a referential integrity constraint.
Relation schemes for weak entity sets must include the primary key of the strong entity set on which they are existence dependent. This is a foreign key, which leads to another referential integrity constraint.
Referential Integrity in SQL
1. An addition to the original standard allows specification of primary and candidate keys and foreign keys as part of the create table command:
primary key clause includes a list of attributes forming the primary key.
unique key clause includes a list of attributes forming a candidate key.
foreign key clause includes a list of attributes forming the foreign key, and the name of the relation referenced by the foreign key.
Examples
create table branch
(bname char(15) not null,
bcity char(30),
assets integer,
primary key (bname),
check (assets >= 0))
create table account
(account# char(10) not null,
bname char(15),
balance integer,
primary key (account#),
foreign key (bname) references branch,
check (balance >= 0))
create table depositor
(cname char(20) not null,
account# char(10) not null,
primary key (cname, account#),
foreign key (cname) references customer,
foreign key (account#) references account)
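The effect of the foreign key clause can be tried out with a small Python script using the standard sqlite3 module. This is only a sketch under stated assumptions: the column is named account_number rather than account#, the sample rows are invented for illustration, and SQLite enforces foreign keys only after the pragma shown below; other systems behave the same way in principle but differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""create table branch
    (bname char(15) not null, bcity char(30), assets integer,
     primary key (bname), check (assets >= 0))""")
conn.execute("""create table account
    (account_number char(10) not null, bname char(15), balance integer,
     primary key (account_number),
     foreign key (bname) references branch,
     check (balance >= 0))""")

# A branch with no accounts (for example, one being opened) is allowed.
conn.execute("insert into branch values ('Mokan', 'Horseneck', 0)")
# An account that refers to an existing branch is accepted.
conn.execute("insert into account values ('A-101', 'Mokan', 500)")

# An account that refers to a branch that does not exist would be a
# dangling tuple, so the insertion is rejected.
try:
    conn.execute("insert into account values ('A-102', 'Lunartown', 700)")
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

account_count = conn.execute("select count(*) from account").fetchone()[0]
```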
Structured Query Language(SQL)
Unit-II
Part- B
SQL
Data Definition
Basic Query Structure
Set Operations
Aggregate Functions
Null Values
Nested Sub queries
Complex Queries
Views
Modification of the Database
History
IBM Sequel language developed as part of the System R project at the IBM San Jose Research Laboratory
Renamed Structured Query Language (SQL)
ANSI and ISO standard SQL:
- SQL-86
- SQL-89
- SQL-92
- SQL:1999 (language name became Y2K compliant!)
- SQL:2003
Commercial systems offer most, if not all, SQL-92 features, plus varying feature sets from later standards and special proprietary features.
Data Definition Language
Allows the specification of not only a set of relations but also information about each relation, including:
The schema for each relation.
The domain of values associated with each attribute.
Integrity constraints.
The set of indices to be maintained for each relation.
Security and authorization information for each relation.
The physical storage structure of each relation on disk.
Basic Query Structure
SQL is based on set and relational operations with certain modifications and enhancements
A typical SQL query has the form:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
Ai represents an attribute
ri represents a relation
P is a predicate.
This query is equivalent to the relational algebra expression
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$
The result of an SQL query is a relation.
The select Clause
The select clause lists the attributes desired in the result of a query
corresponds to the projection operation of the relational algebra
Example: find the names of all branches in the loan relation:
select branch_name
from loan
In the relational algebra, the query would be:
$\Pi_{branch\_name}(loan)$
NOTE: SQL names are case insensitive (i.e., you may use upper- or lower-case letters.)
Some people use upper case wherever we use bold font.
The select Clause (Cont.)
SQL allows duplicates in relations as well as in query results.
To force the elimination of duplicates, insert the keyword distinct after select.
Find the names of all branches in the loan relation, and remove duplicates
select distinct branch_name
from loan
The keyword all specifies that duplicates not be removed.
select all branch_name
from loan
The select Clause (Cont.)
An asterisk in the select clause denotes “all attributes”
select *
from loan
The select clause can contain arithmetic expressions involving the operations +, –, *, and /, and operating on constants or attributes of tuples.
The query:
select loan_number, branch_name, amount * 100
from loan
would return a relation that is the same as the loan relation, except that the value of the attribute amount is multiplied by 100.
The where Clause
The where clause specifies conditions that the result must satisfy
Corresponds to the selection predicate of the relational algebra.
To find all loan numbers for loans made at the Perryridge branch with loan amounts greater than $1200:
select loan_number
from loan
where branch_name = 'Perryridge' and amount > 1200
Comparison results can be combined using the logical connectives and, or, and not.
Comparisons can be applied to results of arithmetic expressions.
The where Clause (Cont.)
SQL includes a between comparison operator
Example: Find the loan number of those loans with loan amounts between $90,000 and $100,000 (that is, ≥ $90,000 and ≤ $100,000)
select loan_number
from loan
where amount between 90000 and 100000
The from Clause
The from clause lists the relations involved in the query
Corresponds to the Cartesian product operation of the relational algebra.
Find the Cartesian product borrower × loan
select *
from borrower, loan
Find the name, loan number and loan amount of all customers
having a loan at the Perryridge branch.
select customer_name, borrower.loan_number, amount
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = ‘Perryridge’
The Rename Operation
The SQL allows renaming relations and attributes using the as clause:
old-name as new-name
Find the name, loan number and loan amount
of all customers; rename the column name
loan_number as loan_id.
select customer_name, borrower.loan_number as loan_id, amount
from borrower, loan
where borrower.loan_number = loan.loan_number
Tuple Variables
Tuple variables are defined in the from clause via the use of the as clause.
Find the customer names and their loan numbers for all customers having a loan at some branch.
select customer_name, T.loan_number, S.amount
from borrower as T, loan as S
where T.loan_number = S.loan_number
Find the names of all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and S.branch_city = 'Brooklyn'
String Operations
SQL includes a string-matching operator for comparisons on character strings. The operator "like" uses patterns that are described using two special characters:
percent (%). The % character matches any substring.
underscore (_). The _ character matches any single character.
Find the names of all customers whose street includes the substring "Main".
select customer_name
from customer
where customer_street like '%Main%'
Match the name "Main%"
like 'Main\%' escape '\'
SQL supports a variety of string operations such as
concatenation (using “||”)
converting from upper to lower case (and vice versa)
finding string length, extracting substrings, etc.
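The two like patterns can be tried with Python's sqlite3 module. This is a sketch with invented sample rows; note also that SQLite's like is case-insensitive for ASCII letters by default, which the SQL standard does not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table customer (customer_name text, customer_street text)")
conn.executemany("insert into customer values (?, ?)", [
    ("Jones", "12 Main Street"),   # illustrative rows, not from the slides
    ("Smith", "North Avenue"),
    ("Hayes", "Main%"),
])

# % matches any substring, so this finds every street containing "Main".
contains_main = [row[0] for row in conn.execute(
    "select customer_name from customer "
    "where customer_street like '%Main%' order by customer_name")]

# With an escape character, \% stands for a literal percent sign.
exactly_main_percent = [row[0] for row in conn.execute(
    r"select customer_name from customer "
    r"where customer_street like 'Main\%' escape '\'")]
```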
Ordering the Display of Tuples
List in alphabetic order the names of all customers having a loan in Perryridge branch
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge'
order by customer_name
We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default. Example: order by customer_name desc
Duplicates
In relations with duplicates, SQL can define how many copies of tuples appear in the result.
Multiset versions of some of the relational algebra operators, given multiset relations r1 and r2:
1. $\sigma_\theta(r_1)$: If there are c1 copies of tuple t1 in r1, and t1 satisfies selection $\sigma_\theta$, then there are c1 copies of t1 in $\sigma_\theta(r_1)$.
2. $\Pi_A(r_1)$: For each copy of tuple t1 in r1, there is a copy of tuple $\Pi_A(t_1)$ in $\Pi_A(r_1)$, where $\Pi_A(t_1)$ denotes the projection of the single tuple t1.
3. $r_1 \times r_2$: If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 × c2 copies of the tuple $t_1 . t_2$ in $r_1 \times r_2$.
Duplicates (Cont.)
Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows:
r1 = {(1, a), (2, a)} r2 = {(2), (3), (3)}
Then $\Pi_B(r_1)$ would be {(a), (a)}, while $\Pi_B(r_1) \times r_2$ would be
{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}
SQL duplicate semantics:
select A1, A2, ..., An
from r1, r2, ..., rm
where P
is equivalent to the multiset version of the expression:
$\Pi_{A_1, A_2, \ldots, A_n}(\sigma_P(r_1 \times r_2 \times \cdots \times r_m))$
Set Operations
The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations $\cup$, $\cap$, and $-$.
Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all.
Suppose a tuple occurs m times in r and n times in s, then, it occurs:
m + n times in r union all s
min(m,n) times in r intersect all s
max(0, m – n) times in r except all s
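The three multiplicity rules can be checked with Python's collections.Counter, whose +, & and - operators happen to follow the same counting rules. This models multisets in plain Python rather than running SQL; the tuples "a", "b", "c" and their counts are made up for the illustration.

```python
from collections import Counter

r = Counter({"a": 3, "b": 1})   # tuple "a" occurs m = 3 times in r
s = Counter({"a": 2, "c": 1})   # tuple "a" occurs n = 2 times in s

union_all = r + s        # m + n copies of each tuple
intersect_all = r & s    # min(m, n) copies of each tuple
except_all = r - s       # max(0, m - n) copies of each tuple
```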
Set Operations
Find all customers who have a loan, an account, or both:
(select customer_name from depositor)
union
(select customer_name from borrower)
Find all customers who have both a loan and an account.
(select customer_name from depositor)
intersect
(select customer_name from borrower)
Find all customers who have an account but no loan.
(select customer_name from depositor)
except
(select customer_name from borrower)
Aggregate Functions
These functions operate on the multiset of values of a column of a relation, and return a value
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Aggregate Functions (Cont.)
Find the average account balance at the Perryridge branch.
select avg (balance)
from account
where branch_name = 'Perryridge'
Find the number of tuples in the customer relation.
select count (*)
from customer
Find the number of depositors in the bank.
select count (distinct customer_name)
from depositor
Aggregate Functions – Group By
Find the number of depositors for each branch.
Note: Attributes in select clause outside of aggregate functions must
appear in group by list
select branch_name, count (distinct customer_name)
from depositor, account
where depositor.account_number = account.account_number
group by branch_name
Aggregate Functions – Having Clause
Find the names of all branches where the average account balance is more than $1,200.
Note: predicates in the having clause are applied after the
formation of groups whereas predicates in the where
clause are applied before forming groups
select branch_name, avg (balance)
from account
group by branch_name
having avg (balance) > 1200
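The having query can be run against a few sample rows with Python's sqlite3 module. The account rows below are invented for illustration; only the query itself comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)")
conn.executemany("insert into account values (?, ?, ?)", [
    ("A-101", "Downtown", 500),      # illustrative rows
    ("A-102", "Perryridge", 1000),
    ("A-201", "Perryridge", 2000),
    ("A-305", "Redwood", 1300),
    ("A-306", "Redwood", 900),
])

# Groups are formed first; the having predicate then filters whole groups.
rich_branches = conn.execute(
    "select branch_name, avg (balance) "
    "from account "
    "group by branch_name "
    "having avg (balance) > 1200").fetchall()
```

Redwood contains one account above $1,200, but its group average is 1100, so the whole group is dropped.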
Null Values
It is possible for tuples to have a null value, denoted by null, for some of their attributes
null signifies an unknown value or that a value does not exist.
The predicate is null can be used to check for null values.
Example: Find all loan numbers which appear in the loan relation with null values for amount.
select loan_number
from loan
where amount is null
The result of any arithmetic expression involving null is null
Example: 5 + null returns null
However, aggregate functions simply ignore nulls
Null Values and Three Valued Logic
Any comparison with null returns unknown
Example: 5 < null or null <> null or null = null
Three-valued logic using the truth value unknown:
OR: (unknown or true) = true, (unknown or false) = unknown,
(unknown or unknown) = unknown
AND: (true and unknown) = unknown, (false and unknown) = false,
(unknown and unknown) = unknown
NOT: (not unknown) = unknown
"P is unknown" evaluates to true if predicate P evaluates to unknown
Result of where clause predicate is treated as false if it evaluates to unknown
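These truth-table entries can be observed directly in SQLite through Python's sqlite3 module. One assumption to keep in mind: SQLite has no separate boolean type, so true and false appear as 1 and 0, and unknown appears as NULL (None in Python). The helper name evaluate and the one-row table t are introduced here only for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate one SQL expression and return its value (hypothetical helper)."""
    return conn.execute("select " + expr).fetchone()[0]

unknown_or_true = evaluate("null or 1")
unknown_or_false = evaluate("null or 0")
false_and_unknown = evaluate("0 and null")
true_and_unknown = evaluate("1 and null")
not_unknown = evaluate("not null")
null_equals_null = evaluate("null = null")

# A where predicate that evaluates to unknown is treated as false,
# so no row of the one-row table t is selected.
conn.execute("create table t (x integer)")
conn.execute("insert into t values (1)")
rows_selected = conn.execute(
    "select count(*) from t where null = null").fetchone()[0]
```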
Null Values and Aggregates
Total all loan amounts
select sum (amount)
from loan
Above statement ignores null amounts
Result is null if there is no non-null amount
All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.
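A short sqlite3 script illustrates how aggregates treat nulls. The loan rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_number text, amount integer)")
conn.executemany("insert into loan values (?, ?)", [
    ("L-11", 900),     # illustrative rows
    ("L-14", None),    # null amount
    ("L-15", 1500),
])

def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]

total = scalar("select sum (amount) from loan")            # null amount is ignored
count_star = scalar("select count (*) from loan")          # counts tuples, nulls included
count_amount = scalar("select count (amount) from loan")   # counts non-null amounts only

# With no non-null amount left, sum returns null.
conn.execute("delete from loan where amount is not null")
total_when_all_null = scalar("select sum (amount) from loan")
```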
Nested Subqueries
SQL provides a mechanism for the nesting of subqueries.
A subquery is a select-from-where expression that is nested within another query.
A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.
Example Query
Find all customers who have both an account and a loan at the bank.
select distinct customer_name
from borrower
where customer_name in (select customer_name
from depositor )
Find all customers who have a loan at the bank but do not have an account at the bank
select distinct customer_name
from borrower
where customer_name not in (select customer_name
from depositor )
Example Query
Find all customers who have both an account and a loan at the Perryridge branch
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Perryridge' and
(branch_name, customer_name ) in
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number )
Note: The above query can be written in a much simpler manner. The formulation above is simply to illustrate SQL features.
Set Comparison
Find all branches that have greater assets than some branch located in Brooklyn.
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and
S.branch_city = 'Brooklyn'
Same query using > some clause
select branch_name
from branch
where assets > some
(select assets
from branch
where branch_city = 'Brooklyn')
Definition of Some Clause
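F <comp> some r $\Leftrightarrow$ $\exists\, t \in r$ such that (F <comp> t), where <comp> can be: $<$, $\le$, $>$, $\ge$, $=$, $\ne$
(5 < some {0, 5, 6}) = true (read: 5 < some tuple in the relation)
(5 < some {0, 5}) = false
(5 = some {0, 5}) = true
(5 $\ne$ some {0, 5}) = true (since 0 $\ne$ 5)
(= some) $\equiv$ in
However, ($\ne$ some) $\not\equiv$ not in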
Example Query
Find the names of all branches that have greater assets than all branches located in Brooklyn.
select branch_name
from branch
where assets > all
(select assets
from branch
where branch_city = ‘Brooklyn’)
Definition of all Clause
F <comp> all r $\Leftrightarrow$ $\forall\, t \in r$ (F <comp> t)
(5 < all {0, 5, 6}) = false
(5 < all {6, 10}) = true
(5 = all {4, 5}) = false
(5 $\ne$ all {4, 6}) = true (since 5 $\ne$ 4 and 5 $\ne$ 6)
($\ne$ all) $\equiv$ not in
However, (= all) $\not\equiv$ in
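The definitions of some and all correspond to an existential and a universal test, which Python's built-in any and all express directly. The sketch below models relations as plain lists of values and ignores nulls; the function names some and all_ are introduced here for illustration.

```python
import operator

def some(value, comp, relation):
    """F <comp> some r: true if the comparison holds for at least one tuple."""
    return any(comp(value, t) for t in relation)

def all_(value, comp, relation):
    """F <comp> all r: true if the comparison holds for every tuple."""
    return all(comp(value, t) for t in relation)

lt, eq, ne = operator.lt, operator.eq, operator.ne

some_results = [
    some(5, lt, [0, 5, 6]),   # 5 < 6 holds
    some(5, lt, [0, 5]),      # no tuple is greater than 5
    some(5, eq, [0, 5]),
    some(5, ne, [0, 5]),      # 0 differs from 5
]
all_results = [
    all_(5, lt, [0, 5, 6]),
    all_(5, lt, [6, 10]),
    all_(5, eq, [4, 5]),
    all_(5, ne, [4, 6]),
]

# (<> some) is not the same as not in: 5 is in [0, 5], yet 5 <> some [0, 5] holds.
ne_some_differs_from_not_in = some(5, ne, [0, 5]) != (5 not in [0, 5])
```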
Test for Empty Relations
The exists construct returns the value true if the argument subquery is nonempty.
exists r $\Leftrightarrow$ r $\ne$ Ø
not exists r $\Leftrightarrow$ r = Ø
Example Query
Find all customers who have an account at all branches located in Brooklyn.
select distinct S.customer_name
from depositor as S
where not exists (
(select branch_name
from branch
where branch_city = ‘Brooklyn’)
except
(select R.branch_name
from depositor as T, account as R
where T.account_number = R.account_number and
S.customer_name = T.customer_name ))
Note that X – Y = Ø $\Leftrightarrow$ X $\subseteq$ Y
Note: Cannot write this query using = all and its variants
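The idea behind the not exists ... except formulation is a set-containment test: a customer qualifies when the set of Brooklyn branches minus the set of branches where the customer has an account is empty. A plain Python sketch with sets, using invented sample data, shows the same reasoning without running SQL.

```python
# X: branches located in Brooklyn (illustrative data).
brooklyn_branches = {"Brighton", "Downtown"}

# Y for each customer: branches where that customer has an account.
branches_by_customer = {
    "Johnson": {"Brighton", "Downtown", "Perryridge"},
    "Smith": {"Downtown"},
}

# not exists (X except Y): keep the customer when X - Y is empty.
customers_at_all_brooklyn = sorted(
    name for name, branches in branches_by_customer.items()
    if not (brooklyn_branches - branches)
)

# X - Y is empty exactly when X is a subset of Y.
same_as_subset_test = all(
    (not (brooklyn_branches - branches)) == (brooklyn_branches <= branches)
    for branches in branches_by_customer.values()
)
```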
Test for Absence of Duplicate Tuples
The unique construct tests whether a subquery has any duplicate tuples in its result.
Find all customers who have at most one account at the Perryridge branch.
select T.customer_name
from depositor as T
where unique (
select R.customer_name
from account, depositor as R
where T.customer_name = R.customer_name and
R.account_number = account.account_number and
account.branch_name = 'Perryridge')
Example Query
Find all customers who have at least two accounts at the Perryridge branch.
select distinct T.customer_name
from depositor as T
where not unique (
select R.customer_name
from account, depositor as R
where T.customer_name = R.customer_name and
R.account_number = account.account_number and
account.branch_name = ‘Perryridge’)
Derived Relations
SQL allows a subquery expression to be used in the from clause
Find the average account balance of those branches where the average account balance is greater than $1200.
select branch_name, avg_balance
from (select branch_name, avg (balance)
from account
group by branch_name )
as branch_avg ( branch_name, avg_balance )
where avg_balance > 1200
Note that we do not need to use the having clause, since we compute the temporary (view) relation branch_avg in the from clause, and the attributes of branch_avg can be used directly in the where clause.
With Clause
The with clause provides a way of defining a temporary view whose definition is available only to the query in which the with clause occurs.
Find all accounts with the maximum balance
with max_balance (value) as
(select max (balance)
from account)
select account_number
from account, max_balance
where account.balance = max_balance.value
Complex Query using With Clause
Find all branches where the total account deposit is greater than the average of the total account deposits at all branches.
with branch_total (branch_name, value) as
(select branch_name, sum (balance)
from account
group by branch_name),
branch_total_avg (value) as
(select avg (value)
from branch_total)
select branch_name
from branch_total, branch_total_avg
where branch_total.value >= branch_total_avg.value
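The query can be run in SQLite through Python's sqlite3 module. In this sketch the two temporary views are written in one with clause, separated by a comma and each enclosed in parentheses, which is the form SQLite accepts; the account rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_number text, branch_name text, balance integer)")
conn.executemany("insert into account values (?, ?, ?)", [
    ("A-101", "Downtown", 500),      # illustrative rows
    ("A-102", "Perryridge", 1000),
    ("A-201", "Perryridge", 2000),
    ("A-305", "Redwood", 1300),
    ("A-306", "Redwood", 900),
])

# Branch totals are 500, 3000 and 2200, so their average is 1900.
above_average = [row[0] for row in conn.execute("""
    with branch_total (branch_name, value) as
        (select branch_name, sum (balance)
         from account
         group by branch_name),
    branch_total_avg (value) as
        (select avg (value)
         from branch_total)
    select branch_name
    from branch_total, branch_total_avg
    where branch_total.value >= branch_total_avg.value
    order by branch_name""")]
```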
Views
In some cases, it is not desirable for all users to see the entire logical model (that is, all the actual relations stored in the database.)
Consider a person who needs to know a customer's loan number but has no need to see the loan amount. This person should see a relation described, in SQL, by
(select customer_name, loan_number
from borrower, loan
where borrower.loan_number = loan.loan_number )
A view provides a mechanism to hide certain data from the view of certain users.
Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.
View Definition
A view is defined using the create view statement which has the form
create view v as < query expression >
where <query expression> is any legal SQL expression. The view name is represented by v.
Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.
View definition is not the same as creating a new relation by evaluating the query expression. Rather, a view definition causes the saving of an expression; the expression is substituted into queries using the view.
Example Queries
A view consisting of branches and their customers
create view all_customer as
(select branch_name, customer_name
from depositor, account
where depositor.account_number =
account.account_number )
union
(select branch_name, customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number )
Find all customers of the Perryridge branch
select customer_name
from all_customer
where branch_name = 'Perryridge'
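A sqlite3 sketch shows that the view stores an expression rather than data: a tuple inserted into a base relation after the view was created is visible the next time the view is queried. The sample rows are invented, and the parentheses around the two select blocks are dropped because SQLite does not accept them around the operands of union.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (account_number text, branch_name text, balance integer);
    create table depositor (customer_name text, account_number text);
    create table loan (loan_number text, branch_name text, amount integer);
    create table borrower (customer_name text, loan_number text);

    insert into account values ('A-102', 'Perryridge', 400);
    insert into depositor values ('Hayes', 'A-102');
    insert into loan values ('L-15', 'Perryridge', 1500), ('L-17', 'Downtown', 1000);
    insert into borrower values ('Adams', 'L-15'), ('Hayes', 'L-15'), ('Jones', 'L-17');

    create view all_customer as
        select branch_name, customer_name
        from depositor, account
        where depositor.account_number = account.account_number
        union
        select branch_name, customer_name
        from borrower, loan
        where borrower.loan_number = loan.loan_number;
""")

def perryridge_customers():
    return [row[0] for row in conn.execute(
        "select customer_name from all_customer "
        "where branch_name = 'Perryridge' order by customer_name")]

before = perryridge_customers()

# New base tuples appear in the view without redefining it.
conn.execute("insert into account values ('A-201', 'Perryridge', 900)")
conn.execute("insert into depositor values ('Johnson', 'A-201')")
after = perryridge_customers()
```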
Views Defined Using Other Views
One view may be used in the expression defining another view
A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1
A view relation v1 is said to depend on view relation v2 if either v1 depends directly on v2 or there is a path of dependencies from v1 to v2
A view relation v is said to be recursive if it depends on itself.
View Expansion
A way to define the meaning of views defined in terms of other views.
Let view v1 be defined by an expression e1 that may itself contain uses of view relations.
View expansion of an expression repeats the following replacement step:
repeat
    Find any view relation vi in e1
    Replace the view relation vi by the expression defining vi
until no more view relations are present in e1
As long as the view definitions are not recursive, this loop will terminate
Modification of the Database –Deletion
Delete all account tuples at the Perryridge branch
delete from account
where branch_name = 'Perryridge'
Delete all accounts at every branch located in the city 'Needham'.
delete from account
where branch_name in (select branch_name
from branch
where branch_city = 'Needham')
Example Query
Delete the record of all accounts with balances below the average at the bank.
delete from account
where balance < (select avg (balance )
from account )
Problem: as we delete tuples from account, the average balance changes
Solution used in SQL:
1. First, compute avg balance and find all tuples to delete
2. Next, delete all tuples found above (without recomputing avg or
retesting the tuples)
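The two-step evaluation can be spelled out explicitly in Python with sqlite3. The sketch performs the steps by hand instead of relying on how a particular engine executes the single delete statement; the balances are invented so that recomputing the average after each deletion would give a different answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_number text, balance integer)")
conn.executemany("insert into account values (?, ?)", [
    ("A-101", 100),   # illustrative balances; the average is 400
    ("A-102", 500),
    ("A-103", 600),
])

# Step 1: compute the average once and find all tuples to delete.
avg_balance = conn.execute("select avg (balance) from account").fetchone()[0]
to_delete = [row[0] for row in conn.execute(
    "select account_number from account where balance < ?", (avg_balance,))]

# Step 2: delete exactly those tuples, without recomputing the average.
conn.executemany("delete from account where account_number = ?",
                 [(number,) for number in to_delete])

remaining = [row[0] for row in conn.execute(
    "select account_number from account order by account_number")]
```

Had the average been recomputed after removing A-101, it would have risen to 550 and A-102 would also have been deleted.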
Modification of the Database –Insertion
Add a new tuple to account
insert into account
values ('A-9732', 'Perryridge', 1200)
or equivalently
insert into account (branch_name, balance, account_number)
values ('Perryridge', 1200, 'A-9732')
Add a new tuple to account with balance set to null
insert into account
values ('A-777', 'Perryridge', null )
Modification of the Database –Insertion
Provide as a gift for all loan customers of the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account
insert into account
select loan_number, branch_name, 200
from loan
where branch_name = 'Perryridge'
insert into depositor
select customer_name, loan_number
from loan, borrower
where branch_name = 'Perryridge'
and loan.loan_number = borrower.loan_number
The select from where statement is evaluated fully before any of its results are inserted into the relation (otherwise queries like
insert into table1 select * from table1
would cause problems)
Modification of the Database –Updates
Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%. Write two update statements:
update account
set balance = balance * 1.06
where balance > 10000
update account
set balance = balance * 1.05
where balance <= 10000
The order is important
Can be done better using the case statement (next slide)
Case Statement for Conditional Updates
Same query as before: Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.
update account
set balance = case
when balance <= 10000 then balance * 1.05
else balance * 1.06
end
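Why the order of the two updates matters, and why the case form avoids the issue, can be seen with a small sqlite3 experiment. The two sample accounts and the helper names fresh_accounts and balances are assumptions made for this sketch.

```python
import sqlite3

def fresh_accounts():
    """Create an in-memory account relation with two illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table account (account_number text, balance real)")
    conn.executemany("insert into account values (?, ?)",
                     [("A-1", 9600.0), ("A-2", 20000.0)])
    return conn

def balances(conn):
    return dict(conn.execute("select account_number, balance from account"))

# Wrong order: the 5% raise lifts A-1 above 10000, then it also receives 6%.
wrong = fresh_accounts()
wrong.execute("update account set balance = balance * 1.05 where balance <= 10000")
wrong.execute("update account set balance = balance * 1.06 where balance > 10000")
wrong_order = balances(wrong)

# The case form decides each tuple's raise once, from its original balance.
right = fresh_accounts()
right.execute("""update account
                 set balance = case
                     when balance <= 10000 then balance * 1.05
                     else balance * 1.06
                 end""")
with_case = balances(right)
```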
Update of a View
Create a view of all loan data in the loan relation, hiding the amount attribute
create view branch_loan as
select branch_name, loan_number
from loan
Add a new tuple to branch_loan
insert into branch_loan
values ('Perryridge', 'L-307')
This insertion must be represented by the insertion of the tuple
('L-307', 'Perryridge', null )
into the loan relation
Updates Through Views (Cont.)
Some updates through views are impossible to translate into updates on the database relations
create view v as
select branch_name from account
insert into v values ('L-99', 'Downtown', '23')
Others cannot be translated uniquely
insert into all_customer values ('Perryridge', 'John')
Have to choose loan or account, and create a new loan/account number!
Most SQL implementations allow updates only on simple views (without aggregates) defined on a single relation
Joined Relations
Join operations take two relations and return as a result another relation.
These additional operations are typically used as subquery expressions in the from clause
Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.
Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.
Joined Relations – Datasets for Examples
Relation loan
Relation borrower
Note: borrower information missing for L-260 and loan
information missing for L-155
Joined Relations – Examples
loan inner join borrower on
loan.loan_number = borrower.loan_number
loan left outer join borrower on
loan.loan_number = borrower.loan_number
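The difference between the inner join and the left outer join can be reproduced with sqlite3. The result tables from the slides are not available in this transcript, so the rows below are assumed sample data chosen only to match the note that borrower information is missing for L-260 and loan information is missing for L-155.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_number text, branch_name text, amount integer);
    create table borrower (customer_name text, loan_number text);

    -- assumed sample rows
    insert into loan values ('L-170', 'Downtown', 3000),
                            ('L-230', 'Redwood', 4000),
                            ('L-260', 'Perryridge', 1700);
    insert into borrower values ('Jones', 'L-170'),
                                ('Smith', 'L-230'),
                                ('Hayes', 'L-155');
""")

# Inner join: only loans with a matching borrower tuple appear.
inner_rows = conn.execute("""
    select loan.loan_number, customer_name
    from loan inner join borrower on
         loan.loan_number = borrower.loan_number
    order by loan.loan_number""").fetchall()

# Left outer join: every loan appears; missing borrower attributes are null.
left_rows = conn.execute("""
    select loan.loan_number, customer_name
    from loan left outer join borrower on
         loan.loan_number = borrower.loan_number
    order by loan.loan_number""").fetchall()
```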
Joined Relations – Examples
loan natural inner join borrower
loan natural right outer join borrower
Joined Relations – Examples
loan full outer join borrower using (loan_number)
Find all customers who have either an account or a loan (but not both)
at the bank.
select customer_name
from (depositor natural full outer join borrower )
where account_number is null or loan_number is null
Database Schema
branch (branch_name, branch_city, assets)
customer (customer_name, customer_street, customer_city)
loan (loan_number, branch_name, amount)
borrower (customer_name, loan_number)
account (account_number, branch_name, balance)
depositor (customer_name, account_number)
Tuples inserted into loan and borrower
The loan and borrower relations