Week07 - Normalization
-
Upload
muhammad-asghar-khan -
Category
Documents
-
view
225 -
download
0
Transcript of Week07 - Normalization
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 1/37
1
Database I
Methodology
Normalization
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 2/37
2
Normalization ? (1/2)
Normalization is the technique for analyzing
relations based on their primary key (or candidate
keys in the case of BCNF) and functional
dependencies
Normalization is executed as a series of steps
Each step corresponds to a specific normal form
Different normal forms or levels are called as first,
second and so on
Each normalized form has certain requirements or
conditions, which must be fulfilled
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 3/37
3
Normalization ? (2/2)
If a relation fulfills any particular form then it
is said to be in that normal form
The minimum form in which all the tables arein is called the normal form of entire database
It is performed after the logical database
design
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 4/37
4
Why Normalization ? (1/3)
Main aim of relational database design is to group
attributes into relations so as to minimize data
redundancy and thereby reduce the file storage
space
Consider StaffBranch relation below
It is clear that there is redundant information in
StaffBranch relation
SNo SName Position Salary BNo BAddress PhNo
S123 Asad Manager 23000 B5 Peshawar 20456789
S125 Jamal Programmer 40000 B5 Peshawar 20456789
S130 Ghamgeen Peon 5000 B6 Islamabad 34564567
S133 Khamar Sweaper 3000 B7 Mardan 23567890
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 5/37
5
Why Normalization ? (2/3)
The serious problem with relations havingredundant information is update anomalies
Update anomalies can be classified as: Insertion Anomalies
There are two insertion anomalies in StaffBranchrelation
To insert the details of new member, we have to also
enter the branch details for new entry, thus we have totake great care that branch details is consistent
To insert details of a new branch that currently has nomembers of staff, the solution could be to insert nullsfor staff, but wait SNo is primary key ?
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 6/37
6
Why Normalization ? (3/3)
Deletion Anomalies
If we delete the row for staff number S133, the
information relating to branch number B7 is also lost
from the database
Modification Anomalies
If we want to change the phone number for branch
number B5, we must update the rows of all staff
located at that branch to avoid data inconsistency
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 7/37
7
Functional Dependencies (1/4)
Functional dependency describes the
relationship between attributes
In a database, we often have the case whereone field defines the other
For example, we can say that Employee ID
(EmpID) defines a name
What does this mean?
It means that if I have a database with
EmpIDs and names, and if I know someone's
EmpID, then I can find their name
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 8/37
8
Functional Dependencies (2/4)
From the word "defines," we means that for
every EmpID we will have one and only one
name So we will say that we have defined name as
being functionally dependent on EmpID
A formal definition can be:
If A and B are attributes or sets of attributes of
relation R, we say that B is functionally dependent
on A if each value of A in R has associated with it
exactly one value of B in R
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 9/37
9
Functional Dependencies (3/4)
We write a functional dependency (FD) as:
EmpID Name
Can be read as ³EmpI
D defines Name" or ³EmpI
Dimplies Name³ or Name is functionally dependent on
EmpID´
Let us look at a an interesting example:
EmpID Name Job Salary101 Darya Khan Programmer 40000
104 Shahrukh Designer 50000
103 Gul Jana System Engineer 35000
103 Gul Jana Project Manager 55000
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 10/37
10
Functional Dependencies (4/4)
We have the FD that EmpID Name
This means that every time we find 103, wefind the name, Gul Jana
Thus something is on the left-hand side of aFD, it does not imply that you have a key or that it will be unique in the database
The FD X Y only means that for everyoccurrence of X you will get the same valueof Y
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 11/37
11
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
Let us now consider
another example
Here, we will define:
RNo Name
RNo School
Name School
Have we violated any FDs with our data?
Because all RNos are unique, there cannot be a FD
violation of RNo Name
The same comment is true for RNo School and
Name School
RNo Name School Location
101 Rasheed PPS Peshawar
102 Ani FCA Mardan
103 Naseer ECS Swabi
104 Ghafoor FCA Mardan
105 Bilal PPS Peshawar
106 Sohail PPS Peshawar
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 12/37
12
FD Inference Axioms or Armstrong
Axioms ( Transitivity Rule )
If we define a FD X Y and we define a FD Y Z,
then we know by inference that X Z
So we can infer that RNo Location
The inference illustrated is called the transitivity rule
of FD inference
To see that the FD RNo Location is true in our
data, you can note that given any value of RNo, you
always find a unique location for that person
This prove that transitivity rule exists
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 13/37
13
FD Inference Axioms or Armstrong
Axioms (Pseudo Transitivity Rule )
If A B and CB D, then AC D
If RNo Name and
Name,School Location Then we can write it as
RNo,Name Location
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 14/37
14
FD Inference Axioms or Armstrong
Axioms ( Reflexive Rule )
If X is a composite, composed of A
and B, then X A and X B
Example: X= Name, City. Then we
are saying that X Name and
X City
The rule, says if I give you the combination
<Rasheed, Mardan>, what is this person's Name?What is this person's City?
Name City
Haleem Peshawar
Rasheed Mardan
Kaleem Charsadda
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 15/37
15
FD Inference Axioms or Armstrong
Axioms (Augmentation Rule)
If X Y, then XZ Y
Now, I claim that because
Name City, that
Name+ShoeSize City
(i.e., we augmented Name with ShoeSize)
Will there be a contradiction here, ever?
No, because we defined Name City, Nameplus more information will always identify theunique City for that individual
We can always add information to the LHS of an FD and still have the FD be true
Name City ShoeSize
Haleem Peshawar 10
Rasheed Mardan 6
Kaleem Charsadda 7
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 16/37
16
FD Inference Axioms or Armstrong
Axioms (Decomposition Rule)
The decomposition/projectivity rule says that
if it is given that X YZ (that is, X defines
both Y and Z), then X Y and X Z Suppose I define Name City, ShoeSize
This means for every occurrence of Name, I
have a unique value of City and a unique
value of ShoeSize
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 17/37
17
FD Inference Axioms or Armstrong
Axioms (Union Rule)
The union/additivity rule is the reverse of the
decomposition rule in that if X Y and
X Z, then X YZ If we were given that Name City and given
that Name ShowSize
We can immediately write Name City, Shoe
Size
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 18/37
18
Keys and Functional
Dependencies (1/4)
The main reason we identify the FDs and
inference rules is to be able to find keys and
develop normal forms for relationaldatabases
In any table, we want to find out which, if any
attribute(s), will identify the rest of the
attributes An attribute that will identify all the other
attributes in row is called a "candidate key³
Consider the example on next slide
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 19/37
19
Keys and Functional
Dependencies (2/4)
RNo Name School Location
101 Rasheed PPS Peshawar
102 Ani FCA Mardan
103
Naseer ECS Swabi104 Ghafoor FCA Mardan
105 Bilal PPS Peshawar
106 Sohail PPS Peshawar
Now suppose we define
the following FDs:
RNo NameRNo School
School Location
What we want is to find the least number of attributes that can identify all the rest
attributes (hopefully only one attribute)
Suppose we take RNo as candidate key
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 20/37
20
Keys and Functional
Dependencies (3/4)
Can we show that RNo "defines" all attributes in the
relation?
We can use the transitive, reflexive and union rule
RNo Name (given)
RNo School (given)
School Location (given)
RNo Location (derived by the transitive rule)
RNo RNo (reflexive rule)
RNo RNo, Name, School, Location (union rule)
Therefore RNo is a candidate key for this relation
Finding a candidate key is the finding of a ³closure of an
attribute or a set of attributes´ that defines all the other
attributes
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 21/37
21
Keys and Functional
Dependencies (4/4)
Are there any other candidate keys?
Of course! By augmentation rule we can augment
RNo and name to form new candidate keys: RNo,
Name
Is School a candidate key? No
Once we have found a set of candidate keys (or
perhaps only one as in this case), we designate one
of the candidate keys as the primary key and move
on to normal forms
FD rules are useful in developing Normal forms
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 22/37
22
Normal Forms
(1N Form)
A relation is in first normal form if and only if
every attribute is single valued for each tuple or
There be no repeating fields or groups in the row
This means that each attribute in each row , or
each cell of the table, contains only one value
First normal form say that the domains of
attributes of a relation are atomic The concept of 1NF is demonstrated on next
slide
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 23/37
23
Normal Forms
(1N Form)
Name Address Dependant
Sohail Dar I-8 Islamabad Jamal,Kamal,Karim
Shoiab Ali Wah Cantt. Haider, Ihsan
TahiraI jaz Peshawar
Gul
Mast
There are more than one values for
Dependant, thus it is not in 1N form
Name Address
Sohail Dar I-8 Islamabad
Shoiab Ali Wah Cantt.
Tahira I jaz Peshawar
DependantName EmployeeName
Jamal Sohail Dar
Kamal Sohail Dar
Karim Sohail Dar
Haider Shoiab Ali
Ihsan Shoiab Ali
Gul Mast Tahira I jaz
Dependant
Employee
Employee
1NF Tables:
Non-1NF:
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 24/37
24
Normal Forms
(1N Form)
Note that the original table could be
reconstructed by combining these two tables
By recording all the rows in the EMPLOYEEtable and combining them with the
corresponding rows in the EMPLOYEE table
where the names were equal (an equi-join
operation)
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 25/37
25
Normal Forms
(2N Form)
A relation is in second normal form (2NF) if and only
if it is in first normal form and all the non-key
attributes are fully functional dependent on the key
Partial dependencies are not allowed
The only time we have to be concerned about 2NF
is when the key is composite
Second normal form (2NF) addresses the concept of
removing duplicative data
A relation that is not in 2NF exhibits the update,
insertion and deletion anomalies
Consider the example on next slide
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 26/37
26
Normal Forms
(2N Form)
Employee (name, job,
salary, address)
FDs
name+job salary
name address
The problems developing here are redundancy and
anomalies Here address depends only on the name, not the
job; this is an example of a partial dependency
name job salary address
Asad Welder 2300 Peshawar
Asad Programmer 30000 Peshawar
Karim Programmer 45000 Lahore
Javad Designer 40000 Lahore
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 27/37
27
Normal Forms
(2N Form)
As the table is in 1NF
The process for transforming a 1NF table to 2NF is:1. Identify any partial determinants (attribute on LHS) other
than the composite key, and the columns they determine2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table tothe new table. The determinate becomes the primary key
of the new table4. Delete the columns you just moved from the original table
except for the determinant which will serve as a foreignkey
5. The original table may be renamed to maintain semantic
meaning
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 28/37
28
Normal Forms
(2N Form)
Consider the non-2NF
relation
name job salary
Asad Welder 2300
Asad Programmer 30000
Karim Programmer 45000
Javad Designer 40000
name address
Asad Peshawar
Karim Lahore
Javad Lahore
name job salary address
Asad Welder 2300 Peshawar
Asad Programmer 30000 Peshawar
Karim Programmer 45000 LahoreJavad Designer 40000 Lahore
1
2
NameAdd3
4
NameJob
5
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 29/37
29
Normal Forms
(3N Form)
A relational table is in 3NF if it is already in2NF and every non-key column is non-transitively dependent upon its primary key
OR All non-key attributes are functionallydependent only upon the primary key
Transitive Dependency
Transitive dependency occurs when one non-keyattribute determines another non-key attribute
E.g.
STD(stId, stName, stAdr, prName, prCrdts)
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 30/37
30
Normal Forms
(3N Form)
stId stName, stAdr, prName, prCrdts
prName prCrdts
Here stId is the key prCrdts can be determined by prName
(Transitive Dependency)
Thus transitive dependency exists whichmeans STUDENT relation is not in 3NF
Transitive dependencies cause insertion,
deletion, and update anomalies
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 31/37
31
Normal Forms
(3N Form)
So for 3NF we will concentrate on relations withone candidate key
The process for transforming a 2NF table to 3NF is:1. Identify any determinants, other the primary key, and the
columns they determine2. Create and name a new table for each determinant and
the unique columns it determines
3. Move the determined columns from the original table tothe new table. The determinate becomes the primary key
of the new table4. Delete the columns you just moved from the original table
except for the determinate which will serve as a foreignkey
5. The original table may be renamed to maintain semanticmeaning
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 32/37
32
Normal Forms
(3N Form)
stdId stName stAdr prName prCrdts
S1020 Sohail Dar I-8 Islamabad MCS 64
S1038 Shoaib Ali G-6 Islamabad BCS 132
S1015 Tahira Ejaz L Rukh Wah MCS 64
prName prCrdts
MCS 64
BCS 132
stdId stName stAdr prName
S1020 Sohail Dar I-8 Islamabad MCS
S1038 Shoaib Ali G-6 Islamabad BCS
S1015 Tahira Ejaz L Rukh Wah MCS
1
Program
2
3 4
Student
5
Non-3NF
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 33/37
33
Normal Forms
(Boyce-Codd Normal Form (BCNF))
1NF and 2NF identifies partial and transitive
dependencies on the basis of primary keys
But what if such dependencies remain on other
candidate keys, if any exist
BCNF identifies the functional dependencies on all
candidate keys
It means that if relation has one candidate key and it
is in 3NF, it is also in BCNF
3NF relation is not in BCNF if:
The candidate keys in the relation are composite keys
There is more than one candidate key in the relation
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 34/37
34
Normal Forms
(Boyce-Codd Normal Form (BCNF))
The keys are not disjoint, that is, some attributes in thekeys are common
For example consider the Enroll relation
Enroll (sno, sname, cno, cname, dateofenroll) Let us assume that the relation has the following
candidate keys:sno,cno
sno,cname
sname,cno
sname, cname
The relation is in 3NF but not in BCNF becausethere are dependencies
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 35/37
35
Normal Forms
(Boyce-Codd Normal Form (BCNF))
sno sname
cno cname
Where attributes are part of a candidate key are
dependent on part of another candidate key By decomposing the Enroll relation in the following
three relations result in BCNF
Std(sno, sname)
Crs(cno, cname)
Std_Crs(sno, cno, dateofenroll)
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 36/37
36
Normal Forms
(Boyce-Codd Normal Form (BCNF))
sno sname cno cname dateofenroll
S1020 Sohail Dar C104 E-Com 12/02/2007
S1038 Shoaib Ali C104 E-Com 10/01/2008
S1015 Tahira Ejaz H2
34English
01/
02/2
007
sno sname
S1020 Sohail Dar
S1038 Shoaib Ali
S1015 Tahira Ejaz
cno cname
C104 E-Com
H234 English
sno cname datofenroll
S1020 C104 10/01/2008
S1038 C104 01/02/2007
S1015 H234 12/02/2007
Non-BCNF
Std
Crs
Std_Crs
8/8/2019 Week07 - Normalization
http://slidepdf.com/reader/full/week07-normalization 37/37
37
Other Normal Forms
5NF deals with multi-valued dependency and
possible loss less decompositions
DomainK
ey Normal Form (DK
NF) reducesfurther chances of any possible inconsistency