Normalization
-
Upload
manubhardwajcs -
Category
Documents
-
view
918 -
download
3
Transcript of Normalization
![Page 1: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/1.jpg)
DBS201: Introduction to Normalization
![Page 2: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/2.jpg)
Agenda
Top Down vs Bottom Up What is Normalization? Why Normalization? Normalization Steps
![Page 3: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/3.jpg)
Top Down vs Bottom Up
Top Down Usually provided just a narrative or very high level
data requirements Need to discover entities, attributes, relationships Result is tables
![Page 4: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/4.jpg)
Top Down vs Bottom Up
Bottom Up Provided with views of data Views can be screen shots or reports (printouts) Views contain fields (data) Need to groups fields together – find fields that are
in common Result is tables
![Page 5: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/5.jpg)
Agenda
Top Down vs Bottom Up What is Normalization? Why Normalization? Normalization Steps
![Page 6: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/6.jpg)
What is normalization?
Normalization Process for evaluating and correcting table
structures to minimize data redundancies helps eliminate data anomalies
Can be used in conjunction with ER modeling to produce a good database design
![Page 7: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/7.jpg)
What is normalization?
Works through a series of stages called normal forms:
Normal form (1NF)Second normal form (2NF)Third normal form (3NF)
![Page 8: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/8.jpg)
What is normalization?
2NF is better than 1NF 3NF is better than 2NF For most business database design purposes,
3NF is highest we need to go in the normalization process
Highest level of normalization is not always most desirable
![Page 9: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/9.jpg)
Agenda
Top Down vs Bottom Up What is Normalization? Why Normalization? Normalization Steps
![Page 10: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/10.jpg)
Why normalization?
Example: company that manages building projects Charges its clients by billing hours spent on each
contract Hourly billing rate is dependent on employee’s
position Periodically, a report is generated that contains
information displayed as in Table 5.1
![Page 11: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/11.jpg)
A Sample Report Layout
![Page 12: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/12.jpg)
A Table in the Report Format
![Page 13: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/13.jpg)
Why normalization?
Structure of data set in Figure 5.1 does not handle data very well
The table structure appears to work; report is generated with ease
Unfortunately, the report may yield different results, depending on what data anomaly has occurred
![Page 14: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/14.jpg)
Agenda
Top Down vs Bottom Up What is Normalization? Why Normalization? Normalization Steps
![Page 15: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/15.jpg)
Conversion to First Normal Form
Relational table must not contain repeating groups
Repeating group Derives its name from the fact that a group of
multiple (related) entries can exist for any single key attribute occurrence
Normalizing the table structure will reduce these data redundancies
Normalization is three-step procedure
![Page 16: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/16.jpg)
Step 1: Eliminate the Repeating Groups
Present data in a tabular format, where each cell has a single value and there are no repeating groups
Eliminate repeating groups by eliminating nulls, making sure that each repeating group attribute contains an appropriate data value
![Page 17: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/17.jpg)
First Normal Form
![Page 18: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/18.jpg)
Step 2: Identify the Primary Key
Primary key must uniquely identify attribute values (a row)
Primary key is PROJ_NUM, EMP_NUM (because the combination of those two uniquely identifies each row of the table)
![Page 19: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/19.jpg)
Step 3: Identify all Dependencies Dependencies can be depicted with the help
of a diagram Dependency diagram:
Depicts all dependencies found within a given table structure
Helpful in getting bird’s-eye view of all relationships among a table’s attributes
Use makes it much less likely that an important dependency will be overlooked
![Page 20: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/20.jpg)
Dependency Diagram
PROJ_NUM
PROJ_NAME
EMP_NUM
EMP_NAME
JOB_CLASS
CHG_HOUR
HOURS
1NF
![Page 21: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/21.jpg)
First Normal Form
Tabular format in which: All key attributes are defined There are no repeating groups in the table All attributes are dependent on primary key
All relational tables satisfy 1NF requirements Some tables contain partial dependencies
Dependencies based on only part of the primary key
Still subject to data redundancies
EMPLOYEE_PROJECT(PROJ_NUM(PK), EMP_NUM(PK), PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
![Page 22: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/22.jpg)
Conversion to Second Normal Form
Relational database design can be improved by converting the database into second normal form (2NF)
Two steps
![Page 23: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/23.jpg)
Step 1: Identify All Key Components
Determine which attributes are dependent on which other attributes
Using the dependency diagram, document the partial dependencies: in other words take each part of the primary key and document which attributes are dependent on each part of the primary key
![Page 24: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/24.jpg)
Dependency Diagram
PROJ_NUM
PROJ_NAME
EMP_NUM
EMP_NAME
JOB_CLASS
CHG_HOUR
HOURS
1NF 2NF
![Page 25: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/25.jpg)
Step 2: Identify the Dependent Attributes
Write each key component on separate line, and then write the original (composite) key on the last line
Each component will become the key in a new table
PROJECT (PROJ_NUM (PK), PROJ_NAME)
EMPLOYEE (EMP_NUM(PK), EMP_NAME, JOB_CLASS, CHG_HOUR)
EMPLOYEE_PROJECT(PROJ_NUM(PK), EMP_NUM(PK), HOURS)
![Page 26: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/26.jpg)
Second Normal Form
Table is in second normal form (2NF) if: It is in 1NF and It includes no partial dependencies:
No attribute is dependent on only a portion of the primary key
![Page 27: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/27.jpg)
Conversion to Third Normal Form
Data anomalies created are easily eliminated by completing these steps
![Page 28: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/28.jpg)
Step 1: Identify Each New Determinant
For every transitive dependency, write its determinant as a PK for a new table Determinant
Any attribute whose value determines other values within a row
Using the dependency diagram, document the transitive dependencies: in other words identify the attributes dependent on each determinant identified above and identify the dependency
![Page 29: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/29.jpg)
Dependency Diagram
PROJ_NUM
PROJ_NAME
EMP_NUM
EMP_NAME
JOB_CLASS
CHG_HOUR
HOURS
1NF 2NF 3 NF
JOB_CLASS is a determinant because it can determine other values within the row. In this case, it’s the CHG_HOUR
![Page 30: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/30.jpg)
Step 2: Name the table
Name the table to reflect its contents and function
PROJECT (PROJ_NUM (PK), PROJ_NAME)
EMPLOYEE (EMP_NUM(PK), EMP_NAME)
JOB (JOB_CLASS(PK), CHG_HOUR)
EMPLOYEE_PROJECT(PROJ_NUM(PK), EMP_NUM(PK), HOURS)
![Page 31: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/31.jpg)
Third Normal Form
A table is in third normal form (3NF) if: It is in 2NF and It contains no transitive dependencies
![Page 32: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/32.jpg)
Improving the Design
Table structures are cleaned up to eliminate the troublesome initial partial and transitive dependencies
Normalization cannot, by itself, be relied on to make good designs
It is valuable because its use helps eliminate data redundancies
![Page 33: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/33.jpg)
Improving the Design (continued)
The following changes should be made: PK assignment Naming conventions Attribute atomicity Adding attributes Adding relationships (will define the FKs) Refining PKs Eliminate derived attributes
![Page 34: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/34.jpg)
Business Rules
Business Rules drive the relationships Look at the data in the table and
understand/interpret what the relationship is between the data
![Page 35: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/35.jpg)
Business Rules
A job class can have more than 1 employee in it; results in a 1:M relationships between JOB and EMPLOYEE
Add the FKs into the appropriate tables
![Page 36: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/36.jpg)
Step 2: Name the table
Name the table to reflect its contents and function
PROJECT (PROJ_NUM (PK), PROJ_NAME)
EMPLOYEE (EMP_NUM(PK), EMP_NAME, JOB_CLASS(FK))
JOB (JOB_CLASS(PK), CHG_HOUR)
EMPLOYEE_PROJECT(PROJ_NUM(PK), EMP_NUM(PK), HOURS)
New foreign key
Business rule not necessary between EMPLOYEE and PROJECT. Why?
![Page 37: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/37.jpg)
First Normal Form
![Page 38: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/38.jpg)
Normalization and Database Design
Normalization should be part of design process
Make sure that proposed entities meet required normal form before table structures are created
ER diagram Provides the big picture, or macro view, of an
organization’s data requirements and operations Created through an iterative process
Identifying relevant entities, their attributes and their relationship
Use results to identify additional entities and attributes
![Page 39: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/39.jpg)
ERD
EMPLOYEE
PK EMP_NUM
EMP_LNAME EMP_FNAME EMP_INITIALFK1 JOB_CLASS
PROJECT
PK PROJ_NUM
PROJ_NAME
JOB_CLASS
PK JOB_CLASS
CHG_HOURS
EMPLOYEE_PROJECT
PK,FK1 EMP_NUMPK,FK2 PROJ_NUM
HOURS
works
Has working
Has assigned
![Page 40: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/40.jpg)
Normalization and Database Design (continued)
Normalization procedures Focus on the characteristics of specific entities A micro view of the entities within the ER diagram
Difficult to separate normalization process from ER modeling process
Two techniques should be used concurrently
![Page 41: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/41.jpg)
Summary
Normalization is a table design technique aimed at minimizing data redundancies
First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered
Normalization is an important part—but only a part—of the design process
Continue the iterative ER process until all entities and their attributes are defined and all equivalent tables are in 3NF
![Page 42: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/42.jpg)
Normalization Exercise
Normalize the above table. 1NF – eliminate repeating groups, identify a PK for the table structure 2NF – find partial dependencies 3NF - find transitive dependenciesWrite final table structures, including all relationships. Make all attributes atomic.
STU_NUM STU_LNAME STU_MAJOR DEPT_CODE DEPT_NAME DEPT_PHONE
211343 Stephanos Accounting ACCT Accounting 4356
200128 Smth Accounting ACCT Accounting 4356
199876 Jones Marketing MKTG Marketing 4378
199876 Ortiz Marketing MKTG Marketing 4378
223456 McKulski Statistics MATH Mathematics 3420
![Page 43: Normalization](https://reader031.fdocuments.net/reader031/viewer/2022020115/5467f155b4af9f55598b46b0/html5/thumbnails/43.jpg)
Normalization Exercise
Vehicle Num Model Year
Acme TireNumber
Tire Manfctr Size
KM This Vehicle
11 Ford 1997 1327 Goodyear 15R7 43000
1328 Goodyear 15R7 43000
1329 Goodyear 15R7 25000
1330 Goodyear 15R7 24000
15 Chevrolet 1999 2013 BF Goodrich 15R7 18000
2014 BF Goodrich 15R7 18000
2013 BF Goodrich 15R7 29000
2014 BF Goodrich 15R7 29000
Normalize the above table. 1NF – eliminate repeating groups, identify a PK for the table structure 2NF – find partial dependencies 3NF - find transitive dependenciesWrite final table structures, including all relationships. Make all attributes atomic.