Normalization of Databases

Normalization of Databases. Farrokh Alemi Ph.D., Francesco Loaiza Ph.D. J.D., Vikas Arya. Updated by Janusz Wojtusiak, Fall 2009.

description

Normalization of Databases. Farrokh Alemi Ph.D., Francesco Loaiza Ph.D. J.D., Vikas Arya. Updated by Janusz Wojtusiak, Fall 2009. Objectives of Design: increase efficiency; reduce redundancy; reduce missing data entries; allow users access to data without knowing its location.

Transcript of Normalization of Databases

Page 1: Normalization of Databases

Normalization of Databases

Farrokh Alemi Ph.D.

Francesco Loaiza Ph.D. J.D.

Vikas Arya

Updated by Janusz Wojtusiak, Fall 2009

Page 2: Normalization of Databases

Objectives of Design

• Increase efficiency
• Reduce redundancy
• Reduce missing data entries
• Allow users access to data without knowing its location
• Remove database anomalies

Page 3: Normalization of Databases

Database Anomalies

• Update anomaly
• Insertion anomaly
• Deletion anomaly

Source: wikipedia.org
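As a rough illustration of two of these anomalies, the sketch below uses a made-up single table in which every row repeats the address and zip code of a household. The table, the column names, and the helper functions are assumptions introduced only for this example; they are not taken from the slides.

```python
# Hypothetical single-table design: each person row repeats the household's zip.
rows = [
    {"name": "Jim", "street": "1319 Ozkan", "zip": "22101"},
    {"name": "Jill", "street": "1319 Ozkan", "zip": "22101"},
    {"name": "George", "street": "14 Yates", "zip": "22112"},
]


def update_zip_for_person(rows, name, new_zip):
    """Update anomaly: changing one copy of a repeated fact leaves other copies stale."""
    return [dict(r, zip=new_zip) if r["name"] == name else dict(r) for r in rows]


def delete_person(rows, name):
    """Deletion anomaly: deleting the last person of a household also loses its address."""
    return [dict(r) for r in rows if r["name"] != name]


def zips_by_street(rows):
    """Collect every zip code recorded for each street address."""
    result = {}
    for r in rows:
        result.setdefault(r["street"], set()).add(r["zip"])
    return result


after_update = update_zip_for_person(rows, "Jim", "22102")
after_delete = delete_person(rows, "George")

# One address now carries two different zip codes.
print(zips_by_street(after_update)["1319 Ozkan"])
# The address "14 Yates" is no longer recorded anywhere.
print("14 Yates" in zips_by_street(after_delete))
```

An insertion anomaly would show up in the same design as the inability to record a new address until at least one person living there is known.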

Page 4: Normalization of Databases

Design Principles

1. Each Table should correspond to a single Entity or a single Relationship
2. Rows in the Table should correspond to individual occurrences of that Entity or Relationship
3. The Primary Key should uniquely identify individual occurrences of the Entity or Relationship

Page 7: Normalization of Databases

Principles of Design (Continued)

4. Non-key fields should be facts about the occurrence identified by the Primary Key
5. Each fact should be represented only once in the database

Page 9: Normalization of Databases

Normalization

• Normalization is the process of applying principles of design to data structures so that they conform to our expectations
• This lecture covers three formal rules for designing databases
• These rules correspond to three so-called normal forms

Page 10: Normalization of Databases

Design Flaw

Household

Street address   Zip     First person   Second person
1319 Ozkan       22101   Jim            Jill
14 Yates         22112   George         Janet

Page 11: Normalization of Databases

First Normal Form

A Table is in first normal form (1NF) if and only if all fields contain only atomic values and there are no repeating fields within a row.

No Composite Fields
  • A street address is an example of a non-atomic field

No Repeating Groups
  • If names are listed in two columns under a household, then the field is repeated

There is a primary key.
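The conversion of the Household table to first normal form can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names, the function name, and the rule of splitting the street address at the first space are choices made for this example, not part of the lecture.

```python
# The Household table from the "Design Flaw" slide, as a list of rows.
household = [
    {"street_address": "1319 Ozkan", "zip": "22101",
     "first_person": "Jim", "second_person": "Jill"},
    {"street_address": "14 Yates", "zip": "22112",
     "first_person": "George", "second_person": "Janet"},
]


def to_first_normal_form(rows):
    """Split the composite address and turn repeated person columns into rows."""
    result = []
    next_id = 1  # surrogate primary key, assigned in order of appearance
    for row in rows:
        # Assumption: the house number is everything before the first space.
        number, street = row["street_address"].split(" ", 1)
        for column in ("first_person", "second_person"):
            name = row[column]
            if name:  # skip empty person slots
                result.append({"id": next_id, "number": number,
                               "street": street, "zip": row["zip"],
                               "name": name})
                next_id += 1
    return result


normalized = to_first_normal_form(household)
for r in normalized:
    print(r)
```

Each output row holds one person, the address is split into atomic fields, and the `id` column serves as a primary key.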

Page 12: Normalization of Databases

First Normal Form?

Household

ID   Number   Street   Zip     Name
1    1319     Ozkan    22101   Jim
2    1319     Ozkan    22101   Jill
3    14       Yates    22112   George
4    14       Yates    22112   Janet

How about a primary key? An ID column (values 1 to 4) is added to identify each row.

Page 13: Normalization of Databases

Another Example

Consider a table with invoices of a company

Is this table in the first normal form?

Page 14: Normalization of Databases

Example cont.

Is this table in the first normal form?

Page 15: Normalization of Databases

Functional Dependency

• An Attribute Y is Functionally Dependent on an Attribute X if a Value for X Determines a Unique Value for Y
• X may be a Set of Attributes
• Notation: X → Y (read: X determines Y)
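A functional dependency can be tested against the rows of a table with a short function. The sketch below is a minimal illustration; the function name `determines`, the column names, and the sample employees are all made up for this example. A check like this can only refute a dependency on the data at hand; whether X → Y holds in general is a statement about the meaning of the data.

```python
def determines(rows, x, y):
    """Return True if, within these rows, each value of x maps to one value of y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        value = tuple(row[a] for a in y)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two different employees share the name "Ann".
employees = [
    {"emp_no": 1, "name": "Ann", "dept": "Lab"},
    {"emp_no": 2, "name": "Bob", "dept": "Lab"},
    {"emp_no": 3, "name": "Ann", "dept": "ER"},
]

print(determines(employees, ["emp_no"], ["name"]))  # True
print(determines(employees, ["name"], ["emp_no"]))  # False
```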

Page 16: Normalization of Databases

Functional Dependency Example

Employee Number → Employee Name

Is it also true that?

Employee Name → Employee Number

Page 17: Normalization of Databases

Full Functional Dependency

• An Attribute Y may be Determined by a Set of Attributes A, B, C (ABC → Y)
• Let X be a Set of Attributes Such That X → Y. If there is no proper subset Z of X such that Z → Y, then Y is Fully Functionally Dependent on X

Page 18: Normalization of Databases

Example for Full Functional Dependence

Employee Number, Dept → Employee Name

But

Employee Number → Employee Name

So Employee Name is not Fully Functionally Dependent on Employee Number and Department
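The example above can be checked mechanically by testing every non-empty proper subset of the determinant. The following sketch assumes a made-up assignment table with an extra `hours` column; the function names and the data are hypothetical and serve only to illustrate the definition.

```python
from itertools import combinations


def determines(rows, x, y):
    """Return True if, within these rows, each value of x maps to one value of y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        value = tuple(row[a] for a in y)
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_determines(rows, x, y):
    """True if x -> y holds and no non-empty proper subset of x determines y."""
    if not determines(rows, x, y):
        return False
    for size in range(1, len(x)):  # sizes 1 .. len(x) - 1
        for subset in combinations(x, size):
            if determines(rows, subset, y):
                return False
    return True


# Hypothetical data: hours worked by an employee in a department.
assignments = [
    {"emp_no": 1, "dept": "Lab", "name": "Ann", "hours": 10},
    {"emp_no": 1, "dept": "ER", "name": "Ann", "hours": 5},
    {"emp_no": 2, "dept": "Lab", "name": "Bob", "hours": 20},
]

key = ["emp_no", "dept"]
print(fully_determines(assignments, key, ["name"]))   # False: emp_no alone suffices
print(fully_determines(assignments, key, ["hours"]))  # True: both attributes are needed
```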

Page 19: Normalization of Databases

Second Normal Form

A Table is in Second Normal Form if and only if
  • it is in the first normal form
  • all informational fields (facts) are fully functionally dependent on the primary key.
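One common way to reach second normal form is to move each fact that depends on only part of the key into its own table. The sketch below shows this on a made-up table with the composite key (emp_no, dept); the table, the column names, and the helpers `project` and `join` are assumptions for illustration, not the example shown on the slides.

```python
def project(rows, attributes):
    """Keep only the given attributes and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


def join(left, right, on):
    """Join two tables on the shared attributes listed in `on`."""
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in on)]


# Hypothetical table with primary key (emp_no, dept).
# "name" depends on emp_no alone, so the table is not in 2NF.
work = [
    {"emp_no": 1, "dept": "Lab", "name": "Ann", "hours": 10},
    {"emp_no": 1, "dept": "ER", "name": "Ann", "hours": 5},
    {"emp_no": 2, "dept": "Lab", "name": "Bob", "hours": 20},
]

employee = project(work, ["emp_no", "name"])             # key: emp_no
assignment = project(work, ["emp_no", "dept", "hours"])  # key: (emp_no, dept)

print(employee)
print(assignment)
```

Joining `employee` and `assignment` on `emp_no` gives back the original rows, so no information is lost, and each name is now stored only once.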

Page 20: Normalization of Databases

Violation of Second Normal Form

Why is the above table not in the second normal form?

(In the table, two columns are marked PK.)

Page 21: Normalization of Databases

Solution to Second Normal Form

Wait!!!

Page 22: Normalization of Databases

Solution to Second Normal Form

Page 23: Normalization of Databases

Third Normal Form

A Table is in Third Normal Form if and only if
  • it is in second normal form
  • there are no combinations of strictly informational fields (not primary key fields) that determine the value of another field
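A violation of this rule can be looked for by testing whether one non-key field determines another. The sketch below is a simplified illustration: it checks only single fields rather than every combination of fields, and the patient table, the placeholder city names, and the function names are all made up for this example.

```python
from itertools import permutations


def determines(rows, x, y):
    """Return True if, within these rows, each value of x maps to one value of y."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        value = tuple(row[a] for a in y)
        if seen.setdefault(key, value) != value:
            return False
    return True


def non_key_dependencies(rows, key):
    """List pairs (a, b) of non-key fields where a determines b in these rows."""
    non_key = [a for a in rows[0] if a not in key]
    return [(a, b) for a, b in permutations(non_key, 2)
            if determines(rows, [a], [b])]


def project(rows, attributes):
    """Keep only the given attributes and drop duplicate rows, preserving order."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# Hypothetical table with primary key patient_id; city names are placeholders.
patients = [
    {"patient_id": 1, "name": "Jim", "zip": "22101", "city": "City A"},
    {"patient_id": 2, "name": "Jill", "zip": "22101", "city": "City A"},
    {"patient_id": 3, "name": "George", "zip": "22112", "city": "City B"},
    {"patient_id": 4, "name": "Jim", "zip": "22112", "city": "City B"},
    {"patient_id": 5, "name": "Janet", "zip": "22113", "city": "City B"},
]

violations = non_key_dependencies(patients, ["patient_id"])
print(violations)  # [('zip', 'city')]

# Moving the dependent fact into its own table removes the violation.
patient = project(patients, ["patient_id", "name", "zip"])
zip_city = project(patients, ["zip", "city"])
```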

Page 24: Normalization of Databases

Violation of the Third Normal Form

Page 25: Normalization of Databases

Violation of the Third Normal Form

Page 26: Normalization of Databases

Normalization

• There are more normal forms
• The most commonly used are the three normal forms covered in this lecture
• Sometimes denormalization is desirable, that is, converting tables into lower normal forms, for example to perform data analysis
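Denormalization for analysis often amounts to joining normalized tables back into one flat table. The sketch below joins two made-up tables and counts patients per city; the table contents, the placeholder city names, and the `join` helper are assumptions for illustration only.

```python
from collections import Counter


def join(left, right, on):
    """Join two tables on the shared attributes listed in `on`."""
    index = {}
    for r in right:
        index.setdefault(tuple(r[a] for a in on), []).append(r)
    result = []
    for l in left:
        for r in index.get(tuple(l[a] for a in on), []):
            result.append({**l, **r})
    return result


# Hypothetical normalized tables.
patient = [
    {"patient_id": 1, "name": "Jim", "zip": "22101"},
    {"patient_id": 2, "name": "Jill", "zip": "22101"},
    {"patient_id": 3, "name": "George", "zip": "22112"},
    {"patient_id": 4, "name": "Janet", "zip": "22112"},
]
zip_city = [
    {"zip": "22101", "city": "City A"},
    {"zip": "22112", "city": "City B"},
]

# The flat table repeats the city for every patient, which suits analysis.
flat = join(patient, zip_city, ["zip"])
patients_per_city = Counter(row["city"] for row in flat)
print(patients_per_city)
```

The flat table reintroduces redundancy on purpose, so it is usually derived from the normalized tables rather than edited directly.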

Page 27: Normalization of Databases

Rules for Good Design

1. Each table should correspond to a single entity
2. Each row should correspond to a single occurrence of the entity
3. Facts in the table should describe the primary key
4. Each fact should be represented only once in the database
5. No composite or repeating fields should be used
6. No combination of facts should determine the value of another
7. The primary key should uniquely identify the entity
8. All facts should be fully functionally dependent on the primary key.