Data Modeling Overview By: Dave Wentzel. What we will accomplish u Review of DBMS u Issues related...


Data Modeling Overview

By: Dave Wentzel

What we will accomplish

Review of DBMS
Issues related to DBMS
Entity Relationship Modeling
  – Process flow
  – Model types
  – Component definition
Selecting entities and attributes
Defining relationships
Defining Cardinality
Selecting Primary Keys
Review of recursive relationships, weak entities, and ternary relationships
Participation constraints
Erwin Notation
NULL issues
The Physical Model
Generalization / Specialization
Transaction processing
Normalization Rules
History issues

What is data?

Data
  – Raw facts. Can be described, observed, and measured.
Information
  – Data organized in a form that is useful for decision making. The meaning behind the data.
  – New thing not previously observed that is created based on the data.
Knowledge
  – Information that is used for decision making.

What is a Database?

Collection of interrelated data
Data which can be visualized in a table format
Contains relationships between data
Can be of any size and varying complexity
Can be maintained manually or by computer

Data Base Management System (DBMS)

Collection of programs (software) that allows users to create and maintain a database
Supports data:
  – Definition - specification of data types, structures, and constraints
  – Construction - storing of the data itself
  – Manipulation - updating & querying of the data
Defines itself. Contains a catalog which describes its data.

Components of a DBMS

Catalog
  – Maintains information about the data in the database
  – Considered data about data (metadata)
Databases
  – Collection of related tables
Tables
  – Rows and columns containing data
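
The catalog idea can be tried out with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in DBMS; the customer table and its columns are illustrative assumptions, not part of the deck. In SQLite the catalog is exposed as the sqlite_master table and the table_info pragma.

```python
import sqlite3

# In-memory database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# The catalog is itself queryable: sqlite_master lists every schema object.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(catalog)
print(columns)
```

Other products expose the same kind of metadata under different names, so the queries above should be read as SQLite-specific.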

Issues in DBMS

Data independence
Query optimization
  – Improve efficiency
  – Faster responses
Transaction management
  – Sequence of operations that are treated as a unit
  – Once the 1st step is completed, the 2nd step must also be completed; otherwise the 1st step is undone (ROLLBACK mechanism)
  – Example: Transferring Bank Funds
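
The bank transfer example can be sketched as follows, again using Python's sqlite3 module as a stand-in DBMS. The account table, the CHECK constraint that forbids negative balances, and the transfer helper are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move funds as one unit of work; undo the credit if the debit fails."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_id, balance FROM account"))


ok = transfer(conn, 1, 2, 30)       # both steps succeed
failed = transfer(conn, 1, 2, 500)  # debit would overdraw account 1
final = balances(conn)
print(ok, failed, final)
```

In the failed transfer the credit to account 2 has already run when the debit is rejected, so the rollback is what keeps the two balances consistent.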

Issues in DBMS continued

Transaction management
  – Concurrency
  – Recovery
Controlled redundancy
  – Goal of database design is to minimize redundancy (duplicate data)
Integrity constraints
  – Includes business rules and data rules

Issues in DBMS continued

Security and privacy
  – Protect against unauthorized access
Data / database administration
  – Involves managing people, data, performance, security, etc.

Entity Relationship Modeling

[Diagram labels: Person, Account, Transaction, Employee]

Data Model

Tool for describing data, its relationships, semantics, and integrity constraints
Provides for data abstraction
Hides details of data storage

Why use an ER Model?

Easy to use for modeling DB design
Succinct representation of database layout
Good communication tool among project team members
Most CASE tools support ER modeling
Implementation independent

Categories of Data Models

Logical model
  – Conceptual data model
  – High level model
  – Closest view user has of the data
Physical model
  – Low level model
  – Defines how data is stored

Steps in Database Design

[Flow diagram. Labels: Mini World; Requirements Collection and Analysis; Functional Analysis; Functional Requirements; Database Requirements; API; Physical Design; Transaction Implementation; Application Program Design; Logical Model; Data Model Mapping; DBMS Independent; DBMS Specific; High Level Transaction Requirement; Internal Schema; Application Programs]

ER Modeling composed of

Entity (table)
Attribute (field)
Relationship
  – Binary Relationships
  – Cardinality of relationships

What is an entity?

Conceptual definition
  – Distinguishable object that exists
Operational definition
  – Business object that has properties we are interested in storing
Physical definition
  – Set of related data forming a table composed of attributes (fields)


Entities

Primary THINGS of a business about which users need to record data

Objects about which the business is interested in tracking information

When an ER Diagram is translated into a relational model, the entities become the tables.

Selecting Entities

Nouns are candidate entities
Possible classes of entities:
  – People who carry out some function (employees, students, customers)
  – Places (cities, offices, routes)
  – Things which are tangible physical objects (equipment, products, buildings)
  – Organizations (teams, suppliers, departments)


Selecting Entities Continued

Events which occur at a given date/time or have steps (employee promotions, project phases, account payments)

Concepts which are intangible ideas used to keep track of business activities (projects, accounts, complaints)

Questions to ask...

What things do we need to keep data about?
What things are essential to the organization?
What things do we talk about in the organization?
What questions do we have that reports can help answer?
What information should the reports contain?

Naming entities

Use a SINGULAR noun
Meaningful but intuitive
Avoid names which may be misinterpreted within the problem domain
Follow organizational / industry trends
Do not try to rename entities within an organization
Avoid abused names such as Task, Form, Operation, Schedule...


Is it an entity to worry about?

Decide if an entity is relevant to your problem domain by determining if it has attributes you need to track

If it does not have attributes you need to track, it is NOT a valid entity for your problem


Is it really an entity?

Can you define attributes for it? An attribute is a piece of information that we are interested in tracking about an entity. It is a property of an entity.

In general, if two objects differ by one attribute, they are separate entities.

Does it participate in a relationship? Two entities that are related somehow interact with one another.

Attributes

Properties of an object (entity)
Each attribute has a data type (char, int, datetime)
Each attribute in an RDBMS (relational database management system) has only one value at a time (atomic)

Categories of Attributes

Descriptive
  – Property of the entity that helps describe the entity
Identifying (key attributes)
  – Property of the entity that helps uniquely identify the entity
  – Normally short
  – If one does not exist it MUST be created
  – If creating a key, use a numeric/integer data type

Types of Attributes

Atomic
  – Indivisible value
  – Most desired state
Composite
  – Can be divided into smaller parts
  – Need to convert into atomic

Types of Attributes Continued

Multi-valued
  – Multiple instances of an attribute
  – Normally create another entity
Derived
  – Can be determined by the value of another attribute or attributes
  – In most cases, do NOT store derived attributes

Naming Attributes

Use a noun, adjective, or adverb
Name should be unique database wide
Use attribute names consistently
Use singular names
Define a naming convention for the organization

Rules for Entity Analysis

Every noun is a candidate for an entity
Every entity should be relevant to the problem
If an object has only one property of importance, then it should be considered an attribute of another entity
If an object has only one data instance (1 row) then do not model as an entity
If an object needs a unique identifier then model it as an entity

Relationships

Way entities interact with one another
An association between two or more entities
Depicts business interactions between entities
They DO NOT represent business flow


Relationships Continued

Number of entities associated through a relationship defines its degree (unary, binary, ternary, n-ary)

Cardinality defines the maximum number of entities that can participate in the relationship

How to Identify a Relationship

Ask what is the action or verb used to describe how one entity interacts with another
Three types of relationships to consider:
  – Existence (Employee HAS Children)
  – Functional (Professor TEACHES Course)
  – Event (Customer PLACES Order)
Ignore verbs not important to the organization


More on Relationships

Relationships and cardinality constraints represent business rules

When naming a relationship use an active verb in the present tense

Relationships are read bi-directionally

Example notes:

Together the customer and account tables form a schema - structure / layout of a logical database design
Note the attributes. Order DOES NOT MATTER but convention puts primary key first.
No duplicates for attributes.
No duplicate tuples (rows)
Relationship - same attribute name (or different attribute name with same meaning) in 2 tables.

Cardinality Constraints

Express the MAXIMUM number of entities that can be associated with another entity via a relationship
Also known as mapping constraints
Types:
  – 1:1 (one to one)
  – 1:N (one to many)
  – N:M (many to many)


The Key to It All

Identifiers...

Attribute(s) which uniquely identify a record
An entity may have multiple identifiers
Every entity MUST have at least one
Can be made up of more than one attribute

Candidate vs. Primary Keys

Both are identifiers
Candidate keys are all the identifiers from which you can choose which uniquely identify the record
Primary key is the one candidate key which is selected to always uniquely identify the record

Selecting the Primary Key

In general we create a primary key, however...
Choose the attribute most widely used in queries
Select the shorter data type
If one does not exist, must create one
Select a MINIMUM key if using compound attributes (not recommended)

Key Requirements and Preferences

Known at all times
Can NOT be null
Should not be changed
Shorter is better
Numeric / integer is better
Avoid keys containing letters O, I, Z, S - can be confused with numbers
If key includes time, it should be in 24hr format
Avoid carrying meaning


With this all said...

It is difficult to come up with a primary key based on real attributes which will not change over time (phone numbers, SSN, addresses, driver’s license numbers…)

In most cases it is best to create the primary key

In SQL Server you can use an identity column, which generates a sequential number
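
A generated key can be sketched with Python's sqlite3 module. SQLite's INTEGER PRIMARY KEY AUTOINCREMENT is used here as a rough analogue of a SQL Server identity column, not as the same feature, and the customer table with its phone column is an illustrative assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# customer_id is a surrogate key generated by the database; phone is a
# real-world attribute that may change, so it is kept out of the key.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL,"
    " phone TEXT)"
)
ids = [
    conn.execute(
        "INSERT INTO customer (name, phone) VALUES (?, ?)", row
    ).lastrowid
    for row in [("Bob", "555-0100"), ("Kate", "555-0101")]
]

# Changing the phone number leaves the key, and anything referencing it, untouched.
conn.execute(
    "UPDATE customer SET phone = ? WHERE customer_id = ?", ("555-0199", ids[0])
)
bob = conn.execute(
    "SELECT customer_id, phone FROM customer WHERE name = 'Bob'"
).fetchone()
print(ids, bob)
```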


Primary Keys and Relationships

In a 1:1 relationship, the primary key of either one of the entities must migrate to the other entity

In a 1:N, the primary key of the 1 side must migrate to the entity on the N side

In a M:N, the keys of both entities are used to identify a new entity which resolves the M:N into two 1:N relationships
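
The M:N rule can be illustrated with a short sketch in Python's sqlite3 module. The customer, account, and account_holder tables are hypothetical; account_holder plays the role of the new entity whose key is built from the two migrated keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE account  (account_id  INTEGER PRIMARY KEY, kind TEXT NOT NULL);
-- Associative entity: resolves the M:N into two 1:N relationships.
CREATE TABLE account_holder (
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    account_id  INTEGER NOT NULL REFERENCES account (account_id),
    PRIMARY KEY (customer_id, account_id)
);
INSERT INTO customer VALUES (1, 'Bob'), (2, 'Mary');
INSERT INTO account  VALUES (10, 'checking'), (11, 'savings');
INSERT INTO account_holder VALUES (1, 10), (2, 10), (1, 11);
""")

# Who holds account 10? One account, many customers.
holders = conn.execute(
    "SELECT c.name FROM customer c"
    " JOIN account_holder h ON h.customer_id = c.customer_id"
    " WHERE h.account_id = 10 ORDER BY c.name"
).fetchall()
print(holders)
```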


Foreign Key

When a key migrates to another entity it is called a Foreign Key

A foreign key CAN BE null if it is not part of an entity’s primary key

If the FK value is NOT null, then that value MUST exist in the table in which it is the primary key. This is called Referential Integrity (RI)
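
Referential integrity can be observed directly in a small sketch using Python's sqlite3 module. The department and employee tables are illustrative assumptions. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on for the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    dept_id INTEGER REFERENCES department (dept_id)  -- nullable foreign key
);
INSERT INTO department VALUES (1, 'Research');
INSERT INTO employee VALUES (1, 'Bob', 1);       -- parent row exists
INSERT INTO employee VALUES (2, 'Kelly', NULL);  -- NULL foreign key is allowed
""")

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Jack', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```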

Recursive Relationships

An entity having a relationship with itself
Same entity participates more than once in a relationship type in different roles
Same cardinality examples exist in recursive relationships
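
A common instance of a recursive relationship is an employee who reports to another employee. The sketch below, written with Python's sqlite3 module, assumes a hypothetical employee table in which manager_id is a foreign key back to the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id     INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee (emp_id)  -- same entity, manager role
);
INSERT INTO employee VALUES (1, 'Mary', NULL), (2, 'Bob', 1), (3, 'Jack', 1);
""")

# Join the table to itself: e plays the "report" role, m the "manager" role.
reports = conn.execute(
    "SELECT e.name, m.name FROM employee e"
    " JOIN employee m ON m.emp_id = e.manager_id ORDER BY e.name"
).fetchall()
print(reports)
```

This is a 1:N recursive relationship: one manager has many reports, and the key migrates to the N side as usual, which here is the same table.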

Weak Entity Type

Entity that does not have a key attribute of its own
Identified by its relationship with another entity
Created for multi-valued attributes and time dependent attributes
Weak entity has EXISTENCE dependence on the parent. Only exists if the owner entity exists.


Primary Keys of Weak Entities

Can use the primary key of the owner entity along with a qualifier such as sequence number or date/time

Can create a surrogate key but make sure you migrate the key of the parent

Ternary Relationship

Relationship between 3 entities
Differs from 3 binary relationships
States that all three entities occur at the same time
Must be converted to binary relationships


Creating Binary Relationships from a Ternary Relationship

Participation Constraints

Specifies whether the existence of an entity depends on its being related to another entity via a relationship
Notes the minimum cardinality
Total participation (mandatory)
Partial participation (optional)

Identifying Participation Constraints

Can entity A exist without entity B?
  – If no, A has total participation in the relationship
  – If yes, entity A has partial participation in the relationship


Identifying Relationships In Erwin

An identifying relationship is a relationship between two tables in which an instance of a child table is identified through its association with a parent table, which means the child table is dependent on the parent table for its identity, and cannot exist without it. In an identifying relationship, one instance of the parent table is related to multiple instances of the child.


Non-Identifying Relationship In Erwin

A non-identifying relationship is a relationship between two tables in which an instance of the child table is not identified through its association with a parent table, which means the child table is not dependent on the parent table for its identity, and can exist without it. In a non-identifying relationship, one instance of the parent table is related to multiple instances of the child.


Optional Non-Identifying

In an optional non-identifying relationship, the columns that are migrated into the non-key area of the child table are not required in the child table. This means that nulls are allowed in the foreign key. ERwin draws an optional non-identifying relationship differently depending on the notation for your diagram


Mandatory Non-Identifying

In a mandatory non-identifying relationship, the columns that are migrated into the non-key area of the child table are required in the child table. This means that the foreign key cannot be null.

Erwin Notation

[Notation table. Column headings: Cardinality Description; Identifying; Non-Identifying (Nulls, No Nulls). Surviving row label: One to 0, 1, or M]

To Null or Not to Null….

NULL means no value
Two types of null values
  – Unknown
  – None (does not exist or not applicable)

Null Examples

Employee

e#  name   salary  spouse
1   Bob    10,000  Mary
2   Jack   20,000  Kate
3   Mary   30,000  NULL
4   Kelly  NULL    John

Questions:

• How many people make more than 15K?

• What is the average salary?

• Is Mary married?
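
The three questions can be run against the Employee rows above. The sketch below uses Python's sqlite3 module; the column name e_no (standing in for e#) and the scalar helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (e_no INTEGER PRIMARY KEY, name TEXT, salary INTEGER, spouse TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Bob", 10000, "Mary"),
        (2, "Jack", 20000, "Kate"),
        (3, "Mary", 30000, None),   # spouse is NULL
        (4, "Kelly", None, "John"), # salary is NULL
    ],
)


def scalar(sql):
    """Return the single value produced by a one-row, one-column query."""
    return conn.execute(sql).fetchone()[0]


# Kelly's NULL salary is neither > 15000 nor NOT > 15000, so she is in neither count.
over_15k = scalar("SELECT COUNT(*) FROM employee WHERE salary > 15000")
not_over_15k = scalar("SELECT COUNT(*) FROM employee WHERE NOT (salary > 15000)")

# AVG skips NULLs: the result is the average of three salaries, not four.
avg_salary = scalar("SELECT AVG(salary) FROM employee")

# The query can find the NULL, but not whether it means "unmarried" or "unknown".
no_spouse_recorded = scalar("SELECT COUNT(*) FROM employee WHERE spouse IS NULL")

print(over_15k, not_over_15k, avg_salary, no_spouse_recorded)
```

The two salary counts add up to three rows out of four, which is the ambiguity the questions are pointing at.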

Problems with NULL

Null values are ambiguous
More programming is required to deal with NULL values
Try to use UNKNOWN or NONE if applicable


Getting Physical…

Converting the logical data model into the physical data model

Things to do when converting

Identify data type
  – Is it a string (character field) or a number?
  – Use of varchar() or char()?
  – Dates are dates not strings
Identify data length
  – Consider growth over time and maximum size requirements
Identify value constraints (valid ranges, values, etc.)

Things to do when converting

Follow proper naming conventions
Determine indexes
Consider combining 1:1 relationship entities
Roll-up generalization / specialization hierarchies
Add organizational attributes if any

Indexes

Index is a physical access structure
Makes queries more efficient
Things to consider when creating
  – Create an index for each PK
  – Create an index for each FK
  – Create an index for each AK (alternate key) which will be used in queries
  – Try to minimize number of indexes (update overhead)
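
The effect of a foreign key index can be checked with a query plan. This sketch uses Python's sqlite3 module; the tables and the index name ix_account_customer_id are hypothetical, and the exact wording of the plan varies between SQLite versions, so only the index name is inspected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    balance     INTEGER NOT NULL
);
-- SQLite already gives keyed access on the INTEGER PRIMARY KEY;
-- the index on the foreign key has to be created explicitly.
CREATE INDEX ix_account_customer_id ON account (customer_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM account WHERE customer_id = ?", (1,)
).fetchall()
# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

Each extra index also has to be maintained on every insert, update, and delete, which is the update overhead mentioned above.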

Specialization / Generalization

Inheritance / Abstraction
Subclasses / Superclasses
Two processes resulting in the same model
Specialization is top-down approach. Can a high level entity be broken down?
Generalization is bottom-up approach. Can entities be combined at a higher level?


Example

Notes on Generalization/Specialization

Key of subclass is always key of superclass
Subclasses can participate in their own relationships
Participation in a subclass can either be inclusive or exclusive
Exclusive subclasses should be defined by a type
Multiple inheritance not allowed in most modeling tools
When converting to physical could combine into one entity

Database Operations

CRUD
  – Create (Insert)
  – Read
  – Update (Modify)
  – Delete
Transactions cannot violate any integrity constraints
Several operations may be grouped into a transaction
May propagate to maintain integrity constraints

If update violations occur

Cancel the operation (Restrict)
Perform additional updates / deletes so the violation is corrected (Cascade)
Execute a user specified operation to correct (Trigger)
Perform the operation but inform the user
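
Restrict and Cascade can be compared side by side in a short sketch using Python's sqlite3 module. The customer, account, and phone tables are illustrative assumptions: account rows block deletion of their customer, while phone rows are deleted along with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FK actions
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer (customer_id) ON DELETE RESTRICT
);
CREATE TABLE phone (
    phone_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
        REFERENCES customer (customer_id) ON DELETE CASCADE
);
INSERT INTO customer VALUES (1), (2);
INSERT INTO account VALUES (10, 1);
INSERT INTO phone VALUES (100, 2);
""")

# Restrict: customer 1 still has an account, so the delete is cancelled.
try:
    conn.execute("DELETE FROM customer WHERE customer_id = 1")
    restricted = False
except sqlite3.IntegrityError:
    restricted = True

# Cascade: deleting customer 2 also deletes the dependent phone row.
conn.execute("DELETE FROM customer WHERE customer_id = 2")
phones_left = conn.execute("SELECT COUNT(*) FROM phone").fetchone()[0]
customers_left = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(restricted, phones_left, customers_left)
```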


Normalization - What’s normal...

Normalization

Process to design a highly desirable relational schema using functional dependencies
Guidelines for relational database design which
  – Minimize redundancy
  – Avoid potential inconsistency
  – Help predict data behavior problems
  – Avoid update anomalies

Update Anomalies

Insert extra values
Add redundant records
Delete records not intended
Change a fact more than once, possibly in multiple tables
Miss changing a fact which is repeated multiple times

Normal Forms

First Normal Form
Second Normal Form
Third Normal Form
Boyce-Codd Normal Form
Fourth Normal Form
Fifth Normal Form

[Figure labels: # of Tables, Joins]

First Normal Form

A relation is in 1NF if it contains only scalar (atomic) values
  – One value for an attribute
  – No repeating groups
  – No composite attributes
  – No multi-valued attributes
To convert to 1NF
  – Create 1 table for each repeating group by adding the PK of the original table
  – Remove the repeating group from the original table

Example of Non-1NF w/ Conversion

Non-1NF

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  {Bellaire, Sugarland, Houston}
Administration  4        987654321  Stafford, Voorhees
Headquarters    1        888665555  Houston

1NF (note redundancy)

Dname           Dnumber  DMGRSSN    Dlocations
Research        5        333445555  Bellaire
Research        5        333445555  Sugarland
Research        5        333445555  Houston
Administration  4        987654321  Stafford
Administration  4        987654321  Voorhees
Headquarters    1        888665555  Houston
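
The redundancy in the flat 1NF table can be avoided by moving the repeating group into its own table, following the rule "create 1 table for each repeating group by adding the PK of the original table". The sketch below does this in plain Python and assumes that Dnumber is the primary key of the department table, which the slide does not state.

```python
# Non-1NF rows from the example: Dlocations holds several values per department.
departments = [
    {"Dname": "Research", "Dnumber": 5, "DMGRSSN": "333445555",
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4, "DMGRSSN": "987654321",
     "Dlocations": ["Stafford", "Voorhees"]},
    {"Dname": "Headquarters", "Dnumber": 1, "DMGRSSN": "888665555",
     "Dlocations": ["Houston"]},
]

# Original table with the repeating group removed.
department = [
    {key: value for key, value in row.items() if key != "Dlocations"}
    for row in departments
]

# New table for the repeating group, carrying the (assumed) PK of the original table.
dept_location = [
    (row["Dnumber"], location)
    for row in departments
    for location in row["Dlocations"]
]

print(department)
print(dept_location)
```

Each department name and manager SSN is now stored once, and each location is one atomic value per row.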

Example of Non-1NF

EmployeeProject - NON-1NF

SSN        Ename            Pnumber  Hours
123456789  Smith, John      1        32.5
                            2        7.5
666885555  Narayan, Ramesh  3        40
453223344  English, Joyce   1        20
                            2        20

Conversion

SSN        Ename
123456789  Smith, John
666885555  Narayan, Ramesh
453223344  English, Joyce

SSN        Pnumber  Hours
123456789  1        32.5
123456789  2        7.5
666885555  3        40
453223344  1        20
453223344  2        20


Second Normal Form

All attributes in the relation have a functional dependency on the complete PK

Each non-key attribute is uniquely defined by all components of the primary key

Example of Non-2NF w/ Conversion

EmployeeProject
SSN  Pnumber  Hours  Ename  Pname  Plocation

FD1: {SSN, Pnumber} -> Hours
FD2: SSN -> Ename
FD3: Pnumber -> {Pname, Plocation}

Conversion to 2NF

EP1
SSN  Pnumber  Hours

EP2
SSN  Ename

EP3
Pnumber  Pname  Plocation
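
The split into EP1, EP2, and EP3 implies that Hours depends on the whole key {SSN, Pnumber}, Ename on SSN alone, and Pname and Plocation on Pnumber alone. The plain Python sketch below checks those dependencies on sample rows and then projects the three tables. The holds and project helpers are hypothetical, and the Pname and Plocation values are invented for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    return sorted({tuple(row[a] for a in attrs) for row in rows})


COLUMNS = ("SSN", "Pnumber", "Hours", "Ename", "Pname", "Plocation")
DATA = [  # Pname / Plocation values are made up for this sketch
    ("123456789", 1, 32.5, "Smith, John", "Alpha", "Bellaire"),
    ("123456789", 2, 7.5, "Smith, John", "Beta", "Sugarland"),
    ("666885555", 3, 40.0, "Narayan, Ramesh", "Gamma", "Houston"),
    ("453223344", 1, 20.0, "English, Joyce", "Alpha", "Bellaire"),
    ("453223344", 2, 20.0, "English, Joyce", "Beta", "Sugarland"),
]
employee_project = [dict(zip(COLUMNS, row)) for row in DATA]

# Ename depends on only part of the key, which is what violates 2NF.
partial_dependency = holds(employee_project, ["SSN"], ["Ename"])
# Hours needs the full key: SSN alone does not determine it.
hours_needs_full_key = not holds(employee_project, ["SSN"], ["Hours"])

ep1 = project(employee_project, ["SSN", "Pnumber", "Hours"])
ep2 = project(employee_project, ["SSN", "Ename"])
ep3 = project(employee_project, ["Pnumber", "Pname", "Plocation"])
print(len(ep1), len(ep2), len(ep3))
```

After the split each employee name and each project description is stored once instead of once per assignment.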

Third Normal Form

Every non-key attribute (one that does not participate in the primary key) is mutually independent of the other non-key attributes

Every non-key attribute is irreducibly dependent on the primary key

Example of Non-3NF w/ Conversion

Example

Lots
PropertyID#  CountyName  Lot#  Area  Price  TaxRate

2NF

Lots1
PropertyID#  CountyName  Lot#  Area  Price

Lots2
CountyName  TaxRate

3NF

Lots1A
PropertyID#  CountyName  Lot#  Area

Lots1B
Area  Price
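
The last split treats Price as determined by Area, so Price moves to its own table keyed by Area. The sketch below shows the payoff using Python's sqlite3 module. The lowercase column names stand in for the slide's PropertyID# and Lot#, and all row values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lots1b (area REAL PRIMARY KEY, price INTEGER NOT NULL);
CREATE TABLE lots1a (
    property_id INTEGER PRIMARY KEY,
    county_name TEXT NOT NULL,
    lot_no      INTEGER NOT NULL,
    area        REAL NOT NULL REFERENCES lots1b (area)
);
-- Hypothetical rows; the slide gives only the column names.
INSERT INTO lots1b VALUES (0.5, 50000), (1.0, 90000);
INSERT INTO lots1a VALUES
    (1, 'Harris', 1, 0.5), (2, 'Harris', 2, 1.0), (3, 'Fort Bend', 1, 0.5);
""")

# The price for an area is stored once, so one UPDATE reprices every matching lot.
conn.execute("UPDATE lots1b SET price = 55000 WHERE area = 0.5")

prices = conn.execute(
    "SELECT a.property_id, b.price FROM lots1a a"
    " JOIN lots1b b ON b.area = a.area ORDER BY a.property_id"
).fetchall()
print(prices)
```

The cost is the extra join needed to read a lot together with its price.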

Maintaining History

Maintaining History can serve one of two purposes:
  – Tracking changes in the entity over time
  – Tracking record history in order to maintain inactive records over time and maintain RI
Tracking changes in an entity over time is very difficult and requires significant storage
Tracking inactive records is our standard here and provides value to the end user


Examples of History…