Copyright: ©2005 by Elsevier Inc. All rights reserved.

Author: Graeme C. Simsion and Graham C. Witt

Chapter 3 The Entity-Relationship Approach

Normalisation

• Each column should hold only a single fact. Do this first.

• Very simply, normalization is essentially a two-step process:
1. Put the data into tabular form (by removing repeating groups to new tables).
2. Remove duplicated data to separate tables.

• Critically: every time we create a table (in either step), we need to identify its primary key.

• We did all this in the example in the last lecture (the Drug Expenditure example)
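As a rough illustration of step 1, the Python sketch below moves a repeating group of drug administrations out of each operation record into a table of its own. The record layout, the values and the function name `remove_repeating_group` are hypothetical, and the repeating group is simplified to a drug short name and a number of doses. The point is only that each new row carries the parent's primary key (Hospital Number, Operation Number).

```python
# Hypothetical unnormalized records: each operation holds a repeating
# group of (drug short name, number of doses) pairs.
unnormalized = [
    {"hospital_number": "H1", "operation_number": 48,
     "drugs": [("DRUG-A", 2), ("DRUG-B", 1)]},
    {"hospital_number": "H2", "operation_number": 12,
     "drugs": [("DRUG-B", 3)]},
]


def remove_repeating_group(records):
    """Step 1: move the repeating group into a table of its own.

    Returns (operations, administrations). Each administration row starts
    with the parent's primary key (hospital_number, operation_number) so
    it can be cross-referenced back to its operation; the new table's key
    is that parent key plus the drug short name.
    """
    operations = []
    administrations = []
    for rec in records:
        parent_key = (rec["hospital_number"], rec["operation_number"])
        operations.append(parent_key)
        for drug_short_name, number_of_doses in rec["drugs"]:
            administrations.append(parent_key + (drug_short_name, number_of_doses))
    return operations, administrations


operations, administrations = remove_repeating_group(unnormalized)
for row in administrations:
    print(row)
```

In the full Drug Expenditure example the new table's key would also need Method of Administration, Size of Dose and Unit of Measure; they are left out here to keep the sketch short.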

More formally

• Apart from repeating groups, we are looking at certain relationships between data in the tables:
– Which column(s) determine other column(s)?
– Create tables around the determining column(s). We call these determining columns determinants.

Determinants

• We divided the various tables (in step 2) according to determinants.

• Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status

• where we read “→” as “determines” or “is a determinant of”.

• Determinants can be a combination of two or more columns, e.g. Hospital Number + Operation Number → Surgeon Number.
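A determinant is a rule about the business, so sample data can never prove one; it can, however, disprove one. The sketch below (the function name `holds` and the sample rows are hypothetical) reports whether any two rows agree on the determinant columns but disagree on the determined ones.

```python
def holds(rows, determinant, dependents):
    """Return False if the rows contradict determinant -> dependents.

    Any two rows that agree on every determinant column must also agree
    on every dependent column. True only means the sample does not
    refute the dependency; it cannot prove the business rule.
    """
    seen = {}
    for row in rows:
        lhs = tuple(row[c] for c in determinant)
        rhs = tuple(row[c] for c in dependents)
        # Remember the first dependent values seen for this determinant value.
        if seen.setdefault(lhs, rhs) != rhs:
            return False
    return True


rows = [
    {"Hospital Number": "H1", "Operation Number": 48, "Surgeon Number": "S7"},
    {"Hospital Number": "H2", "Operation Number": 48, "Surgeon Number": "S3"},
    {"Hospital Number": "H1", "Operation Number": 49, "Surgeon Number": "S7"},
]
print(holds(rows, ["Hospital Number", "Operation Number"], ["Surgeon Number"]))  # True
print(holds(rows, ["Operation Number"], ["Surgeon Number"]))  # False
```

Operation Number alone is not a determinant here: operation 48 appears at two hospitals with different surgeons, which is why the determinant needs Hospital Number as well.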

Step 2 of Normalisation

• Identify any determinants, other than the primary key, and the columns they determine

• Establish a separate table for each determinant and the columns it determines. The determinant becomes the key of the new table.

• Name the new tables.

• Remove the determined columns from the original table. Leave the determinants to provide links between tables.
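The steps above can be sketched in Python for a single determinant. This is a minimal illustration rather than the lecture's own procedure: `split_on_determinant` and the sample rows are hypothetical, naming the new table is left to the caller, and the function assumes the determinant really does determine the listed columns (otherwise later rows would silently overwrite earlier ones).

```python
def split_on_determinant(rows, determinant, dependents):
    """Step 2 of normalization for one determinant.

    Returns (new_table, reduced_table):
    - new_table has one row per distinct determinant value, holding the
      determinant (its key) and the columns it determines;
    - reduced_table is the original with the determined columns removed,
      keeping the determinant as the link between the two tables.
    """
    new_table = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        new_table[key] = {c: row[c] for c in list(determinant) + list(dependents)}
    reduced = [{c: v for c, v in row.items() if c not in dependents} for row in rows]
    return list(new_table.values()), reduced


rows = [
    {"Hospital Number": "H1", "Operation Number": 48, "Hospital Name": "North"},
    {"Hospital Number": "H1", "Operation Number": 49, "Hospital Name": "North"},
    {"Hospital Number": "H2", "Operation Number": 12, "Hospital Name": "South"},
]
# The caller names the new table: here, "hospital".
hospital, operation = split_on_determinant(rows, ["Hospital Number"], ["Hospital Name"])
print(hospital)
print(operation)
```

Each hospital name is now stored once, and Hospital Number remains in the operation rows to link back to it.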

What are determinants?

• Look for columns that appear by their names to be identifiers. These may be determinants or components of determinants.

• Look for columns that appear to describe something other than what the table is about. Then look for other columns that identify this “something”.

Which Determinants were in the Drug Expenditure Example?

• Hospital Number → Hospital Name, Contact Person, Hospital Type, Teaching Status.

• Others in the Operation table:
– Hospital Number + Surgeon Number → Surgeon Specialty
– Operation Code → Operation Name, Procedure Group

• Drug Administration table:
– Drug Short Name → Drug Name, Manufacturer
– Drug Short Name + Method of Administration + Size of Dose + Unit of Measure → Dose Cost
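Writing these determinants down as data lets us follow chains of them. The sketch below uses the standard attribute-closure algorithm from relational theory (a technique this lecture does not itself introduce) to find every column that a set of columns determines, directly or through other determinants. It also includes Hospital Number + Operation Number → Surgeon Number from the earlier Determinants slide; the `FDS` list and `closure` function are our own illustration.

```python
# Determinants from the Drug Expenditure example, written as
# (determinant columns, determined columns).
FDS = [
    ({"Hospital Number"},
     {"Hospital Name", "Contact Person", "Hospital Type", "Teaching Status"}),
    ({"Hospital Number", "Operation Number"}, {"Surgeon Number"}),
    ({"Hospital Number", "Surgeon Number"}, {"Surgeon Specialty"}),
    ({"Operation Code"}, {"Operation Name", "Procedure Group"}),
    ({"Drug Short Name"}, {"Drug Name", "Manufacturer"}),
    ({"Drug Short Name", "Method of Administration", "Size of Dose", "Unit of Measure"},
     {"Dose Cost"}),
]


def closure(columns, fds):
    """Return every column determined (directly or transitively) by `columns`."""
    result = set(columns)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a determinant once all of its columns are known.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(sorted(closure({"Hospital Number", "Operation Number"}, FDS)))
# ['Contact Person', 'Hospital Name', 'Hospital Number', 'Hospital Type',
#  'Operation Number', 'Surgeon Number', 'Surgeon Specialty', 'Teaching Status']
```

Surgeon Specialty is reached only through Surgeon Number, which is the kind of indirect dependency that step 2 splits out into a separate Surgeon table.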

The Final Design

• The final design we have is in Third Normal Form (3NF).

• By splitting tables along determinants (or functional dependencies), we can get the design into 3NF easily.

• What about performance? Surely all those tables will slow things down?

Take a moment…

• Go back and examine the last lecture and see that this is what we did in normalization!

Performance of Normalised Databases

• There are many tables for what seems to be relatively little data.

• Thanks to advances in the capabilities of DBMSs, and the increased power of computer hardware, the number of tables is less likely to be an important determinant of performance than it might have been in the past.

• But performance is not an issue at this stage (that comes later). We are designing here!

Definitions and a Few Refinements (1)

• Determinants and Functional Dependency
– For each value of the determinant, there can only be one value of some other nominated column(s) in the table at any point in time.
– The other nominated columns are functionally dependent on the determinant.
– The determinant concept is what 3NF is all about; we are simply grouping data items around their determinants.
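For readers who prefer symbols, the definition above is usually written as follows. The notation is the conventional one from relational theory, not the lecture's: r is the set of rows in the table at a point in time, X the determinant columns, Y the dependent columns, and t[X] the values of row t in the columns X.

```latex
X \to Y \iff \forall\, t_1, t_2 \in r :\; t_1[X] = t_2[X] \;\Rightarrow\; t_1[Y] = t_2[Y]
```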

Definitions and a Few Refinements (2)

• Primary Keys
– A primary key is a nominated column or combination of columns that has a different value for every row in the table. Each table has one (and only one) primary key.

• Candidate Keys
– Sometimes more than one column or combination of columns could serve as a primary key. We refer to such possible primary keys, whether chosen or not, as candidate keys.
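The sketch below, a hypothetical helper rather than anything from the lecture, lists the minimal column combinations that are unique in some sample rows. As with determinants, data can only rule a combination out; whether it really is a candidate key is a business rule. In this made-up sample both Hospital Number and Hospital Name happen to be unique, so both come back as candidates, and the modeller would nominate one of them as the primary key.

```python
from itertools import combinations


def is_unique(rows, columns):
    """True if no two rows share the same values in `columns`."""
    values = [tuple(row[c] for c in columns) for row in rows]
    return len(values) == len(set(values))


def candidate_keys(rows, columns):
    """Return the minimal column combinations that are unique in `rows`.

    Sample data can only rule a combination out; whether it really is a
    candidate key is a business rule for the modeller to confirm.
    """
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # Supersets of a key already found are unique but not minimal.
            if any(set(key) <= set(combo) for key in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


# Hypothetical sample of a HOSPITAL table.
hospitals = [
    {"Hospital Number": "H1", "Hospital Name": "North", "Hospital Category": "Public"},
    {"Hospital Number": "H2", "Hospital Name": "South", "Hospital Category": "Public"},
    {"Hospital Number": "H3", "Hospital Name": "East", "Hospital Category": "Private"},
]
print(candidate_keys(hospitals, ["Hospital Number", "Hospital Name", "Hospital Category"]))
# [('Hospital Number',), ('Hospital Name',)]
```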

Definitions and a Few Refinements (3)

• A More Formal Definition of Third Normal Form

• If we define the term “non-key column” to mean “a column that is not part of the primary key,” then we can say:
– A table is in 3NF if the only determinants of non-key columns are candidate keys.
– If we want to be even more formal, we should explicitly exclude trivial determinants: each column is, of course, a determinant of itself.
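Given a list of determinants, the definition can be turned into a mechanical check. In this minimal sketch (the function names are hypothetical), a determinant of a non-key column passes if it determines every column of the table, that is, if it is a candidate key or contains one; trivial determinants are skipped, as the definition requires. The example uses two determinants from the Drug Expenditure example on an Operation table that still carries Surgeon Specialty.

```python
def closure(columns, fds):
    """All columns determined, directly or transitively, by `columns`."""
    result = set(columns)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(table_columns, primary_key, fds):
    """Return (determinant, column) pairs that break the 3NF rule above."""
    cols = set(table_columns)
    violations = []
    for lhs, rhs in fds:
        if not lhs <= cols:
            continue  # this determinant is not wholly in the table
        for col in sorted(rhs & cols):
            if col in primary_key or col in lhs:
                continue  # key column, or trivial (self) determination
            if not cols <= closure(lhs, fds):
                # lhs determines a non-key column but does not determine
                # the whole table, so it is not (and contains no) candidate key
                violations.append((tuple(sorted(lhs)), col))
    return violations


FDS = [
    ({"Hospital Number", "Operation Number"}, {"Surgeon Number"}),
    ({"Hospital Number", "Surgeon Number"}, {"Surgeon Specialty"}),
]
KEY = {"Hospital Number", "Operation Number"}

before = ["Hospital Number", "Operation Number", "Surgeon Number", "Surgeon Specialty"]
after = ["Hospital Number", "Operation Number", "Surgeon Number"]
print(third_nf_violations(before, KEY, FDS))
# [(('Hospital Number', 'Surgeon Number'), 'Surgeon Specialty')]
print(third_nf_violations(after, KEY, FDS))
# []
```

Moving Surgeon Specialty to a Surgeon table keyed on Hospital Number + Surgeon Number removes the violation.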

Definitions and a Few Refinements (4)

• Foreign Keys
– When removing repeating groups to a new table, we carried the primary key of the original table with us, to cross-reference to the source.
– These cross-referencing columns are called foreign keys, and they are our principal means of linking data from different tables.
– A foreign key refers to a primary key elsewhere in the data model, and “elsewhere” may include the same table. For example, an Employee table might have a primary key of Employee Number and also record, for each employee, the Employee Number of that employee's manager; that column is a foreign key pointing back into the Employee table.
– A common convention for highlighting the foreign keys in a model is an asterisk, as used in the Drug Expenditure relations later in this lecture.

Definitions and a Few Refinements (5)

• Referential Integrity
– Imagine the Operation table, which uses Hospital Number to point to the relevant Hospital records. We expect every Hospital Number in the Operation table to have a matching Hospital Number in the Hospital table. This is referential integrity.

• Modern DBMSs provide referential integrity features.
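As one concrete example (SQLite through Python's standard `sqlite3` module; other DBMSs differ in syntax and defaults), declaring Hospital Number in the Operation table as a foreign key makes the DBMS reject an operation for a hospital that does not exist. The table and column names are our own snake_case rendering. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE hospital (
    hospital_number TEXT PRIMARY KEY,
    hospital_name   TEXT
);
CREATE TABLE operation (
    hospital_number  TEXT REFERENCES hospital (hospital_number),
    operation_number INTEGER,
    PRIMARY KEY (hospital_number, operation_number)
);
""")
conn.execute("INSERT INTO hospital VALUES ('H1', 'North')")
conn.execute("INSERT INTO operation VALUES ('H1', 48)")  # matching hospital exists

try:
    # There is no hospital H9, so the DBMS rejects this row.
    conn.execute("INSERT INTO operation VALUES ('H9', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```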

Anomalies that Normalisation is Really About

• Update anomalies:
– Insertion anomalies
– Change anomalies
– Deletion anomalies
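The change and deletion anomalies are easy to reproduce on an unnormalized table. In this hypothetical Python sketch, the hospital name is repeated on every operation row even though it depends on Hospital Number alone; the insertion anomaly is described in the final comment, since it is the absence of anywhere to put a row.

```python
# Hypothetical unnormalized table: hospital facts repeated on every
# operation row (hospital_name depends on hospital_number alone).
rows = [
    {"hospital_number": "H1", "hospital_name": "North", "operation_number": 48},
    {"hospital_number": "H1", "hospital_name": "North", "operation_number": 49},
    {"hospital_number": "H2", "hospital_name": "South", "operation_number": 12},
]


def hospital_names(table):
    """Map each hospital number to the set of names recorded for it."""
    names = {}
    for r in table:
        names.setdefault(r["hospital_number"], set()).add(r["hospital_name"])
    return names


# Change anomaly: renaming H1 in only one row leaves two conflicting names.
rows[0]["hospital_name"] = "North General"
print(sorted(hospital_names(rows)["H1"]))  # ['North', 'North General']

# Deletion anomaly: deleting H2's only operation also loses the fact
# that hospital H2 is called "South".
rows = [r for r in rows if r["operation_number"] != 12]
print("H2" in hospital_names(rows))  # False

# Insertion anomaly: a new hospital cannot be recorded until it has an
# operation, because every row here describes an operation.
```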

Denormalization and Unnormalization

• It is sometimes necessary to compromise one data modeling objective to achieve another.

• Occasionally, we implement database designs that are not fully normalized to achieve some other objective (most often performance).

• We normalize to achieve completeness, non-redundancy, flexibility in extending repeating groups, ease of data reuse, and programming simplicity. We sacrifice some of these when we denormalize.

• In many cases, these sacrifices will be prohibitively costly.

You don’t always need to normalize like this

• The past two lectures have shown you what makes a well-structured database design, expressed as tables.

• Don’t do it like this every time!

• There is the equivalent of a blueprint for data modelling, other than the table-like description we’ve seen.

• Let’s return to the Drug Expenditure design.

Drug Expenditure Database Model as Relations (Tables)

• OPERATION (Hospital Number*, Operation Number, Operation Code*, Surgeon Number*)

• SURGEON (Hospital Number*, Surgeon Number, Surgeon Specialty)

• OPERATION TYPE (Operation Code, Operation Name, Procedure Group)

• STANDARD DRUG DOSAGE (Drug Short Name*, Method of Administration, Size of Dose, Unit of Measure, Standard Dose Cost)

• DRUG (Drug Short Name, Drug Name, Manufacturer)

• HOSPITAL (Hospital Number, Hospital Name, Hospital Category, Contact Person)

• DRUG ADMINISTRATION (Hospital Number*, Operation Number*, Drug Short Name*, Method of Administration*, Size of Dose*, Unit of Measure*, Number of Doses)
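One way to see how the asterisks become links is to declare these relations in a DBMS. The sketch below is a hypothetical SQLite rendering: the snake_case names and column types are our assumptions, and the primary keys follow the determinants identified earlier. It then lists each foreign key as a child → parent pair using SQLite's `PRAGMA foreign_key_list`; each pair corresponds to one foreign-key link between two tables.

```python
import sqlite3

# Hypothetical DDL; REFERENCES without a column list means "the parent's
# primary key" in SQLite.
DDL = """
CREATE TABLE hospital (
    hospital_number TEXT PRIMARY KEY,
    hospital_name TEXT, hospital_category TEXT, contact_person TEXT);
CREATE TABLE operation_type (
    operation_code TEXT PRIMARY KEY,
    operation_name TEXT, procedure_group TEXT);
CREATE TABLE surgeon (
    hospital_number TEXT REFERENCES hospital,
    surgeon_number TEXT, surgeon_specialty TEXT,
    PRIMARY KEY (hospital_number, surgeon_number));
CREATE TABLE operation (
    hospital_number TEXT REFERENCES hospital,
    operation_number INTEGER,
    operation_code TEXT REFERENCES operation_type,
    surgeon_number TEXT,
    PRIMARY KEY (hospital_number, operation_number),
    FOREIGN KEY (hospital_number, surgeon_number) REFERENCES surgeon);
CREATE TABLE drug (
    drug_short_name TEXT PRIMARY KEY,
    drug_name TEXT, manufacturer TEXT);
CREATE TABLE standard_drug_dosage (
    drug_short_name TEXT REFERENCES drug,
    method_of_administration TEXT, size_of_dose REAL, unit_of_measure TEXT,
    standard_dose_cost REAL,
    PRIMARY KEY (drug_short_name, method_of_administration,
                 size_of_dose, unit_of_measure));
CREATE TABLE drug_administration (
    hospital_number TEXT, operation_number INTEGER,
    drug_short_name TEXT, method_of_administration TEXT,
    size_of_dose REAL, unit_of_measure TEXT,
    number_of_doses INTEGER,
    PRIMARY KEY (hospital_number, operation_number, drug_short_name,
                 method_of_administration, size_of_dose, unit_of_measure),
    FOREIGN KEY (hospital_number, operation_number) REFERENCES operation,
    FOREIGN KEY (drug_short_name, method_of_administration,
                 size_of_dose, unit_of_measure) REFERENCES standard_drug_dosage);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)


def links(conn):
    """Return sorted (child table, parent table) pairs, one per foreign key."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    pairs = set()
    for t in tables:
        for fk in conn.execute(f"PRAGMA foreign_key_list({t})"):
            pairs.add((t, fk[2]))  # fk[2] is the referenced (parent) table
    return sorted(pairs)


for child, parent in links(conn):
    print(f"{child} -> {parent}")
```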

Drug Expenditure Database Model as Entity-Relationship Model

[Figure: entity-relationship diagram of the Drug Expenditure model, with boxes for Hospital, Operation, Operation Type, Surgeon, Drug Administration, Drug and Standard Drug Dosage, linked by lines labelled with relationship names such as “perform” / “be performed at”, “classify” / “be classified by”, and “use” / “be used in”.]

What did we do?

• Each table is a box.

• Each link via a foreign key is shown using a line with some other markings (we’ll get to these).

• Each box has a name that describes what each row in the underlying table is about.

• What does each of these mean?

• This leads us to the higher-level model called the Entity-Relationship Model… it is the architect’s view of the database.

• We begin this next lecture…