ARCH-3: Database Design, a Practical Guide
Gus Björklund, Wizard, Progress Software Corporation
© 2007 Progress Software Corporation
Ask questions as we go if I am not being clear.
Warning: there is a mistake in these slides.
Rules are made to be broken
To every rule, there is an exception!
If you thought this talk was going to be about indexing …
It isn’t. Nor is it about performance.
Topics
Theory:
• What is Database Design
• Basic Elements
• Representing the Model as Tables
Practice:
• An Example
Some Other Topics
First, a little theory
What do we mean by database design?
A process for defining a model of a subset of the “real”¹ world, then representing it as data in tables in a relational database
At least, that’s the definition we will use for the purposes of this talk.
¹ Well, for small values of real, anyway.
Basic Elements
What do we put in our model?
Just 3 things:
• Entities
• Attributes
• Relationships
The “entity-relationship model” was described by Peter Chen in 1976.
See http://bit.csc.lsu.edu/~chen/chen.html
Basic Elements: Entities
Can be thought of as nouns:
• People
  – author, composer, performer, seller, buyer
• Places
  – home, IP address, URL, destination, factory, store
• Things
  – song, recording, instrument, car, invoice
Is “telephone number” a place or a thing?
Basic Elements: Attributes
Can be thought of as adjectives (but only loosely):
• Length
• Color
• Horsepower
• Part number
• Song Title
• Publication Date
• Size
• Fabric
• Owner
Is “telephone number” an attribute or an entity?
Entities have attributes.
Basic Elements: Relationships
Can be thought of as verbs:
• has a
• owns
• contains
• supervises
• performs
• called
• sold
• purchased
• proved
Entities are connected by relationships
Is “telephone number” a relationship?
Relationships have attributes too
In May 1995, Andrew Wiles published a proof of Fermat’s Last Theorem.
• Andrew Wiles: entity
• a proof of Fermat’s Last Theorem: entity
• published: relationship
• In May 1995: attribute (of the relationship)
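Here is a minimal sketch of that sentence as tables. Python's built-in sqlite3 module stands in for the talk's database, and the table and column names (person, work, published, published_date) are hypothetical. The date goes on the relationship table because it describes the act of publishing, not Wiles or the proof:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE work (
        work_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    -- The relationship "published" gets its own table, and its attribute
    -- (the date) is a column there, not in person or work.
    CREATE TABLE published (
        person_id      INTEGER NOT NULL REFERENCES person(person_id),
        work_id        INTEGER NOT NULL REFERENCES work(work_id),
        published_date TEXT    NOT NULL,  -- 'YYYY-MM'
        PRIMARY KEY (person_id, work_id)
    );
""")
conn.execute("INSERT INTO person VALUES (?, ?)", (1, "Andrew Wiles"))
conn.execute("INSERT INTO work VALUES (?, ?)", (1, "Proof of Fermat's Last Theorem"))
conn.execute("INSERT INTO published VALUES (?, ?, ?)", (1, 1, "1995-05"))

# Join the two entities through the relationship to recover the sentence.
row = conn.execute("""
    SELECT p.name, w.title, pub.published_date
    FROM published AS pub
    JOIN person AS p ON p.person_id = pub.person_id
    JOIN work   AS w ON w.work_id   = pub.work_id
""").fetchone()
print(row)  # ('Andrew Wiles', "Proof of Fermat's Last Theorem", '1995-05')
```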
What goes in an entity
Identifying attributes
• Must be able to uniquely identify the entity
• Can have more than one way to identify it
• Identifier can be composite
Descriptive attributes
• The values you need to keep track of
• Generally should be simple, not complex
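As an illustration, assume SQLite (through Python's sqlite3) and a hypothetical track table. One entity has two identifiers: a surrogate track_id and the composite (disc_upc, track_num). Either one alone is enough to pick out exactly one row, so a second row with the same disc and track number is rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE track (
        track_id  INTEGER PRIMARY KEY,   -- identifier 1: a surrogate id
        disc_upc  TEXT    NOT NULL,
        track_num INTEGER NOT NULL,
        title     TEXT    NOT NULL,      -- descriptive attributes follow
        duration  TEXT,
        UNIQUE (disc_upc, track_num)     -- identifier 2: composite
    )
""")
rows = [(1, "8697-07416-2", 1, "Danse Russe", "4:30"),
        (2, "8697-07416-2", 2, "Violin Concerto in E Minor, Op. 64", "6:27")]
conn.executemany("INSERT INTO track VALUES (?, ?, ?, ?, ?)", rows)

try:
    # Same disc and track number as an existing row: the composite key rejects it.
    conn.execute("INSERT INTO track VALUES (?, ?, ?, ?, ?)",
                 (3, "8697-07416-2", 1, "Duplicate", "0:01"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```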
What to include in your model
The things your application has to keep track of
• Telephones, wires, switches
The actions your application or its users perform
• Make calls, send telephone bills, collect payments
Some attributes of the things and actions
• Originating number, date and time of call, duration, called number
Keep it simple. Be accurate. Keep it up to date.
What to include in your model
Consider the goals of the system
Everything you include should be there for a reason you can state
• in no more than two sentences
Everything should have a clear name
• if you can’t name it, it doesn’t belong
Talk to the stakeholders!!!
What to leave out of your model
The real world has properties that don’t matter (to your application)
The real world has relationships that don’t matter
Things happen in the real world that don’t matter
Keep it simple
• If you can’t say why you need it, leave it out
Logical vs Physical Data Models
Logical entities often require multiple tables to represent them
• Tables can be thought of as logical or physical
• It depends on your point of view
There is also the physical storage database layout
• storage areas
• data extents
• disks
• etc.
We aren’t going to talk about the physical database layout
We will talk about tables
Mapping Your Model to a Database
Simply put:
• Entities become tables
  – Identifiers become indexes
• Attributes become columns
  – Data types: pick appropriate ones
• Relationships become tables or foreign keys
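The mapping rules above might look like this in practice. This is only a sketch. It uses SQLite through Python's sqlite3 rather than OpenEdge, and every table and column name is made up for illustration. Here "disc" means a recording, not a physical copy. The duration column is typed as integer seconds rather than a "4:30" string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Entities become tables; attributes become columns with suitable types.
    CREATE TABLE disc (
        disc_id INTEGER PRIMARY KEY,
        upc     TEXT NOT NULL,
        title   TEXT NOT NULL
    );
    -- Identifiers become indexes.
    CREATE UNIQUE INDEX disc_upc ON disc (upc);

    -- One-to-many (a disc contains tracks): a foreign key column.
    CREATE TABLE track (
        track_id         INTEGER PRIMARY KEY,
        disc_id          INTEGER NOT NULL REFERENCES disc (disc_id),
        track_num        INTEGER NOT NULL,
        duration_seconds INTEGER           -- 4:30 is stored as 270
    );

    CREATE TABLE performance (performance_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE performer   (performer_id   INTEGER PRIMARY KEY, name  TEXT NOT NULL);

    -- Many-to-many (performers of a performance): a table of its own.
    CREATE TABLE performance_performer (
        performance_id INTEGER NOT NULL REFERENCES performance (performance_id),
        performer_id   INTEGER NOT NULL REFERENCES performer (performer_id),
        PRIMARY KEY (performance_id, performer_id)
    );
""")

conn.execute("INSERT INTO disc VALUES (1, '8697-07416-2', 'The Essential Joshua Bell')")
conn.execute("INSERT INTO track VALUES (1, 1, 1, 270)")
try:
    conn.execute("INSERT INTO track VALUES (2, 99, 1, 387)")  # there is no disc 99
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(orphan_accepted)  # False
print(tables)  # ['disc', 'performance', 'performance_performer', 'performer', 'track']
```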
“In theory, there is no difference between theory and practice, but in practice there is.”
Jan van de Snepscheut
Now for some practice.
An example
Music store
• Buys compact disc recordings from distributors
• Has inventory
• Allows customers to search for what they want
  – Maybe in an in-store kiosk or on the web
• Sells compact discs to customers
What should we do first?
Activities
• We buy discs from a distributor
• Orders are sent to a distributor
• Orders are delivered to the store
• Orders may be cancelled
• We sell discs to customers in sales transactions
• Customers buy discs in sales transactions
• Customers search for what they want to buy
Which of these must be remembered by the system?
What do we need to keep track of
• Discs we have
• Discs we sold
• Discs we know about and can get
• Discs we have ordered
• Information needed to do our income tax
  – what we paid for stock
  – when we bought it
  – what we sold it for
  – when we sold it
Disc entities
UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date purchased: March 19, 2007
Date sold: June 9, 2007
Disc table might look like this
upc           manuf           cost  price  tax   datePurch   dateSold
8697-07416-2  Sony BMG        2.00  17.95  0.90  2007-03-19  2007-06-09
8697-07416-2  Sony BMG        2.00  ?      ?     2007-06-09  ?
314-510347-2  Island Records  2.21  15.95  0.80  2006-01-12  2007-02-14
314-510347-2  Island Records  2.21  ?      ?     2006-01-12  ?
What’s wrong?
• Is upc a unique identifier?
• Might have bought from a distributor
• Have no information about what is on the disc
  – How do customers search?
• Don’t know when disc was made
• Could be more than one tax jurisdiction
  – provincial tax, city tax
• Don’t know if disc is on order
• Don’t know who bought it
• Duplicated data
• Etc., etc.
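To make the first question concrete, here is a hypothetical sketch using SQLite via Python's sqlite3, with column names taken from the table above, in which upc is declared as the key. The second copy of the same recording cannot be stored, which suggests that the UPC identifies a recording rather than a physical disc. The recording/disc_copy split at the end is one possible fix, not the talk's final design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First-draft design, with upc (hypothetically) declared as the key.
conn.execute("""
    CREATE TABLE disc (
        upc       TEXT PRIMARY KEY,
        manuf     TEXT,
        cost      NUMERIC,
        datePurch TEXT
    )
""")
conn.execute("INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-03-19')")
try:
    # A second copy of the same recording carries the same UPC.
    conn.execute("INSERT INTO disc VALUES ('8697-07416-2', 'Sony BMG', 2.00, '2007-06-09')")
    second_copy_stored = True
except sqlite3.IntegrityError:
    second_copy_stored = False
print(second_copy_stored)  # False

# One way out: the UPC identifies the recording; each physical copy gets its own id.
conn.executescript("""
    CREATE TABLE recording (upc TEXT PRIMARY KEY, manuf TEXT);
    CREATE TABLE disc_copy (
        copy_id   INTEGER PRIMARY KEY,
        upc       TEXT NOT NULL REFERENCES recording (upc),
        cost      NUMERIC,
        datePurch TEXT
    );
""")
conn.execute("INSERT INTO recording VALUES ('8697-07416-2', 'Sony BMG')")
conn.executemany("INSERT INTO disc_copy (upc, cost, datePurch) VALUES (?, ?, ?)",
                 [("8697-07416-2", 2.00, "2007-03-19"),
                  ("8697-07416-2", 2.00, "2007-06-09")])
copies = conn.execute(
    "SELECT COUNT(*) FROM disc_copy WHERE upc = '8697-07416-2'").fetchone()[0]
print(copies)  # 2
```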
Disc entities take 2
UPC Code: 8697-07416-2
Manufacturer: Sony BMG
Distributor: Bob’s Wholesale CD’s
Cost to us: $2.00
Price charged: $17.95
Tax charged: $0.80
Date ordered: March 19, 2007
Date received: March 20, 2007
Date sold: June 9, 2007
Disc Title: “The Essential Joshua Bell”
Artist: Joshua Bell
Track 1: “Danse Russe”
Track 2: “Violin Concerto in E Minor”
Track 3: “Nocturne in C-sharp Minor”
etc.
Example: Now What’s wrong?
• This is getting messy
• Activities combined with disc’s attributes
• Have duplicated information
• How many tracks can there be?
• What if there is more than one artist?
• Don’t have all the information a customer might want to use to search
Discs revisited
• Discs have titles
• Discs have pictures on the cover
• Discs contain tracks
• Discs are made by manufacturers
• Discs are purchased from distributors
• Discs are ordered from distributors
• Discs are delivered to the store
• Discs are sold to customers
“Discs contain tracks …”
• Tracks contain songs
• Tracks occur in order
• Tracks have a duration
• Songs are performed in performances
• Songs have performers (usually)
• Songs have composers
• Songs have names (titles)
• Songs have a key (but not always)
• Performances are done by performers
• Performers can be groups (bands, orchestras, etc.)
• Performances are performed in a location or venue
We seem to need these entities
Discs, Manufacturers, Distributors, Orders, Customers, Inventory
Tracks, Songs, Performers, Groups?
Songs have names (titles).
Are names properties of songs?
Or are they entities related to songs?
Or are they something else?
Song data (track 1)
Title “Danse Russe” from Swan Lake, Op. 20
Time 4:30
Composer Peter Tchaikovsky
Category Classical, violin, orchestra
Performers Joshua Bell, Michael Tilson Thomas, Berlin Philharmonic Orchestra
Track number 1
Disc upc 8697-07416-2
Song data (track 2)
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Track number 2
Disc upc 8697-07416-2
Performance data
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance data take 2
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Performers Joshua Bell, Sir Roger Norrington, Camerata Salzburg
Performance Date ?
Performance Location ?
Performer data
id name
1 Joshua Bell
2 Sir Roger Norrington
3 Camerata Salzburg
4 Michael Tilson Thomas
5 Berlin Philharmonic
6 Bono
7 The Edge
8 Adam Clayton
9 Larry Mullen
Performance to Performer Relationship
performance id performer id
1 1
1 2
1 3
1 …
2 1
2 4
2 5
2 …
325 6
325 7
325 8
325 9
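Walking a many-to-many relationship in either direction takes one join through the relationship table. The sketch below uses SQLite via Python's sqlite3 and loads the rows shown above, leaving out the elided “…” rows. The table and column names are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performer (
        performer_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL
    );
    -- Junction table: one row per (performance, performer) pair.
    -- (The performance table itself is omitted for brevity.)
    CREATE TABLE performance_performer (
        performance_id INTEGER NOT NULL,
        performer_id   INTEGER NOT NULL REFERENCES performer (performer_id),
        PRIMARY KEY (performance_id, performer_id)
    );
""")
performers = [(1, "Joshua Bell"), (2, "Sir Roger Norrington"),
              (3, "Camerata Salzburg"), (4, "Michael Tilson Thomas"),
              (5, "Berlin Philharmonic"), (6, "Bono"), (7, "The Edge"),
              (8, "Adam Clayton"), (9, "Larry Mullen")]
links = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (2, 5),
         (325, 6), (325, 7), (325, 8), (325, 9)]
conn.executemany("INSERT INTO performer VALUES (?, ?)", performers)
conn.executemany("INSERT INTO performance_performer VALUES (?, ?)", links)

# One direction: who played in performance 325?
band = [name for (name,) in conn.execute("""
    SELECT p.name FROM performance_performer AS pp
    JOIN performer AS p ON p.performer_id = pp.performer_id
    WHERE pp.performance_id = ? ORDER BY p.performer_id
""", (325,))]

# The other direction: which performances include Joshua Bell?
bell = [pid for (pid,) in conn.execute("""
    SELECT pp.performance_id FROM performance_performer AS pp
    JOIN performer AS p ON p.performer_id = pp.performer_id
    WHERE p.name = 'Joshua Bell' ORDER BY pp.performance_id
""")]
print(band)  # ['Bono', 'The Edge', 'Adam Clayton', 'Larry Mullen']
print(bell)  # [1, 2]
```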
Performance data take 3
Performance id 2
Title Violin Concerto in E Minor, Op. 64
Time 6:27
Composer Felix Mendelssohn
Category Classical, violin, orchestra
Track to Performance Relationship
Disc upc Track Num Performance id
8697-07416-2 1 1
8697-07416-2 2 2
… … …
314-510347-2 1 325
Relationships (so far):
[Diagram: disc, track, performance and performer, linked by relationships; legend: one to one, one to many, many to many]
What happened to Songs?
Relationships (take 2):
[Diagram: disc, track, song, performance and performer, linked by relationships; legend: one to one, one to many, many to many]
Relationships (take 3):
[Diagram: disc, track, performance, performer and song, linked by relationships]
What about “business entities”?
Where are they?
Business entities
[Diagram: the disc, track, performance, performer and song entities]
Should you use arrays?
Indexes
• Enforce uniqueness
• Make searches faster
• Enable fast retrieval of entities by their identities
• Enable finding entities with certain attributes
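This small sketch illustrates the first and last points, assuming SQLite through Python's sqlite3 and a hypothetical disc table. A unique index rejects a duplicate UPC. SQLite's EXPLAIN QUERY PLAN then reports that a title search would use the title index rather than scan the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE disc (
        disc_id INTEGER PRIMARY KEY,
        upc     TEXT NOT NULL,
        title   TEXT NOT NULL
    );
    -- Enforces uniqueness and gives fast retrieval by identity.
    CREATE UNIQUE INDEX disc_upc ON disc (upc);
    -- Not unique: helps find discs that have a given attribute value.
    CREATE INDEX disc_title ON disc (title);
""")
conn.execute("INSERT INTO disc (upc, title) VALUES (?, ?)",
             ("8697-07416-2", "The Essential Joshua Bell"))
try:
    conn.execute("INSERT INTO disc (upc, title) VALUES (?, ?)",
                 ("8697-07416-2", "Same UPC, different title"))
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

# EXPLAIN QUERY PLAN describes how SQLite would run the query;
# the last column of each row is a human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN SELECT * FROM disc WHERE title = ?",
                    ("The Essential Joshua Bell",)).fetchall()
uses_title_index = any("disc_title" in row[-1] for row in plan)
print(duplicate_accepted, uses_title_index)  # False True
```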
What indexes do we needfor the music store database?
Tables
0) Discs
1) Tracks
2) Songs
3) Performers
4) Performances
5) Tracks of discs
6) Performances of songs
7) Performers of performances
What indexes do we need
0) Indexes for identifying attributes
1) A unique row identifier
2) Indexes for the queries you will do
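For the "performers of performances" table, the three kinds of index might look like the sketch below. It uses SQLite via Python's sqlite3; the names are hypothetical, and the UUID column assumes the application generates one value per row. The extra index exists only because we expect to ask what a given performer has played:

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performance_performer (
        row_uuid       TEXT    NOT NULL UNIQUE,     -- 1) unique row identifier
        performance_id INTEGER NOT NULL,
        performer_id   INTEGER NOT NULL,
        PRIMARY KEY (performance_id, performer_id)  -- 0) identifying attributes
    );
    -- 2) For a query we expect to run often: "what has this performer played?"
    --    The primary key index is ordered by performance_id first, so it cannot
    --    answer that efficiently on its own.
    CREATE INDEX pp_by_performer ON performance_performer (performer_id, performance_id);
""")
conn.executemany("INSERT INTO performance_performer VALUES (?, ?, ?)",
                 [(str(uuid.uuid4()), perf, who) for perf, who in [(1, 1), (2, 1), (2, 4)]])

# PRAGMA index_list reports each index and what created it:
# 'pk' = PRIMARY KEY, 'u' = UNIQUE constraint, 'c' = CREATE INDEX.
origins = sorted(row[3] for row in conn.execute("PRAGMA index_list(performance_performer)"))
print(origins)  # ['c', 'pk', 'u']

played = [p for (p,) in conn.execute(
    "SELECT performance_id FROM performance_performer "
    "WHERE performer_id = ? ORDER BY performance_id", (1,))]
print(played)  # [1, 2]
```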
What should we do next?
Other Topics
• Normalization
• Unique keys
• Word indexes
• Naming
• Customization
Normalization
Oversimplified, it means:
• Don’t duplicate data
Attributes should be simple:
• have only one value
• be necessary
• not be derived data
• not repeat
Complicated attributes are often entities in their own right
• For example, addresses might be entities
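Here the "only one value" rule is applied to the Category attribute from the song data. The sketch (SQLite via Python's sqlite3, hypothetical names) moves the categories into a table of their own, one row per song and category. Adding a category then means inserting a row, and a search for "violin" is a plain equality test:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (
        song_id  INTEGER PRIMARY KEY,
        title    TEXT NOT NULL,
        composer TEXT NOT NULL
    );
    -- Instead of a multi-valued "Category" column: one row per (song, category).
    CREATE TABLE song_category (
        song_id  INTEGER NOT NULL REFERENCES song (song_id),
        category TEXT    NOT NULL,
        PRIMARY KEY (song_id, category)
    );
""")
conn.executemany("INSERT INTO song VALUES (?, ?, ?)", [
    (1, "Danse Russe from Swan Lake, Op. 20", "Peter Tchaikovsky"),
    (2, "Violin Concerto in E Minor, Op. 64", "Felix Mendelssohn"),
])
conn.executemany("INSERT INTO song_category VALUES (?, ?)",
                 [(1, "classical"), (1, "violin"), (1, "orchestra"),
                  (2, "classical"), (2, "violin"), (2, "orchestra")])

violin_songs = [title for (title,) in conn.execute("""
    SELECT s.title FROM song AS s
    JOIN song_category AS c ON c.song_id = s.song_id
    WHERE c.category = 'violin' ORDER BY s.song_id
""")]
print(violin_songs)
# ['Danse Russe from Swan Lake, Op. 20', 'Violin Concerto in E Minor, Op. 64']
```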
Unique keys
EVERY table must have a unique key
EVERY row needs a unique identifier
• that never changes, even if the row is moved to another database (e.g. if you replicate)
Often, users don’t need to see it
Use a UUID or sequence or maybe datetime
The unique key is the ONLY way to identify rows unambiguously
• ROWIDs are temporary and can change
Use the same method throughout
• You’ll be glad you did
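This sketch shows why a generated identifier is safer than an engine-assigned row id when rows move between databases. SQLite's integer rowid stands in here for OpenEdge's ROWID; this is an analogy, not the same mechanism. The customer table and its columns are hypothetical:

```python
import sqlite3
import uuid

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE customer (
            id   INTEGER PRIMARY KEY,   -- local: assigned by this database
            uuid TEXT NOT NULL UNIQUE,  -- global: assigned once by the application
            name TEXT NOT NULL
        )
    """)
    return conn

source, replica = make_db(), make_db()
# The replica already holds a row, so its next local id will differ.
replica.execute("INSERT INTO customer (uuid, name) VALUES (?, ?)", (str(uuid.uuid4()), "Eve"))

bob_uuid = str(uuid.uuid4())
source.execute("INSERT INTO customer (uuid, name) VALUES (?, ?)", (bob_uuid, "Bob"))

# "Replicate" Bob: copy uuid and name, and let the replica assign its own id.
row = source.execute("SELECT uuid, name FROM customer WHERE uuid = ?", (bob_uuid,)).fetchone()
replica.execute("INSERT INTO customer (uuid, name) VALUES (?, ?)", row)

src_id = source.execute("SELECT id FROM customer WHERE uuid = ?", (bob_uuid,)).fetchone()[0]
rep_id = replica.execute("SELECT id FROM customer WHERE uuid = ?", (bob_uuid,)).fetchone()[0]
print(src_id, rep_id)  # 1 2  (the local id changed; the UUID still finds the row)
```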
Word indexes
Can be used to hold multiple status or attribute values
• Conflicts with normalization
• Flexible
• Easy to add new ones
• Queries are fast
Example:
• Category: classical, violin, orchestral, concerto
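Word indexes are built into OpenEdge. The plain-Python sketch below is not that implementation; it only illustrates the idea. It builds an inverted index from each word to the rows whose field contains it, so a new category value needs no schema change. The class, its methods, and the "rock" row are hypothetical sample code and data:

```python
import re
from collections import defaultdict

def tokenize(text: str) -> set[str]:
    """Split a field into lower-case words (a simplification of real word-break rules)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

class WordIndex:
    """Hypothetical inverted index: word -> set of row ids whose field contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, row_id: int, text: str) -> None:
        for word in tokenize(text):
            self._postings[word].add(row_id)

    def search(self, *words: str) -> set[int]:
        """Return the rows whose field contains ALL the given words."""
        sets = [self._postings.get(w.lower(), set()) for w in words]
        return set.intersection(*sets) if sets else set()

idx = WordIndex()
idx.add(1, "classical, violin, orchestral, concerto")
idx.add(2, "classical, violin, orchestra")
idx.add(3, "rock")
print(sorted(idx.search("violin")))                  # [1, 2]
print(sorted(idx.search("classical", "concerto")))  # [1]
```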
Naming
• What is in the column “GL01262”?
Good names are crucial to understanding
Naming
Table and column names should have clear meanings everyone can understand
• “GL01262” vs “dateEntered”
Names with dashes are inconvenient in SQL, where they must be quoted
• “order-date”
Booleans should be named for what “true” means
• “backOrdered”
No double negations
• “notOutOfStock”
Good names are crucial to understanding
Making tables customizable
We will look at 3 ways:
• Spare columns
• Separate table with spare columns
• Separate table with name/value pairs
Spare columns in table
custnum  name   city     extra1  extra2  extra3
001      Bob    Phoenix  frozen  ?       0.0
002      Alice  Boston   ?       125.46  0.12
003      Eve    Denver   ?       ?       ?
• What data types should you use?
• How many spare columns?
• Wasted columns when not used
• How do you know what each spare got used for?
• How do you know how many unused spares you have?
Separate table for spare columns
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  extra1  extra2  extra3
001      frozen  ?       0.0
002      ?       125.46  0.12
Separate table for spare columns
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  status  owed    discount
001      frozen  ?       0.0
002      ?       125.46  0.12
Separate table with name/value pairs
custnum  name   city
001      Bob    Phoenix
002      Alice  Boston
003      Eve    Denver

custnum  name      value
001      status    frozen
002      owed      125.46
002      discount  0.12
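Here is a sketch of the name/value approach, assuming SQLite via Python's sqlite3 and the column names from the slide. The helper function extras is hypothetical. Every value lands in one text column, so the types are lost and the application has to know how to interpret each name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        custnum TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT
    );
    -- One row per (customer, extra attribute). All values are stored as text.
    CREATE TABLE customer_extra (
        custnum TEXT NOT NULL REFERENCES customer (custnum),
        name    TEXT NOT NULL,
        value   TEXT,
        PRIMARY KEY (custnum, name)
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [("001", "Bob", "Phoenix"), ("002", "Alice", "Boston"), ("003", "Eve", "Denver")])
conn.executemany("INSERT INTO customer_extra VALUES (?, ?, ?)",
                 [("001", "status", "frozen"), ("002", "owed", "125.46"), ("002", "discount", "0.12")])

def extras(custnum: str) -> dict[str, str]:
    """Collect one customer's name/value pairs into a dict (empty if there are none)."""
    rows = conn.execute(
        "SELECT name, value FROM customer_extra WHERE custnum = ? ORDER BY name", (custnum,))
    return dict(rows)

print(extras("002"))  # {'discount': '0.12', 'owed': '125.46'}
print(extras("003"))  # {}
```

Adding a new kind of extra attribute is just another row, with no schema change. The price is that every reader must convert values such as '125.46' back to numbers itself.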
Modeling Tools
• PCase
• Enterprise Architect
• PowerDesigner
• ConceptDraw
• Erwin
• Rational
Pencil and paper!
Blackboard!
Summary
• Understand the requirements
• Leave out what is not needed
• Review the design with stakeholders
• Evolve the design as changes come up
• Test to make sure it works
  – Can it do everything that is needed?
  – Does it perform adequately?
• Expect changes to come
Homework
Papers:
• Wiles, A.: “Modular elliptic curves and Fermat’s Last Theorem”, Annals of Mathematics 141 (3): 443-551, 1995
• Chen, P.: “The Entity-Relationship Model -- Toward a Unified View of Data”, ACM TODS 1 (1), 1976
Wikipedia articles to start from:
• entity-relationship model
• data model
Books:
• Teorey, Lightstone, Nadeau: “Database Modeling and Design”, Morgan Kaufmann