How to Fake a Database Design

How to Fake a Database Design How do I spell “normalization”? OSCON 2014 Curtis "Ovid" Poe http://allaroundtheworld.fr/ Copyright 2014, http://www.allaroundtheworld.fr/ 5/18/22

description

In evaluating developers, I routinely come across very talented developers with a decade or more of experience with databases who nonetheless can't design even the simplest of schemas. This presentation is based on my popular blog post of the same name: http://blogs.perl.org/users/ovid/2013/07/how-to-fake-database-design.html

Transcript of How to Fake a Database Design

Page 1: How to Fake a Database Design

April 7, 2023

How to Fake a Database Design

How do I spell “normalization”?

OSCON 2014

Curtis "Ovid" Poe

http://allaroundtheworld.fr/

Copyright 2014, http://www.allaroundtheworld.fr/

Page 2: How to Fake a Database Design

April 7, 2023 Copyright 2014, http://www.allaroundtheworld.fr/

Good Database Schemas

• Generally normalized
• Denormalized only as necessary
• No duplicate data

Page 3: How to Fake a Database Design

Typical Developer Schemas

• A steaming pile of ones and zeros
• … with a “family friendly” background

Source: http://commons.wikimedia.org/wiki/File:Spaghetti-prepared.jpg

Page 4: How to Fake a Database Design

Database Normalization

• Remove redundancy
• Create logical relations
• Decompose data into atomic elements

Page 5: How to Fake a Database Design

Only Covering 3NF

1. Remove repeating groups of data
2. Remove partial key dependencies
3. Remove data unrelated to key

Page 6: How to Fake a Database Design

How to Feel Stupid

“It is shown that if a relation schema is in third normal form and every key is simple, then it is in projection-join normal form (sometimes called fifth normal form), the ultimate normal form with respect to projections and joins.”

Simple Conditions for Guaranteeing Higher Normal Forms in Relational Databases, by C. J. Date and Ronald Fagin

http://commons.wikimedia.org/wiki/File:%22I_should_have_gone_to_the_pro_station%22_-_NARA_-_514564.tif

Page 7: How to Fake a Database Design

‘Nuff of that – Let’s Get Started

I’m going to discuss “how”, not “why”, because I only have 50 minutes.

Page 8: How to Fake a Database Design

Faking a Database Design

• Forget everything you know about Excel
• Focus on nouns (sort of)
• Duplicate data is a design flaw

Page 9: How to Fake a Database Design

Real-World Problem

• Client wanted a rewrite of recipes site
• They sent us their Access (!) database
• Main objects:
  – customers
  – recipes
  – orders

Page 10: How to Fake a Database Design

Our “DBA” Said This Was OK

Page 11: How to Fake a Database Design

Our “DBA” also lost his job shortly thereafter

Page 12: How to Fake a Database Design

Back to the plot …

• Customers
• Orders
• Recipes

Page 13: How to Fake a Database Design

Nouns == Tables(*)

Page 14: How to Fake a Database Design

Nouns == Tables(*)

Page 15: How to Fake a Database Design

Rule #1

1. Nouns == tables

Page 16: How to Fake a Database Design

What’s with the customer_id?

Page 17: How to Fake a Database Design

It’s a foreign key

One-to-many relationship

Page 18: How to Fake a Database Design

Our DDL (Data Definition Language)

CREATE TABLE orders (
    order_id    SERIAL PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    order_date  TIMESTAMP WITH TIME ZONE NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);
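A minimal sketch of what that foreign key buys you, assuming SQLite (through Python's sqlite3 module) as a stand-in for PostgreSQL so it runs anywhere. The `name` column and the sample rows are made up for illustration, and the referenced table is called `customers` here to match the list of main objects.

```python
import sqlite3

# SQLite stands in for PostgreSQL; INTEGER PRIMARY KEY plays the role of SERIAL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL            -- hypothetical column
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date  TEXT NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    );
""")

conn.execute("INSERT INTO customers (customer_id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (customer_id, order_date) VALUES (1, '2014-07-22')")

# An order for a customer who does not exist is rejected by the constraint.
try:
    conn.execute("INSERT INTO orders (customer_id, order_date) VALUES (999, '2014-07-22')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
print(orphan_rejected, order_count)
```

The second insert fails with an integrity error, so an order can never point at a customer who isn't there.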

Page 19: How to Fake a Database Design

Rule #2

1. Nouns == tables
2. Another table’s ID must have a FK constraint

Page 20: How to Fake a Database Design

Oh dog, no!

Page 21: How to Fake a Database Design

But “What if”?

1. fettuccinne
2. fettuchini
3. fettucini
4. fettucinne
5. fetuchine
6. fetuchinney
7. fetuchinni
8. fetucine
9. fetucini
10. fetucinni

https://www.flickr.com/photos/ykjc9/3485366680/sizes/l

Page 22: How to Fake a Database Design

Searching

SELECT recipe_id, name
  FROM recipes
 WHERE ingredient1 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient2 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient3 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient4 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient5 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient6 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient7 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni')
    OR ingredient8 IN ('fettuccinne', 'fettuchini', 'fettucini', 'fettucinne', 'fetuchine',
                       'fetuchinney', 'fetuchinni', 'fetucine', 'fetucini', 'fetucinni');

Page 23: How to Fake a Database Design

It’s “fettuccine”, in case you were wondering

Page 24: How to Fake a Database Design

Searching

SELECT recipe_id, name
  FROM recipes
 WHERE ingredient1 = 'fettuccine'
    OR ingredient2 = 'fettuccine'
    OR ingredient3 = 'fettuccine'
    OR ingredient4 = 'fettuccine'
    OR ingredient5 = 'fettuccine'
    OR ingredient6 = 'fettuccine'
    OR ingredient7 = 'fettuccine'
    OR ingredient8 = 'fettuccine';

Page 25: How to Fake a Database Design

Ingredients Table

Page 26: How to Fake a Database Design

Rule #3

1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
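Rule 3 can be sketched as follows: pull the values out of the numbered ingredient columns and store each one once in its own table. This assumes SQLite via Python's sqlite3 module, and the legacy rows are invented for illustration (three ingredient columns instead of eight, to keep it short).

```python
import sqlite3

# Hypothetical legacy rows: (recipe_id, name, ingredient1..ingredient3).
legacy_rows = [
    (1, "Fettuccine Alfredo", "fettuccine", "butter", "parmesan"),
    (2, "Buttered Toast", "bread", "butter", None),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ingredients (
        ingredient_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE   -- each ingredient is stored once
    );
""")

for recipe_id, name, *ingredient_columns in legacy_rows:
    conn.execute(
        "INSERT INTO recipes (recipe_id, name) VALUES (?, ?)", (recipe_id, name)
    )
    for ingredient in ingredient_columns:
        if ingredient is not None:
            # INSERT OR IGNORE is SQLite syntax: skip names already present.
            conn.execute(
                "INSERT OR IGNORE INTO ingredients (name) VALUES (?)", (ingredient,)
            )

ingredient_names = [
    row[0] for row in conn.execute("SELECT name FROM ingredients ORDER BY name")
]
print(ingredient_names)
```

With one row per ingredient, there is exactly one place where “fettuccine” can be misspelled, and one place to fix it.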

Page 27: How to Fake a Database Design

Lookup Table

Many-to-many relationship

Page 28: How to Fake a Database Design

Searching

SELECT r.recipe_id, r.name
  FROM recipes r
  JOIN recipes_ingredients ri ON ri.recipe_id = r.recipe_id
  JOIN ingredients i ON i.ingredient_id = ri.ingredient_id
 WHERE i.name = 'fettuccine';
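A runnable sketch of that search, assuming SQLite via Python's sqlite3 module, the `recipes_ingredients` table name from the DDL slides, and made-up sample data. The `recipes_with` helper is hypothetical. Columns are qualified with their table alias because `recipe_id` and `name` each exist in more than one of the joined tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ingredients (
        ingredient_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE
    );
    CREATE TABLE recipes_ingredients (
        recipe_id     INTEGER NOT NULL,
        ingredient_id INTEGER NOT NULL,
        PRIMARY KEY (recipe_id, ingredient_id),
        FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
        FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
    );
    -- Invented sample data.
    INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo'), (2, 'Buttered Toast');
    INSERT INTO ingredients VALUES (1, 'fettuccine'), (2, 'butter'), (3, 'bread');
    INSERT INTO recipes_ingredients VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def recipes_with(ingredient_name):
    """Return (recipe_id, name) for every recipe using the named ingredient."""
    rows = conn.execute(
        """
        SELECT r.recipe_id, r.name
          FROM recipes r
          JOIN recipes_ingredients ri ON ri.recipe_id = r.recipe_id
          JOIN ingredients i ON i.ingredient_id = ri.ingredient_id
         WHERE i.name = ?
         ORDER BY r.recipe_id
        """,
        (ingredient_name,),
    )
    return rows.fetchall()


print(recipes_with("fettuccine"))
print(recipes_with("butter"))
```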

Page 29: How to Fake a Database Design

Our DDL (Data Definition Language)

CREATE TABLE recipes_ingredients (
    recipe_ingredient_id SERIAL PRIMARY KEY,
    recipe_id            INTEGER NOT NULL,
    ingredient_id        INTEGER NOT NULL,
    UNIQUE (recipe_id, ingredient_id),
    FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
);

Page 30: How to Fake a Database Design

Our DDL (Data Definition Language)

CREATE TABLE recipes_ingredients (
    recipe_id     INTEGER NOT NULL,
    ingredient_id INTEGER NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id),
    FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
    FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
);
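A small sketch of why the lookup table wants a key over both columns, again assuming SQLite via Python's sqlite3 module and invented sample rows. The ingredient foreign key points at `ingredients` here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE ingredients (
        ingredient_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL UNIQUE
    );
    CREATE TABLE recipes_ingredients (
        recipe_id     INTEGER NOT NULL,
        ingredient_id INTEGER NOT NULL,
        PRIMARY KEY (recipe_id, ingredient_id),
        FOREIGN KEY (recipe_id)     REFERENCES recipes(recipe_id),
        FOREIGN KEY (ingredient_id) REFERENCES ingredients(ingredient_id)
    );
    -- Invented sample data.
    INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo');
    INSERT INTO ingredients VALUES (1, 'fettuccine');
    INSERT INTO recipes_ingredients VALUES (1, 1);
""")

# Linking the same recipe to the same ingredient twice violates the key.
try:
    conn.execute("INSERT INTO recipes_ingredients VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute("SELECT count(*) FROM recipes_ingredients").fetchone()[0]
print(duplicate_rejected, link_count)
```

Either variant of the DDL gives this protection: the composite primary key does it directly, and the surrogate-key version does it through its UNIQUE constraint.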

Page 31: How to Fake a Database Design

Rule #4

1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)

Page 32: How to Fake a Database Design

So How Do We Order Recipes?

Page 33: How to Fake a Database Design

Orders With Recipes

Page 34: How to Fake a Database Design

How Many of Which Ingredient?

Page 35: How to Fake a Database Design

Our simple “customers”, “orders”, and “recipes” database has grown to seven tables.

And it will keep growing.

Page 36: How to Fake a Database Design

So Far

• Every noun has its own table (*)
• Lookup tables join related tables
• And generally have some kind of unique constraint
• Other tables’ IDs have foreign key constraints

Page 37: How to Fake a Database Design

Database Tips

• We’ve covered the main rules
• They only cover structure
• Now to dive deeper

Page 38: How to Fake a Database Design

Equality ≠ Identity

• No duplication == not duplicating identity
• Are identical twins the same person?
• Are two guys named “John” the same guy?
• This is important and easy to get wrong
• For example …

Page 39: How to Fake a Database Design

How do you get the total of an order?

• Assume each recipe has a price
• Store total in the order? (hint: no)
• Store price on the recipe? (hint: yes)
• Is that enough?

Page 40: How to Fake a Database Design

Orders Total

Page 41: How to Fake a Database Design

Calculating the Order Total?

SELECT o.order_id, sum(r.price)
  FROM orders o
  JOIN orders_recipes orr ON orr.order_id = o.order_id
  JOIN recipes r ON r.recipe_id = orr.recipe_id
 GROUP BY o.order_id;

Page 42: How to Fake a Database Design

What if the price changes?

Page 43: How to Fake a Database Design

Orders Total

Page 44: How to Fake a Database Design

Calculating the Order Total

SELECT o.order_id, sum(orr.price)
  FROM orders o
  JOIN orders_recipes orr ON orr.order_id = o.order_id
 GROUP BY o.order_id;
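A sketch of the difference between the two totals, assuming SQLite via Python's sqlite3 module, prices stored as integer cents, and invented sample data. The two helper functions are hypothetical names for the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE recipes (
        recipe_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        price     INTEGER NOT NULL           -- current price, in cents (assumption)
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY
    );
    CREATE TABLE orders_recipes (
        order_id  INTEGER NOT NULL REFERENCES orders(order_id),
        recipe_id INTEGER NOT NULL REFERENCES recipes(recipe_id),
        price     INTEGER NOT NULL,          -- price charged when the order was placed
        PRIMARY KEY (order_id, recipe_id)
    );
    -- Invented sample data.
    INSERT INTO recipes VALUES (1, 'Fettuccine Alfredo', 500), (2, 'Buttered Toast', 200);
    INSERT INTO orders VALUES (1);
    -- Copy the price onto the order item at order time.
    INSERT INTO orders_recipes (order_id, recipe_id, price)
        SELECT 1, recipe_id, price FROM recipes;
""")


def total_from_order_items(order_id):
    """Sum the prices recorded on the order items."""
    return conn.execute(
        "SELECT sum(orr.price) FROM orders_recipes orr WHERE orr.order_id = ?",
        (order_id,),
    ).fetchone()[0]


def total_from_current_prices(order_id):
    """Sum whatever the recipes cost right now (the subtle bug)."""
    return conn.execute(
        """
        SELECT sum(r.price)
          FROM orders_recipes orr
          JOIN recipes r ON r.recipe_id = orr.recipe_id
         WHERE orr.order_id = ?
        """,
        (order_id,),
    ).fetchone()[0]


before = (total_from_order_items(1), total_from_current_prices(1))
conn.execute("UPDATE recipes SET price = 900 WHERE recipe_id = 1")  # price goes up
after = (total_from_order_items(1), total_from_current_prices(1))
print(before, after)
```

After the price change, the total based on the order items is unchanged while the total based on the recipes table has silently rewritten history.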

Page 45: How to Fake a Database Design

Equality is not Identity

• Order item price isn’t item price
• What if the item price changes?
• What if you give a discount on the order item?
• A subtle, but common bug

Page 46: How to Fake a Database Design

Rule #5

1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical

Page 47: How to Fake a Database Design

Naming

• Names are important
• Identical columns should have identical names
• Names should hint at use

Page 48: How to Fake a Database Design

Bad Naming

SELECT name, 'too cold' FROM areas WHERE temperature < 32;

Page 49: How to Fake a Database Design

ID Names

orders.order_id

versus

orders.id

Page 50: How to Fake a Database Design

ID Names

SELECT o.id, sum(r.price)
  FROM orders o
  JOIN orders_recipes orr ON orr.order_id = o.id
  JOIN recipes r ON r.id = o.id
 GROUP BY o.id

Page 51: How to Fake a Database Design

ID Names

SELECT o.id, sum(r.price)
  FROM orders o
  JOIN orders_recipes orr ON orr.order_id = o.id
  JOIN recipes r ON r.id = o.id   -- a recipe ID compared with an order ID
 GROUP BY o.id
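The join condition `r.id = o.id` in the query above compares a recipe ID with an order ID, and the database will happily run it. A sketch of the consequence, assuming SQLite via Python's sqlite3 module, integer prices, and invented sample data; the corrected join condition is my reading of what the query intends.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY
    );
    CREATE TABLE recipes (
        id    INTEGER PRIMARY KEY,
        price INTEGER NOT NULL
    );
    CREATE TABLE orders_recipes (
        order_id  INTEGER NOT NULL REFERENCES orders(id),
        recipe_id INTEGER NOT NULL REFERENCES recipes(id),
        PRIMARY KEY (order_id, recipe_id)
    );
    -- Invented sample data: order 1 contains recipes 2 and 3.
    INSERT INTO orders VALUES (1);
    INSERT INTO recipes VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO orders_recipes VALUES (1, 2), (1, 3);
""")

# Joins recipes on the ORDER's id: no error, wrong answer.
buggy_total = conn.execute("""
    SELECT sum(r.price)
      FROM orders o
      JOIN orders_recipes orr ON orr.order_id = o.id
      JOIN recipes r ON r.id = o.id
     GROUP BY o.id
""").fetchone()[0]

# Joins recipes on the recipe_id stored in the lookup table.
correct_total = conn.execute("""
    SELECT sum(r.price)
      FROM orders o
      JOIN orders_recipes orr ON orr.order_id = o.id
      JOIN recipes r ON r.id = orr.recipe_id
     GROUP BY o.id
""").fetchone()[0]

print(buggy_total, correct_total)
```

With columns named `order_id` and `recipe_id` everywhere, the bad condition would read `r.recipe_id = o.order_id`, which looks wrong at a glance.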

Page 52: How to Fake a Database Design

Conceptually Similar to …

SELECT name FROM customer WHERE id > weight;

Page 53: How to Fake a Database Design

ID Names

SELECT thread.*
  FROM email thread
  JOIN email selected ON selected.id = thread.id
  JOIN character recipient ON recipient.id = thread.recipient_id
  JOIN station_area sa ON sa.id = recipient.id
  JOIN station st ON st.id = sa.id
  JOIN star origin ON origin.id = thread.id
  JOIN star destination ON destination.id = st.id
  LEFT JOIN route ON ( route.from_id = origin.id AND route.to_id = destination.id )
 WHERE selected.id = ?
   AND ( thread.sender_id = ?
         OR ( thread.recipient_id = ?
              AND ( origin.id = destination.id
                    OR ( route.distance IS NOT NULL
                         AND now() >= thread.datesent + ( route.distance * interval '30 seconds' ) ))))
 ORDER BY datesent ASC, thread.parent_id ASC NULLS FIRST

Page 54: How to Fake a Database Design

Rule #6

1. Nouns == tables
2. Another table’s ID must have a FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible

Page 55: How to Fake a Database Design

Summary

• Nouns == tables (*)
• FK constraints
• Proper naming is important
• Your DBAs will thank you
• Your apps will be more robust

Page 56: How to Fake a Database Design

?

http://www.slideshare.net/ovid/

Page 57: How to Fake a Database Design

Bonus Slides!

Super-duper important stuff I wasn’t sure I had time to cover because it’s going to make your head hurt.

Page 58: How to Fake a Database Design

Avoid NULL Values

• Every column should have a type
• NULLs, by definition, are unknown values
• Thus, their type is unknown
• But … every column should have a type?

Page 59: How to Fake a Database Design

Our employees Table

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    name        CHARACTER VARYING(255) NOT NULL,
    salary      MONEY NULL
);

Page 60: How to Fake a Database Design

Giving Bonuses

• $1,000 bonus to all employees
• … if they make less than $40,000/year

Page 61: How to Fake a Database Design

Get Employees For Bonus

SELECT employee_id, name FROM employees WHERE salary < 40000;

Page 62: How to Fake a Database Design

Bad SQL

• Won’t return anyone with a NULL salary
• Why is the salary NULL?
  – What if it’s confidential?
  – What if they’re a contractor and in that table?
  – What if they’re an unpaid slave intern?
  – What if it’s unknown when the data was entered?
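A sketch of the problem, assuming SQLite via Python's sqlite3 module, an INTEGER salary instead of PostgreSQL's MONEY, and three invented employees. The `names_where` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      INTEGER NULL
    );
    -- Invented sample data: Carol's salary was never entered.
    INSERT INTO employees (name, salary) VALUES
        ('Alice', 35000),
        ('Bob',   50000),
        ('Carol', NULL);
""")


def names_where(condition):
    """Run the bonus-style query with a fixed, trusted WHERE condition."""
    sql = f"SELECT name FROM employees WHERE {condition} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


gets_bonus = names_where("salary < 40000")
no_bonus = names_where("salary >= 40000")
unknown = names_where("salary IS NULL")
print(gets_bonus, no_bonus, unknown)
```

Carol shows up in neither the “less than” nor the “at least” result. She is only found by asking for NULL explicitly, and even then the data can't say why her salary is missing.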

Page 63: How to Fake a Database Design

NULLs tell you nothing

suppliers table

supplier_id   city
s1            ‘London’

parts table

part_id   city
p1        NULL

Example via “Database In Depth” by C.J. Date

Page 64: How to Fake a Database Design

NULLs tell you nothing

parts table

part_id   city
p1        NULL

Example via “Database In Depth” by C.J. Date

SELECT part_id FROM parts;

SELECT part_id FROM parts WHERE city = city;
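Running both queries makes the point. This is a sketch assuming SQLite via Python's sqlite3 module, with the single-row parts table from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (
        part_id TEXT PRIMARY KEY,
        city    TEXT NULL
    );
    INSERT INTO parts VALUES ('p1', NULL);
""")

all_parts = [row[0] for row in conn.execute("SELECT part_id FROM parts")]

# NULL = NULL evaluates to NULL, not true, so the row is filtered out.
self_equal_parts = [
    row[0] for row in conn.execute("SELECT part_id FROM parts WHERE city = city")
]
print(all_parts, self_equal_parts)
```

The part exists, yet a condition that looks like it must always hold makes it disappear.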

Page 65: How to Fake a Database Design

NULLs tell you nothing

suppliers table

supplier_id   city
s1            ‘London’

parts table

part_id   city
p1        NULL

Example via “Database In Depth” by C.J. Date

SELECT s.supplier_id, p.part_id
  FROM suppliers s, parts p
 WHERE p.city <> s.city     -- can’t compare NULL
    OR p.city <> 'Paris';   -- can’t compare NULL

Page 66: How to Fake a Database Design

NULLs tell you lies

Example via “Database In Depth” by C.J. Date

SELECT s.supplier_id, p.part_id
  FROM suppliers s, parts p
 WHERE p.city <> s.city     -- can’t compare NULL
    OR p.city <> 'Paris';   -- can’t compare NULL

• We get no rows because we can’t compare a NULL city
• The unknown city is Paris or it isn't.
• If it’s Paris, the first condition is true
• If it’s not Paris, the second condition is true
• Thus, the WHERE clause must be true, but it’s not
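The argument above can be checked directly. This sketch assumes SQLite via Python's sqlite3 module and the two one-row tables from the slides; the `rows_when_part_city_is` helper is hypothetical and simply overwrites the part's city before rerunning the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (
        supplier_id TEXT PRIMARY KEY,
        city        TEXT NULL
    );
    CREATE TABLE parts (
        part_id TEXT PRIMARY KEY,
        city    TEXT NULL
    );
    INSERT INTO suppliers VALUES ('s1', 'London');
    INSERT INTO parts VALUES ('p1', NULL);
""")

QUERY = """
    SELECT s.supplier_id, p.part_id
      FROM suppliers s, parts p
     WHERE p.city <> s.city
        OR p.city <> 'Paris'
"""


def rows_when_part_city_is(city):
    """Set the part's city (None means NULL) and rerun the slide's query."""
    conn.execute("UPDATE parts SET city = ? WHERE part_id = 'p1'", (city,))
    return conn.execute(QUERY).fetchall()


null_rows = rows_when_part_city_is(None)
paris_rows = rows_when_part_city_is("Paris")
london_rows = rows_when_part_city_is("London")
print(null_rows, paris_rows, london_rows)
```

Any real city returns the row, whether or not it is Paris. Only the NULL returns nothing, which is the lie.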

Page 67: How to Fake a Database Design

Rule #7

1. Nouns == tables
2. Another table’s ID must have an FK constraint
3. Lists of things get their own table
4. Many-to-many == lookup table (with FKs)
5. Watch for equal values that aren’t identical
6. Name columns as descriptively as possible
7. Avoid NULL columns like the plague