Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

42
Modeling the Hierarchical Nature of Data An Entity Relationship Approach

Transcript of Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Page 1: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Modeling the HierarchicalNature of Data

An Entity Relationship Approach

Page 2: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Introductions

Richard Broersma LAPUG Leader & PostgreSQL Enthusiast Systems Integrator / Industrial Applications Developer

Mangan Incorporated

Mangan is a nationwide engineering and automation firm that serves multiple industries:

Petrochemical, Oil & Nat. Gas Pipelines, Oil & Nat. Gas Production Chemical Production Bio-Pharmaceutical Production Solar Energy Production

Page 3: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Preview1) Controversy

Hierarchical data in Relational Databases

2) Perceptions Reality and Data modeling

3) Conceptual Designs Entity Relationship Diagrams (ERD) Modeling the Concept of Hierarchal Data

4) Physical Designs Implementations of conceptual ERD Designs

Page 4: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Controversy

In RDBMS, R is for RRelational!

What's all this Hierarchal

Nonsense?

Page 5: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Controversy

Network and Hierarchical database are ”things of the past.”

Relational databases should be implemented using entities and relationships described in relational theory.

Should Hierarchical modeling be avoided?

Page 6: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Perceptions

Thinking in terms of datamodeling Entities Relationships Entity types

Implementation using 3 Normal forms

Employs

Companies

People

EmployeeTypes

( 0,1 )

( 1,N )

( 0,N )

Basics of ERD Modeling

Page 7: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Perceptions of Reality

Our Perceptions Entities & Relationships

Classifications Taxonomy

How do the attributes of similar type of entities differ

Layers of Hierarchical Abstraction

Physical Hierarchical Representation

Page 8: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Short-fall of Entity Types

Employee Types table Can't express attribute

similarities or differences of similar types.

Can't define relationships between people of related types

Employs EmployeeTypes

( 0,N )

Employee TypesElectrician Does Electrical WorkEngineer Does EngineeringManager Manages othersPresident Presides over a companyWelder Welds

EmploysABC TED WelderABC SANDY President

ACME RON ElectricianACME JILL Engineer

BP DAVE Manager

Page 9: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Conclusion

If we need to know about the: Extended Attributes of Entity Types Relationships between Entity Types

Then hierarchical data modeling must be implemented

Page 10: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Conceptual Designs ERD Provides

Generalization Hierarchy Model

Defines hierarchical constraints for hierarchical mapping.

Grouping of similar entity types. Similarities and differences are

defined. Relationships can be created

between entities of any (sub)type.

type

subtype A Subtype B Subtype C

Sub-subtype BSpe

cific

- G

ener

ic ( T,E )

( P,E )

Generalization Hierarchy

Page 11: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

ERD - Hierarchical Constraints C

1 Property

{T} Total Coverage {P} Partial Coverage

C2 Property

{E} Exclusive Coverage {O} Overlapping Coverage

( C1, C

2 ) Coverage Properties

Animals

Carnivores Herbivores

Page 12: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

ERD - Entity Type Groupings

Entity types having equal attributes are grouped together.

Similarities and differences are defined.

Animals

Carnivores Herbivores

( T, O )

Animalsid weight

Bear-1 bear 500Sheep-2 sheep 100Wolf-3 wolf 120Deer-4 deer 240Puma-5 puma 200

animalcode

Carnivoresid weight

Bear-1 bear 500 salmonWolf-3 wolf 120 sheep

Puma-5 puma 200 deer

animalcode favoritePreyHerbivores

id weightBear-1 bear 500 berries

Sheep-2 sheep 100 grassDeer-5 deer 240 grass

animalcode favoriteVegi

id weightBear-1 bear 500 salmon berries

Omnivores (implied by Overlapping)animalcode favoritePrey favoriteVegi

Page 13: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

ERD - Entity type Groupings

Beware of the “platypus”! Valid criticisms of G/H

exist. Some entities will map

to most of the hierarchical attributes but not all.

Careful consideration required to minimize platypus affect.*

Animals

Avians Mammals

( T, O )

*If practical, G/H redesign can eliminate most “Platypuses”.

Reptiles

Newly ClassifiedMonotremata

Page 14: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

ERD – Entity type Relationships

Complex Relationships are possible between sub types

Animals

Carnivores Herbivores

( T, O )

Mauls

Fears

(0,n

)

(0,n)

(1,n)(1,n)

Maulingscarnivoreid animalid Mauling-date

Bear-1 Wolf-3 01/15/08Bear-1 Deer-4 07/12/07Wolf-3 Sheep-2 09/22/07

Fears

Deer-4 Wolf-3Deer-4 Puma-5Deer-4 Bear-1

Sheep-2 Wolf-3Sheep-2 Puma-5Bear-1 Bear-6

carnivoreid herbivoreid

Page 15: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Physical Designs

There are 5 physical designs (that I know of) for implementing conceptual Generalization Hierarchies

Each physical design varies in the features that the conceptual Generalization Hierarchy Model defines

Page 16: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Physical Designs

Entity-Attribute-Value Table (EAV) ~Relational purists favorite

Nullable Attributes Table (NA) ~Happens overtime

Vertical Disjunctive Table Partitioning (VDP) ~My favorite

Horizontal Disjunctive Table Partitioning (HDP)~PostgreSQL Table inheritance

(NA–EAV) Hybrid Table ~Worst Design Ever – know it to avoid it

Page 17: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Implementation Features List

Model VS. Features EAV

NA VDP

HDP

EAV-

NA Legend

Total Coverage A E A A A E = Enforced (DDL or DRI)Partial Coverage A E E A A A = Allows (Client Enforced)

Exclusive Coverage A E E E A S = SupportsOverlapping Coverage A E A E A

Supports Grouping S SSupports Relations S S

Page 18: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Good Design Guidelines Always include an entity type column associated

with the primary key throughout the hierarchy This is still the best way to identify the type of

hierarchical entity or hierarchical relationship CHECK Constraints can then be implemented based

on the entity types (Hierarchical Foreign Keys can be used also.)

This will prevent data corruption at the RDBMS tier that could be caused by application bugs or lazy users.

Page 19: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Entity Attribute Value (EAV)

Physical Implementation:

Animals

Carnivores Herbivores

( T, O )

Animals

AnimalAttributes

have

AttributeTypesis a

( 0, n )

( 1, 1 )

( 1, 1 ) ( 0, n )

AnimalTypesis a

( 0, n )( 1, 1 )

Guard this table with your life!

Page 20: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

EAV Table - DDLCREATE TABLE Animaltypes( animalcode VARCHAR( 20 ) PRIMARY KEY, description TEXT NOT NULL DEFAULT '' );

CREATE TABLE Animals ( animal_id VARCHAR( 20 ) PRIMARY KEY, animalcode VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE, weight NUMERIC( 7, 2) CHECK( weight > 0 ));

CREATE TABLE Attributetypes( attributecode VARCHAR( 20 ) PRIMARY KEY, description TEXT NOT NULL DEFAULT '' );

CREATE TABLE Animalattributes( animal_id VARCHAR( 20 ) REFERENCES Animals( animal_id ) ON UPDATE CASCADE ON DELETE CASCADE, attribute VARCHAR( 20 ) REFERENCES Attributetypes( attributecode ) ON UPDATE CASCADE, att_value TEXT NOT NULL, PRIMARY KEY ( animal_id, attribute ));

Page 21: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

EAV Table - Consideration Advantages: Provides a flexible mechanism to record the attributes associated with any entity.

The flexibility eliminates the possibility of “platypuses”.

This EAV design requires almost no consideration of the nature of the applicable hierarchical data and requires very little time to implement ( cookie cutter).

Page 22: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

EAV Table - Consideration Disadvantages: Users or Application logic becomes responsible to ensuring that all entities of a

specific type will have the required associated attributes. (no DDL or DRI server constraints will work).

The EAV table uses a TEXT (or VARCHAR) column for all attribute values regardless if Dates, Timestamps, Integers, Numerics or Booleans would be more appropriate.

No Foreign Keys on Attribute Values: There isn't a way to prevent bad data-entry. For example nothing would prevent a user from entering 'I like peanut butter.' for the attribute value for Birth Date.

Page 23: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Null-able Attributes (NA) Table

Physical Implementation:

Animals

Carnivores Herbivores

( T, O )

Animals AnimalTypesis a

( 0, n )( 1, 1 )

Page 24: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(NA) Table - DDLCREATE TABLE Animaltypes( animalcode VARCHAR( 20 ) PRIMARY KEY, description TEXT NOT NULL DEFAULT '' );

CREATE TABLE Animals ( animal_id VARCHAR( 20 ) PRIMARY KEY, animalcode VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE, weight NUMERIC( 7, 2) CHECK( weight > 0 ),

favoriteprey VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE CHECK( CASE WHEN animalcode IN ('Bear', 'Wolf', 'Puma') THEN favoriteprey IS NOT NULL WHEN animalcode IN ('Deer', 'Sheep' ) THEN favoriteprey IS NULL END ) favoritevegi VARCHAR( 20 ) REFERENCES Vegitypes ( Vegicode ) ON UPDATE CASCADE CHECK( CASE WHEN animalcode IN ('Bear', 'Deer', 'Sheep') THEN favoritevegi IS NOT NULL WHEN animalcode IN ('Wolf', 'Pump' ) THEN favoritevegi IS NULL END ) );

Page 25: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

NA Table - Consideration Advantages: Provides a flexible mechanism to record the attributes associated with any entity.

All attributes values can be constrained with foreign keys.

Requires almost no consideration of the nature of the applicable hierarchical data. Hierarchical attributes are added via DDL as they are encounter during the life time of the application.

Page 26: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

NA Table - Consideration Disadvantages: Validating Hierarchical data integrity requires too many checks constraints. This

can really hurt INSERT and UPDATE performance

Tuples in the table can get to be too big with many-many unused nulled columns.

The concept of null can get obscured. Does Null mean “Don't Know” or “Doesn't Apply”.

Page 27: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(VDP) Table

Physical Implementation:

Animals

Carnivores Herbivores

( T, O )

Animals

AnimalTypesis a

( 0, n )( 1, 1 )

Carnivores Herbivores

is ais a

is a

( 0, 1 )

( 1, 1 )( 1, 1 )

( 0, 1 )

Page 28: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(VDP) Table - DDLCREATE TABLE Animals ( animal_id VARCHAR( 20 ) UNIQUE NOT NULL, animalcode VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE, weight NUMERIC( 7, 2) CHECK( weight > 0 ),

PRIMARY KEY ( animal_id, animalcode ) --RI to handle denormalization of sub-tables );CREATE TABLE Carnivores ( animal_id VARCHAR( 20 ) UNIQUE NOT NULL,

animalcode VARCHAR( 20 ) NOT NULL CHECK( animalcode IN ( 'Bear', 'Wolf', 'Puma' )),

favoriteprey VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE,

PRIMARY KEY ( animal_id, animalcode ), FOREIGN KEY ( animal_id, aminalcode ) REFERENCES Animals( animal_id, animalcode ) ON UPDATE CASCADE ON DELETE CASCADE, --RI to handle denormalization of animalcode );

Page 29: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(VDP) Table - DDLCREATE TABLE Herbivores ( animal_id VARCHAR( 20 ) UNIQUE NOT NULL,

animalcode VARCHAR( 20 ) NOT NULL CHECK( animalcode IN ( 'Deer', 'Sheep', 'Bear' )),

favoriteprey VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE,

PRIMARY KEY ( animal_id, animalcode ), FOREIGN KEY ( animal_id, aminalcode ) REFERENCES Animals( animal_id, animalcode ) ON UPDATE CASCADE ON DELETE CASCADE,--RI to handle denormalization of animalcode );

Page 30: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

VDP Table - Consideration Advantages: All attributes values can be constrained with foreign keys.

Requires few Checks Constraints than the NA Table.

Page 31: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

VDP Table - Consideration Disadvantages: Checks only required for Entity type field, but too many check constraints can still

hurt INSERT performance

Additional Application logic required to handle multiple INSERTs and UPDATEs to various (sub)type tables

Requires some denormalization to enforce data integrity. Referential Integrity handles this problem.

This design requires the designer to be well versed in the domain that is being modeled

Page 32: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(HDP) Table

Physical Implementation:

Animals

Carnivores Herbivores

( T, O )

AnimalTypesis a

( 0, n )( 1, 1 )

Omnivores

is aAnimals

Carnivores Herbivores

( T, O )

Page 33: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(HDP) Table - DDLCREATE TABLE Animals ( animal_id VARCHAR( 20 ) PRIMARY KEY, animalcode VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE, weight NUMERIC( 7, 2) CHECK( weight > 0 ));

CREATE TABLE Carnivores ( favoriteprey VARCHAR( 20 ))INHERITS( Animals );

ALTER TABLE CarnivoresADD CONSTRAINT Cornivores_animalcode_check_iscarnivore CHECK( animalcode IN ( 'Bear', 'Wolf', 'Puma' ));

Page 34: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(HDP) Table - DDLCREATE TABLE Herbivores ( favoritevegi VARCHAR( 20 ))INHERITS( Animals );

ALTER TABLE HerbivoresADD CONSTRAINT Herbivores_animalcode_check_isHerbivore CHECK( animalcode IN ( 'Bear', 'Deer', 'Sheep' ));

CREATE TABLE Omnivores ()INHERITS( Carnivores, Herbivores ); -- PostgreSQL also inherits Check Constraint -- The Overlapping checks will algebraically -- reduce to CHECK( animalcode = 'Bear' ) -- CarnivoreCodes ∩ HerbivoreCodes = OmnivoreCodes

Page 35: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

HDP Table - Consideration Advantages: All attributes values can be constrained with foreign keys. But you have to re-

implement these Inherited foreign keys yourself.

Possible to allow for relationships only between hierarchical leaf entitles.

The application logic is simplified since all accesses to sub-entities are to a single table.

Page 36: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

HDP Table - Consideration Disadvantages: SLOW Sequential Scans are the only way to search the Root or Branch nodes of

the hierarchy since scans on these tables are based on UNION ALL queries.

Uniqueness cannot be enforced across the hierarchy.

This design requires the designer to be well versed in the domain that is being modeled

Page 37: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(NA – EAV) Hybrid Table

Physical Implementation:

Animals

Carnivores Herbivores

( T, O )

Animals AnimalTypesis a

( 0, n )( 1, 1 )

Page 38: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

(NA – EAV) Table - DDLCREATE TABLE Animaltypes( animalcode VARCHAR( 20 ) PRIMARY KEY, description TEXT NOT NULL DEFAULT '' );

CREATE TABLE Animals ( animal_id VARCHAR( 20 ) PRIMARY KEY, animalcode VARCHAR( 20 ) REFERENCES Animaltypes( animalcode ) ON UPDATE CASCADE, column1 VARCHAR( 255 ), --The application maps the attributes of each column2 VARCHAR( 255 ), --entity type to these intentionally vague column3 VARCHAR( 255 ), --columns. Each entity type will have a unique column4 VARCHAR( 255 ), --mapping for column1 thru column100. column5 VARCHAR( 255 ), column6 VARCHAR( 255 ), --Unmapped columns not needed by an entity type column7 VARCHAR( 255 ), --may be treated as custom fields that the users column8 VARCHAR( 255 ), --may use any way they see fit. -- ... column100 VARCHAR( 255 ));

Page 39: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

NA – EAV Table - Consideration Advantages: Provides a flexible mechanism to record the attributes associated with any entity.

The flexible mechanism eliminates the possibility of “platypuses”.

Page 40: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

NA – EAV Table - Consideration Disadvantages: These VARCHAR columns have no meaning. Each entity can map a column for a completely

unrelated attribute.

The Application mapping becomes a major source of data corruption bugs if mapping isn't cleanly implemented or if entity type changes are required overtime.

If unmapped columns are exposed to the users as custom column, there is not way to ensure that various users will be consistent when implementing these columns.

Users or Application logic becomes responsible to ensuring that all entities of a specific type will have the required associated attributes. (no DDL server constraints will work)

The NAEAV table uses a VARCHAR column for all attribute values regardless if Dates, Timestamps, Integers, Numerics or Booleans would be more appropriate

No Foreign Keys on Attribute Columns: The isn't a way to prevent bad data-entry. For example nothing would prevent a user from entering 'I like peanut butter.' for the attribute value for Birthday

Table design concept is badly de-normalized.

Page 41: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Bibliography

Works Cited

Batini, Carol, Stefano Ceri, and Shamkant B. Navathe. Conceptual Database Design : An Entity-Relationship Approach.

Boston: Benjamin Cummings Company, 1991.

Celko, Joe. Joe Celko's SQL for Smarties : Advanced SQL Programming. 3rd ed. Greensboro: Morgan Kaufmann, 2005.

Celko, Joe. Joe Celko's Thinking in Sets : Auxiliary, Temporal, and Virtual Tables in SQL. New York: Elsevier Science &

Technology Books, 2008.

Celko, Joe. Joe Celko's Trees and Hierarchies in SQL for Smarties. Greensboro: Morgan Kaufmann, 2004.

Douglas, Korry. PostgreSQL. 2nd ed. Indianapolis: Sams, 2005.

Page 42: Modeling the Hierarchical Nature of Data - PostgreSQL: The world's

Questions?

I have a question!What's all this Hierarchal

Nonsense?