Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher...

74
Introduction to Introduction to Databases: Databases: Relational and XML Relational and XML Models and Languages Models and Languages Instructors: Bertram Ludaescher Kai Lin

Transcript of Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher...

Page 1: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages

Introduction to Databases:Introduction to Databases:Relational and XML Models Relational and XML Models and Languagesand Languages

Instructors:

Bertram LudaescherKai Lin

Instructors:

Bertram LudaescherKai Lin

Page 2: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

2Introduction to Databases, B. Ludaescher & K. Lin

Overview

• 09:15-10:20 Relational Databases (1h05’)• 10:20-10:30 BREAK (10’)• 10:30-11:50 Relational Databases (1h20’)• 11:50-13:15 LUNCH (1h25’)• 13:15-13:45 Demo & Hands-on (30’)• 13:45-15:10 XML: Basics (1h25’)• 15:10-15:30 BREAK (20’)• 15:30-16:30 XML: Querying (1h)• 16:30-17:00 Demo & Hands-on (30’)

Page 3: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

3Introduction to Databases, B. Ludaescher & K. Lin

Scope• Today: Introduction to Databases, in

particular– Relational database model– Relational Operations and Queries– Constraints– XML “data” model– Querying and transforming XML– Some demos & simple hands-on exercise

• Tomorrow: – Introduction to Knowledge Representation and

Ontologies

• But first: … déjà vu …

Page 4: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

4Introduction to Databases, B. Ludaescher & K. Lin

Page 5: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

5Introduction to Databases, B. Ludaescher & K. Lin

What is a Database System?

• Database (system) = – Database Instance (set of tables of rows)– Database Management System (DBMS)

• Origins in the commercial world: – to organize, query, and manipulate data

more effectively, efficiently, and independently

• Scientific databases– often special features:

• spatial, temporal, spatiotemporal, GIS, units, uncertainty, raw & derived data, …

Page 6: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

6Introduction to Databases, B. Ludaescher & K. Lin

Why not just use files as “databases”?• For some applications: yeah… why not?• But in general:

– scanning & ’grep’ing large files can be very inefficient– no language support for selecting desired data, joining

them, etc.• cannot express the kinds of questions/queries you’d like to ask• ‘grep’ is no substitute for a query language

– redundant and/or inconsistent storage of data– no transaction management and concurrency control

among multiple users– no security– no recovery– no data independence (application data)– no data modeling support– …

Page 7: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

7Introduction to Databases, B. Ludaescher & K. Lin

Features of a Database System• A data model (relational, object-

oriented, XML) prescribes how data can be organized:– as relations (tables) of tuples (rows)– as classes of (linked) objects– as XML trees

• A (database) schema (stored in the “data dictionary”) defines the structure of a specific database instance:– Relational schema– OO schema– XML Schema (or XML DTD)

Page 8: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

8Introduction to Databases, B. Ludaescher & K. Lin

Features of a Database System• Data is treated uniformly and separately from the

application• Efficient data access • Queries and views are expressed over the schema• Integrity constraints (checking and enforcement)• Transactions combine sets of operations into

logical units (all-or-nothing)• Synchronization of concurrent user transactions• Recovery (after system crash)

– not to be confused w/ backup– instead: guarantee consistency by “roll-back” of partially

executed transactions (how? Hint: logging)

• …

Page 9: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

9Introduction to Databases, B. Ludaescher & K. Lin

DB features, e.g., Concurrency Control

• Concurrent execution of simultaneous requests – long before web servers where around... – transaction management guarantees consistency

despite concurrent/interleaved execution• Transaction (= sequence of read/write operations)

– Atomicity: a transaction is executed completely or not at all

– Consistency: a transaction creates a new consistent DB state, i.e., in which all integrity constraints are maintained

– Isolation: to the user, a transaction seems to run in isolation

– Durability: the effect of a successful (“committed”) transaction remains even after system failure

Page 10: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

10Introduction to Databases, B. Ludaescher & K. Lin

Levels of Abstraction: Architecture Overview

DBinstances

Physical level

Logical (“conceptual”) level

View 1 View 2 View nConceptual…

Level

Index structures

Tables

Export schemas

ER-Model(Entity-Relationship) OO Models (Classes…)

part of DB design conceptual design

… often lost in the process…

logical dataindependence

physical data independence

User

Page 11: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

11Introduction to Databases, B. Ludaescher & K. Lin

Database Design: Entity-Relationship (ER) Model

• Entities:• Relationships:• Attributes:• ER Model:

– initial, high-level DB design (conceptual model)– easy to map to a relational schema (database tables)– comes with more constraints (cardinalities, aggregation) and

extensions: EER (is-a => class hierarchies)– related: UML (Unified Modeling Language) class diagrams

Employee Departmentworks-for

Name Salary ManagerName

since

Page 12: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

12Introduction to Databases, B. Ludaescher & K. Lin

The Relational Model

• Relation/Table Name:– employee, dept

• Attributes = Column Names:– Emp, Salary, DeptNo, Name, Mgr

• Relational Schema:– employee(Emp:string,

Salary:integer, DeptNo:integer), ...

• Tuple = Row of the table:– (“tom”, “60000”, “1”)

• Relation = Set of tuples:– {(...), (...), ...}

Emp Salary DNotom 60k 1tim 57k 1sally 45k 3carol 30k 1carol 35k 2….

Employee

DNo Name Mgr 1 Toys carol2 Comp. carol3 Shoes sam

Department

FK: foreign key, pointing to another key

Page 13: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

13Introduction to Databases, B. Ludaescher & K. Lin

Ex: Creating a Relational Database in SQL

CREATE TABLE employee (ssn CHAR(11),name VARCHAR(30),deptNo INTEGER,PRIMARY KEY (ssn),

FOREIGN KEY (deptNo) REFERENCES department )

CREATE TABLE department (deptNo INTEGER,name VARCHAR(20),manager CHAR(11),PRIMARY KEY (deptNo),

FOREIGN KEY (manager) REFERENCES employee(ssn) )

Page 14: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

14Introduction to Databases, B. Ludaescher & K. Lin

What is a Query?• Intuitively:

– An “executable question” in terms of a database schema

– Evaluating a query Q against a database instance D yields a set of answer objects:

• Relational tuples or XML elements

• Example:– Who are the employees in the ‘Toys’ dept.?– Who is (are) the manager(s) of ‘Tom’?– Show all pairs (Employee, Mgr)

• Technically: – A mapping from an input schema (the given table

schemas) to a result schema (the new columns you are interested in) defined in some query language

Page 15: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

15Introduction to Databases, B. Ludaescher & K. Lin

Why (Declarative) Query Languages?

• Things we talk and think about in PLs and QLs … – Assembly languages:

• registers, memory locations, jumps, ...

– C and the likes: • if-then-else, for, while, memory (de-)allocation,

pointers, ...

– Object-oriented languages:• C++: C plus objects, methods, classes, ...• Java: objects, methods, classes, references, ... • Smalltalk: objects, objects, objects, ... • OQL: object-query language

,,Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.”“The limits of my language mean the limits of my world.”

Ludwig Wittgenstein, Tractatus Logico-Philosophicus

“If you have a hammer, everything looks like a nail.”

Page 16: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

16Introduction to Databases, B. Ludaescher & K. Lin

Why (Declarative) Query Languages?

• Things we talk and think about in PLs and QLs …– Functional languages (Haskell, ML):

• (higher-order) functions, fold(l|r), recursion, patterns, ...

=> Relational languages (SQL, Datalog)• relations (tables), tuples (rows); conceptual level: ER• relational operations: , , , , ..., ,,,,,..., , , |X|

=> Semistructured/XML (Tree) & Graph Query Languages

• trees, graphs, nodes, edges, children nodes, siblings, … • XPath, XQuery, …

• Also: – Focus on what, and not how!

Page 17: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

17Introduction to Databases, B. Ludaescher & K. Lin

Example: Querying a Relational Database

Emp SalaryDeptNoanne 62k 2john 60k 1

Employee

DeptNoMgr

1 anne2 anne

Department

SELECT e.Emp, d.MgrFROM Employee e, Department dWHERE e.DeptNo = d.DeptNo

Emp Mgrjohn anneanne anne

result

join

input tables

SQL query (or view def.)

answer (or view)

we don’t say how to evaluate this expression

Page 18: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

18Introduction to Databases, B. Ludaescher & K. Lin

Example Query: SQL vs DATALOG

• “List all employees and their managers”

• In SQL:SELECT e.name, d.managerFROM Employee e, Department dWHERE e.deptNo = d.deptNo

• In DATALOG:q(E, M) :- employee(E, S, D), department(D, N,

M).

a “join” operation

Page 19: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

19Introduction to Databases, B. Ludaescher & K. Lin

Important Relational Operations• select(R, Condition)

– filter rows of a table wrt. a condition

• project(R, Attr) – remove unwanted columns; keep rest

• join(R1, A2, R2, A2, Condition) – find “matches” in a “related” table – e.g. match R1.foreign key = R2.primary key

• cartesian product(R1, R2)• union (“OR”), intersection (“AND”)• set-difference (“NOT IN”)

Page 20: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

20Introduction to Databases, B. Ludaescher & K. Lin

Relational Operations (in DATALOG)

condition

Y1=Y2Y

independent

same

multiple rules union results

(query) output :– (query) input(query) output :– (query) input

Page 21: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

21Introduction to Databases, B. Ludaescher & K. Lin

Demo

Relational Queries in DATALOG

Page 22: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

22Introduction to Databases, B. Ludaescher & K. Lin

Queries, Views, Integrity Constraints• … can all be seen as “special queries”

• Query q(…) :- … ad-hoc queries

• View v(…) :- … exported views;

• Integrity Constraints– ic (…) :- …. MgrSal < EmpSal … – say what shouldn’t happen– if it does: alert the user (or refuse an update,

…)

Page 23: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

23Introduction to Databases, B. Ludaescher & K. Lin

Query Evaluation vs Reasoning• Query evaluation

– Given a database instance D and a query Q, run Q(D)– What databases do all the time

• Reasoning (aka “Semantic Query Optimization”)– Given a query Q and a constraint C, “optimize” Q&C

(e.g., given C, Q might be unsatisfiable)– Given Q1 and Q2 decide whether Q1 Q2– Given Q1,Q2, C decide whether Q1 Q2 | C

– Note: we are NOT given a database instance D here; just the schema and the query/IC expressions

Page 24: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

24Introduction to Databases, B. Ludaescher & K. Lin

Summary QLs for Relational Databases

Natural Hoin: same attribute name add condition that values must match

Page 25: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

25Introduction to Databases, B. Ludaescher & K. Lin

Relational Algebra

Page 26: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

26Introduction to Databases, B. Ludaescher & K. Lin

Relational Algebra

Page 27: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

27Introduction to Databases, B. Ludaescher & K. Lin

Relational Algebra

Page 28: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

28Introduction to Databases, B. Ludaescher & K. Lin

Relational Algebra

Page 29: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

29Introduction to Databases, B. Ludaescher & K. Lin

Relational Algebra

Page 30: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

30Introduction to Databases, B. Ludaescher & K. Lin

Hands-on Part

DBDesigner 4

Page 31: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

31Introduction to Databases, B. Ludaescher & K. Lin

DBDesigner 4 • An open source (GPL) database design tool 1. Goto http://www.fabforce.net/dbdesigner4/ 2. Download and install (5 min)3. Open the example schema Order (File -> Open -> Order), and

examine the relations between the tables forumtopic and forumpost. Find the foreign key used in the relation postHasTopic (5 min)

4. Connect to a sample MySQL database host: geon07.sdsc.edu port: 3306 database: summer_institute username: root password: [blank]

(5 min)5. Select all records in the table forumtopic (2 min)6. Select all records in the table forumpost, and sort the result

according to their idforumpost (3min)7. Find all forum posts with the topic Cars (5 min)8. Insert a record into the table forumpost (5 min)

Page 32: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

32Introduction to Databases, B. Ludaescher & K. Lin

Additional Material

(not presented)

Page 33: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

33Introduction to Databases, B. Ludaescher & K. Lin

Non-Relational Data Models• Relational model is “flat”: atomic data values

– nesting is modeled via “pointers” (foreign keys) and “Skolem-ids”

– extension: nested relational model (“tables within tables”, cf. nested HTML tables)

– values can be nested lists {...}, tuples (...), sets [...]– ISO standard(s): SQL– identity is value based

• Object-oriented data model:– complex (structured) objects with object-identity (oid)– class and type hierarchies (sub-/superclass, sub-/supertype)– OODB schema may be very close to “world model” (no

translation into tables)(+) queries fit your OO schema(-) (new) queries that don’t fit nicely

– ODMG standard, OQL (Object Query Language)

Page 34: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

34Introduction to Databases, B. Ludaescher & K. Lin

Example: Object Query Language (OQL)

• Q: what does this OQL query compute?• Note the use of path expressions like e.manager.children=> Semistructured/Graph Databases

SELECT DISTINCT STRUCT( E: e.name, C: e.manager.name, M: ( SELECT c.name FROM c IN e.children

WHERE FOR ALL d IN e.manager.children: c.age > d.age ) ) FROM e IN Employees;

Page 35: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

35Introduction to Databases, B. Ludaescher & K. Lin

A Graph Database

Page 36: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

36Introduction to Databases, B. Ludaescher & K. Lin

Querying Graphs with OO-Path Expressions

?- dblp."Inf. Systems".L.P, substr("Volume",L), P : person.spouse[lives_in = P.lives_in].

?- dblp."Inf. Systems".L."Michael E. Senko".Answer:

L="Volume 1, 1975”;L="Volume 5, 1980".

Page 37: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

37Introduction to Databases, B. Ludaescher & K. Lin

Constructs for Querying Graphs

Example: ?- dblp . any* . (if(vldb)| if(sigmod))

Page 38: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

38Introduction to Databases, B. Ludaescher & K. Lin

Keys, Keys, and more Keys• A key is a minimal set of attributes that:

– uniquely identify a tuple– determine every other attribute value in a tuple

• There may be many keys for a relation; we designate one as the primary key

• The phrase candidate key is used in place of “key” where “the key” denotes the primary key

• A superkey is a superset of a key (i.e., not necessarily minimal)

Page 39: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

39Introduction to Databases, B. Ludaescher & K. Lin

Example of “good” design

Employee(EName, SSN, BDate, Address, DNumber)Employee(EName, SSN, BDate, Address, DNumber)

Example of “bad” design (why is it bad?)

Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)

Normalization of Relations

Department(DName, DNumber, DMgrSSN)Department(DName, DNumber, DMgrSSN)

Page 40: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

40Introduction to Databases, B. Ludaescher & K. Lin

The description of the department (DName, DMgrSSN) is repeated for every employee that works in that department.

Redundancy!

The department is described redundantly.This leads to update anomalies! (… and wastes space)

Digression (for experts only): • redundancy can be used to increase performance; e.g. “materialized views”)

What’s Wrong?

Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)Emp(EName, SSN, BDate, Address, DNum, DName, DMgrSSN)

Page 41: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

41Introduction to Databases, B. Ludaescher & K. Lin

Update Anomalies (caused by redundancy)

Insertion AnomaliesIf you insert an employee

Need to know which department he/she worksNeed to know the description information for that department

If you want to insert a department, you can’t … until there is at least one employee

Deletion Anomalies: if you delete an employee, is that dept. gone? was this the last employee in that dept?

* Modification Anomalies: if you change DName, for example, it needs to be changed everywhere!

Page 42: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

42Introduction to Databases, B. Ludaescher & K. Lin

Null values also cause problemsNull values might help in special cases, but are

not a general solution to update anomalies

For example, they may:– Waste space– Make it hard to specify and understand joins– Make it hard to aggregate (count, sum, etc.)– Have different meanings:

• Attribute does not apply to this tuple• Attribute value is unkown• Value is known but absent (not yet recorded)

– Causes problems with queries (can’t interpret query answers)

Page 43: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

43Introduction to Databases, B. Ludaescher & K. Lin

Why worry about normalization?

• To … • … reduce redundancy & update anomalies

• … reduce the need for null values

• Last not least: it’s a solved problem:•all algorithms & proofs have been worked out

Here: Normalization based on FDs (there’s more…)

Page 44: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

44Introduction to Databases, B. Ludaescher & K. Lin

Functional DependenciesStatement that if two tuples agree on attributes A

they must agree on attributes B

A B

If the value of the first attribute(s), A, is known, then the value of the second attribute(s), B, is known (read “A determines B”)

We want to know if it is always true in the application

Page 45: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

45Introduction to Databases, B. Ludaescher & K. Lin

Functional Dependencies

Examples of functional dependencies: social-security-number employee-name course-number course-title

Examples that are NOT functional dependencies

course-number - book course-number - car-color

Page 46: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

46Introduction to Databases, B. Ludaescher & K. Lin

Remember what it means to be a function:

x f(x) x g(x) x h(x)1 2 1 2 1 101 3 2 2 2 202 5 3 5 3 303 5

we are looking for functional relationships(that must occur in a relation) among attribute values

What is a functional dependency?

f is not a functionfor x=1, f(x) is not unique

Page 47: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

47Introduction to Databases, B. Ludaescher & K. Lin

EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)

What are the FDs?

Page 48: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

48Introduction to Databases, B. Ludaescher & K. Lin

EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)

12

3

4

5

6

7

8

9

10

What are the FDs?

Page 49: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

49Introduction to Databases, B. Ludaescher & K. Lin

EMPLOYEE (SSN, NAME, SALARY, JOB_DESC)

Why do we care?

If all FDs are “implied by the key”

it means that the DBMS only enforces keys (not FDs) and the DBMS is going to enforce the keys anyway

otherwise, we have to perform expensive operations to maintain consistency (code or check statements)

Page 50: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

50Introduction to Databases, B. Ludaescher & K. Lin

Decomposition

Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD) based on the projection operator.

Decomposition should be used judiciously:– Is there reason to decompose a relation?

(via Normal Forms)– What problems (if any) does the

decomposition cause? (lost information or dependencies?)

Page 51: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

51Introduction to Databases, B. Ludaescher & K. Lin

EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)

EMP1(SSN, ENAME, BDATE, ADDRESS, DNUM)

X(DNUM, DNAME, DMGRSSN)

EMP2(SSN, ENAME)

X(PNUM, PNAME, PLOCATION)

Y(SSN, PNUM, HOURS)

12

3 4

5 6

7

8 9

10

How can we decompose using the Project Operator?

Page 52: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

52Introduction to Databases, B. Ludaescher & K. Lin

Correct DecompositionsHow do we know if a decomposition is correct?That we haven’t lost anything?

• We have three goals:

lossless-join decomposition (don’t throw any information away)(be able to reconstruct the original relation)

dependency preservationall of the FDs end up in just one relation (not split across two or more relations)

Boyce-Codd Normal Form (BCNF) - no redundancy

Page 53: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

53Introduction to Databases, B. Ludaescher & K. Lin

Lossless Decompositions• What is a lossless decomposition?• What is a lossy decomposition?

When R is decomposed into R1 and R2

Check to see if (R1 R2) = R

if it is a lossy decomposition, then R1 R2

gives you TOO MANY tuples.

Note: we are doing a natural join

Page 54: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

54Introduction to Databases, B. Ludaescher & K. Lin

Example: a lossy decomposition

Now join them using natural join: we get extra tuples!!!

1 smith p2 billing

Original: Employee SSN Name Project Ptitle

1 Smith p1 accounting 2 Jones p1 accounting 3 Smith p2 billing

Decomposition: Employee SSN Name Project PID Ptitle Name

1 Smith p1 accounting Smith 2 Jones p1 accounting Jones 3 Smith p2 billing Smith

Page 55: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

55Introduction to Databases, B. Ludaescher & K. Lin

Testing for a Lossless DecompositionFor decomposition into two relations

Let R1 and R2 form a decomposition of R.

R1 and R2 are both sets of attributes from R.

For the decomposition to be lossless ... The attributes in common must be a key for 1 of

the relations!

Note: You are joining a key and a foreign key

Page 56: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

56Introduction to Databases, B. Ludaescher & K. Lin

Example: test for a lossless decomposition

Employee(SSN, name, project, p-title)

decomposition: Employee (SSN, name) Project (project, p-title, name)

Which attribute is in common? Employee.Name and Project.Name

Is name a key for either of these two tables?NO! We have a problem.

Page 57: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

57Introduction to Databases, B. Ludaescher & K. Lin

Example: test for a lossless decomposition

Employee(SSN, name, project, p-title)

decomposition: Employee (SSN, name) Project (project, p-title)

Which attribute is in common? None

Is this decomposition lossless? NO! We have a problem.

Page 58: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

58Introduction to Databases, B. Ludaescher & K. Lin

Example: test for a lossless decomposition

Employee(SSN, name, project, p-title)

decomposition: Employee (SSN, name, project) Project (project, p-title)

Which attribute is in common? Employee.project and Project.project

Is project a key for either of these two tables?YES! We have a lossless decomposition.

Page 59: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

59Introduction to Databases, B. Ludaescher & K. Lin

Some Preliminary Normal Form Definitions

Prime Attribute - an attribute A is prime if it is a member of a key, otherwise it is nonprime

Partial Dependency - given an FD XY, Y is partially dependent on X if there is a proper subset X of X such that XY.

Transitive Dependency - A is transitively

dependent upon X if XY, Y-/X, and YA and AXY.

Page 60: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

60Introduction to Databases, B. Ludaescher & K. Lin

EMP(ENAME, SSN, BDATE, ADDRESS, DNUM, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUM, HOURS, ENAME, PNAME, PLOCATION)

12

3

4

5

6

7

8

9

10

What are the FDs?

Page 61: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

61Introduction to Databases, B. Ludaescher & K. Lin

Normal Forms Based on FDs

1NF - all attribute values (domain values) are atomic(part of the definition of the relational model)

2NF - 1NF + no key partial dependencies (every nonprime attribute fully depends on every key of R)

R(A B C D) B C (not allowed)

3NF - 2NF + no nonprime attribute is transitively dependent upon any key

R(A B C D) AB C, C D (not allowed)

BCNF - 3NF + no attribute is transitively dependent upon any key (all FDs determined by a key)

R(A B C D) AB C, C B (not allowed)

Page 62: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

62Introduction to Databases, B. Ludaescher & K. Lin

BCNF, Lossless, and Dependency-Preserving

(first choice)

3NF, Lossless and Dependency-Preserving

(second choice)

because sometimes you can’t preserve all dependencies

What’s the Goal?

Page 63: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

63Introduction to Databases, B. Ludaescher & K. Lin

Finding all of the FDsArmstrong’s Axioms

Reflexivity: If X Y, then X YTrivially, all attributes A A, e.g., name name and gender gender

Augmentation: If X Y, then XZ YZ for any ZTransitivity: If X Y and Y Z, then X Z

X, Y, and Z are sets of attributes in RArmstrong’s Axioms are a sound & complete set of inference rules

Union: If X Y and X Z, then X YZDecomposition: If X YZ, then X Y and X Z

Page 64: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

64Introduction to Databases, B. Ludaescher & K. Lin

Finding all FDs: the closure of a set of FDs

The closure of a set of FDs:

Let F be a set of FDs and F+ the closure of F.

F+ is the set of all FDs implied (or derivable) from F using Armstrong’s Axioms

F+ is computed by applying the inference rules until no new FDs can be found

Page 65: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

65Introduction to Databases, B. Ludaescher & K. Lin

What is “Dependency Preserving”

Suppose F is the original set of FDs and G is the set of FDs after decomposition

If we compute F+ and G+, and F+ = G+ then the decomposition is dependency preserving

Page 66: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

66Introduction to Databases, B. Ludaescher & K. Lin

Other Results (textbook)

Algorithms To:– Compute F+

– Compute the Minimal Cover for F– Find a dependency-preserving

decomposition into 3NF– Find a lossless-join decomposition into

BCNF– Find a lossless-join & dependency

preserving decomposition into 3NF

Page 67: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

67Introduction to Databases, B. Ludaescher & K. Lin

addr(number, street, city, state, zip)

number, street, city, state zipzip state

If we decompose, this FD won’t occur within one relation. Thus, since the DBMS only enforces keys (and not FDs directly), this can’t be enforced.

(If we leave it alone, it is in 3NF)

Example: Not dependency preserving

Page 68: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

68Introduction to Databases, B. Ludaescher & K. Lin

Lossless join decomposition algorithm1. Set D = {R} (the current set of relations)

2. While there is a relation in D that’s not in BCNFChoose a relation scheme Q that is not in BCNFFind a FD X Y in Q that violates BCNFReplace Q in D by (Q – Y) and (X Y)

End While;

3. Identify dependences that are not preserved (X A).

4. Add XA as a table to the set D

Page 69: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

69Introduction to Databases, B. Ludaescher & K. Lin

Tuning the Conceptual Schema

The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:

– We may settle for a 3NF schema rather than BCNF.

– E.g., the workload may influence our choice to decompose a relation into 3NF or BCNF.

– We may further decompose a BCNF schema!

– We might denormalize (i.e., undo a decomposition step), or add fields to a relation. We must take care to avoid the problems caused by the redundancy!

Page 70: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

70Introduction to Databases, B. Ludaescher & K. Lin

Decomposition of one table into several

A relation is replaced by a collection of relations that are projections. Most important case. (Vertical partitioning)

• Sometimes, we might want to replace a relation by a collection of relations that are selections. (Horizontal partitioning)

– Each new relation has the same schema as the original, but a subset of the rows.

– Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.

Why might we do this?

Page 71: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

71Introduction to Databases, B. Ludaescher & K. Lin

Tuning the Conceptual Schema

Vertical decomposition using the project operator

Coursec# cname instructor room days

Course2c# cname

Course1c# room days

Course3c# instructor

Page 72: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

72Introduction to Databases, B. Ludaescher & K. Lin

Tuning the Conceptual Schema

Horizontal partition/decomposition using the select operator

Coursec# cname instructor room days

Undergraduate-Coursec# cname instructor room days

Graduate-Coursec# cname instructor room days

Page 73: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

73Introduction to Databases, B. Ludaescher & K. Lin

Tuning the Conceptual Schema

Denormalizing……always introduces redundancy!Course-Offering (offering#, quarter, c#, instructor, time, days)

Course-Offering (offering#, quarter, c#, cname, instructor, room, time, days)

This introduces redundancy - which must be managed.

Course (c#, cname) Instructor-Room (instructor, room)

Page 74: Introduction to Databases: Relational and XML Models and Languages Instructors: Bertram Ludaescher Kai Lin Instructors: Bertram Ludaescher Kai Lin.

74Introduction to Databases, B. Ludaescher & K. Lin

Denormalization interferes with dependency preservation

Course-Offering (offering#, quarter, c#, cname, instructor, room, time,

days)

Note that having the FD instructor room (and nothing else) in a single table allows the FD to be enforced (by enforcing the candidate key of instructor).

Instructor-Room (instructor, room)

Denormalization