
CA218 Introduction to Databases Copyright © 1996-2001 Alan Smeaton, Claus Pahl

CA218 Introduction to Databases



Chapter 1. Information Systems
1a. Information Systems Introduction
1b. Information Systems & DBMS

Chapter 2. Database Overview
2-1-1a. Database Components
2-1-1b. DBMS Data
2-1-2. DBMS Hardware
2-1-3. DBMS Software
2-1-4a. DBMS Users I
2-1-4b. DBMS Users II
2-2. What Data for a DBMS?
2-3a. Models of Data
2-3b. Data Model Differences
2-3c. DBMS Examples
2-4a. Why use a DBMS?
2-4b. Specific Reasons for DBMS
2-4c. Why not a DBMS?
2-5. Three Level Architecture
2-5-1. The External Level
2-5-2. The Conceptual Level
2-5-3. The Internal Level
2-6a. The Database Administrator I
2-6b. The Database Administrator II

Chapter 3. Storage Structures
3-1. Why Storage Structures?
3-2a. Hardware Features of Disks
3-2b. Disk and File Managers
3-2c. Clustering on Disk Surfaces
3-3a. Using Index Files I
3-3b. Using Index Files II
3-4a. Hashing I
3-4b. Hashing II
3-4c. Hashing III

Chapter 4. Entity-Relationship Data Modeling
4-1. E-R Introduction
4-2a. E-R Definitions I
4-2b. E-R Definitions II
4-2c. E-R Definitions III
4-3a. E-R Notation I
4-3b. E-R Notation II
4-3c. Cardinality Ratios
4-3d. Recursive Relationships
4-3e. Properties of Relationships
4-3f. Ternary Relationships
4-3g. Additional Notation
4-4. E-R Principles
4-5a. E-R Example I
4-5b. E-R Example II

Chapter 5. Relational Model of Data
5-1a. Basic Modelling
5-1b. Relational Model Overview
5-2. Relational Tables
5-3a. Relational Model Integrity Basics
5-3b. Relational Model Integrity
5-4a. Relational Algebra Operators
5-4a1. The SELECT Operation
5-4a2. The PROJECT Operation
5-4a3. The PRODUCT Operation
5-4a4. The UNION Operation
5-4a5. The INTERSECTION Operation
5-4a6. The DIFFERENCE Operation
5-4a7. The JOIN Operation
5-4a8. The DIVIDE Operation
5-4b. Relational Algebra
5-4c. Relational Expressions
5-5. Relational Calculus

Chapter 6. SQL
6-1. SQL Background & Standards
6-2. SQL2 Schemas
6-3. SQL DDL
6-4. SQL SELECT Statement
6-5. SQL INSERT, DELETE and UPDATE
6-6. Non-Standard SQL

Chapter 8. The System Catalog
8-1. The System Catalog
8-2. The Informix Catalog
8-3. The ORACLE7 Catalog

Chapter 9. Views
9-1. View Definition
9-2. View Examples

Chapter 10. Database Design & Normalisation
10-1. Introduction to Database Design
10-2. 3NF, 2NF and 1NF
10-3. BCNF
10-3-1. BCNF Example 1
10-3-2. BCNF Example 2
10-3-3. BCNF Example 3
10-3-4. BCNF Example 4
10-4. 4NF
10-5. 5NF
10-6. Database Design

Chapter 11. Databases and the Internet
11-1. Introduction
11-2. JDBC Introduction
11-3. JDBC Tutorial
11-4. Databases and the Web - the Future



Chapter 1. Information Systems

This introductory chapter describes the role that a Database Management System (DBMS) plays in terms of other information systems.

1a. Information Systems Introduction

1b. Information Systems and DBMS

Sources: Elmasri & Navathe pp 1-6

1a. Information Systems Introduction

"A computer-based (1) information system retrieves (2) information (3) from its database (4) in response to a user's query (5)."

1. Manual v computer based

2. Retrieve, store, modify, delete ... always 4 DML commands

3. Computerised information could be ...

structured numeric/alpha

free text

voice

image

rules

others ...

4. Database is a repository which is big and organised

5. User query:

Precise or vague information need

Expressed precisely or vaguely

Interactive or batch execution / retrieval

Seeking specific information or aggregate



1b. Information Systems & DBMS

So where does a DBMS fit in:

Interactive or host query

Unambiguous statement

Precise query

Retrieved information is specifically stored or aggregated

Well-structured information, text and multimedia are stored as bit strings

Query is Boolean combination of predicates

Exact matching

Formal schema

DBMS also provides security, data independence, persistence, concurrency, recovery and backup
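The points above (formal schema, precise query, Boolean combination of predicates, exact matching, specific or aggregated results) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the student table, its columns and its rows are assumptions made up for the example, not data from the course.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL,"
    " age INTEGER NOT NULL)"
)  # formal schema: every row must fit this structure
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (100, "Ann", "Dublin", 19),
        (200, "Brian", "Cork", 22),
        (300, "Ciara", "Dublin", 24),
        (400, "Dara", "Galway", 21),
    ],
)

# Precise query: a Boolean combination (AND / OR) of predicates.
specific = conn.execute(
    "SELECT student_no FROM student"
    " WHERE (city = ? AND age > ?) OR city = ?"
    " ORDER BY student_no",
    ("Dublin", 20, "Galway"),
).fetchall()

# Aggregated rather than specifically stored information.
dublin_count = conn.execute(
    "SELECT COUNT(*) FROM student WHERE city = ?", ("Dublin",)
).fetchone()[0]

# Exact matching: a misspelt value matches nothing, there is no "closest" answer.
misspelt_count = conn.execute(
    "SELECT COUNT(*) FROM student WHERE city = ?", ("Dubln",)
).fetchone()[0]

print(specific, dublin_count, misspelt_count)
```

Contrast this with a text retrieval system, where a vague or misspelt query would typically still return ranked approximate matches.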



Chapter 2. Database Overview

This chapter presents an overview of databases, and is composed of six sections:

2-1-1. Database Components

2-1-2. DBMS Hardware

2-1-3. DBMS Software

2-1-4a. DBMS Users I

2-1-4b. DBMS Users II

2-2. What data for a DBMS ?

2-3a. Models of Data

2-3b. Data Model Differences

2-3c. DBMS Examples

2-4a. Why use a DBMS ?

2-4b. Specific Reasons for DBMS

2-4c. Why not a DBMS ?

2-5. Three Level Architecture

2-5-1. The External Level

2-5-2. The Conceptual Level

2-5-3. The Internal Level

2-6a. The Database Administrator I

2-6b. The Database Administrator II

Sources: Any database textbook overview / introduction.



2-1-1a. Database Components

What makes up a DBMS?

DBMS stores, maintains and provides access to data.

In this overview of DBMS components we look at:

Data

Hardware

Software

Users

2-1-1b. DBMS Data

Range of machine sizes from PC to mainframe, isolated or networked.

DBMS runs on entire range of platforms.

Single and multi-user, shared access, maintaining integrity of data.

Users are concerned with overlapping subsets of the total data, so the same data is perceived by different users in different ways.



Look at the total DCU data ... consists of ...

Students have views consisting of ...

Library has views consisting of ...

Finance has views consisting of ...

Inherent feature of DBMS data is that it is shared.

2-1-2. DBMS Hardware

Conventional machines vs. specialist database machines.

Mostly general purpose machines with DBMS as a conventional software application.

Accelerated chips have been proposed but have not been commercially successful.

Database machines do exist ... expensive, limited market.



2-1-3. DBMS Software

DBMS is an application program sitting between user and data.

DBMS handles all interactions between the two.

DBMS shields users from each other and from unauthorised access.



2-1-4a. DBMS Users I

Actors - Design, use and administer a DBMS

Database Administrators (DBAs)

DB Designers

End Users

Casual, occasional

Naive, canned transactions

Sophisticated

Stand-alone

System analysts and application programmers

Workers - Design, develop and operate DBMS software

DBMS designers and implementers

Tool developers

Operators and maintenance personnel

2-1-4b. DBMS Users II

Application programmers write COBOL, PASCAL, C, PL/1 or C++ programs with embedded DBMS commands, running online or in batch. Programs are usually precompiled, which allows dynamic querying of the DBMS at runtime.

End users use an interactive query language like SQL, possibly working in a bulletproof, controlled GUI environment (INFORMIX) or using a command line interface (ORACLE) ... the same commands as application programmers (APs).

Database Administrator (DBA) ... system manager for database application.



2-2. What Data for a DBMS?

" DBMS used by any reasonably self-contained organisation, commercial, scientific or technical, from a single individual to a large corporation, who want to manage a large volume of information ".

Dublin City University

Students

Lecturers

Courses

Books

Schools

Faculties

Lectures

All these are entities or distinguishable objects with properties in the real world.

We also have relationships between real world entities:

Schools make-up faculties

Schools have students

Schools have lecturers

Students attend lectures given-by lecturers

Lectures are-part-of courses

Students borrow books

Lecturers borrow books

Lecturers recommend books

Courses can-be-composed-of-other courses

Features of real world relationships are ...

Bi-directional relationships

Most are binary, some are ternary (beware of the connection trap here ... three binary relationships do not equal one ternary relationship)

Entity types may be linked in more than one way

Relationships are part of the data set

Relationship set is not exhaustive
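The connection trap mentioned above can be shown with a small sketch. The ternary relationship "lecturer recommends book for course" and all of its instances are hypothetical, chosen only to make the effect visible: splitting the ternary relationship into three binary ones and joining them back produces a fact that was never recorded.

```python
# Hypothetical ternary relationship: (lecturer, book, course).
ternary = {
    ("Smith", "Book A", "CA218"),
    ("Smith", "Book B", "CA100"),
    ("Jones", "Book A", "CA100"),
}

# The three binary relationships obtained by projecting the ternary one.
lecturer_book = {(l, b) for l, b, c in ternary}
book_course = {(b, c) for l, b, c in ternary}
lecturer_course = {(l, c) for l, b, c in ternary}

# Try to rebuild the ternary relationship by joining the three binary ones.
rebuilt = {
    (l, b, c)
    for (l, b) in lecturer_book
    for (b2, c) in book_course
    if b2 == b and (l, c) in lecturer_course
}

# Anything in the rebuilt set that was not in the original is spurious.
spurious = rebuilt - ternary
print(spurious)
```

Here Smith appears to recommend Book A for CA100, although only Jones does: each pairwise fact is true, but the combination is not.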



2-3a. Models of Data

In order to turn a nebulous picture of data into something structured for a DBMS we use a data model to organise information.

A data model is many things including:

A set of guidelines for representing the logical organisation of data

A pattern according to which data and relationships can be organised

An underlying mathematical formalism for building logical data organisations

Data models define logical units or entity types and relationships between those units.

In modeling the real world, a relationship is defined as a named, ordered list of entity types. Relationships can be classified by how many entities of one type are associated with how many entities of another type.

1:1 is the simplest but rarest e.g. person has_spouse person

N:1 or 1:N is a many-to-one or functional e.g. student owns book

N:M is a many-to-many e.g. students are-lectured-by lecturers
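As a rough sketch of this classification, the function below inspects a set of relationship instances given as (left, right) pairs and reports the ratio they exhibit. The function name and sample instances are assumptions for illustration. Note that it classifies the observed instances only; the true cardinality is a design decision about what is allowed, not something that can be proven from sample data.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a binary relationship given as (left, right) instance pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # Does some left entity relate to several right entities, and vice versa?
    left_has_many = any(len(r) > 1 for r in rights_per_left.values())
    right_has_many = any(len(l) > 1 for l in lefts_per_right.values())
    if left_has_many and right_has_many:
        return "N:M"
    if left_has_many:
        return "1:N"
    if right_has_many:
        return "N:1"
    return "1:1"


# Hypothetical instances of the three example relationships.
has_spouse = [("p1", "p2"), ("p3", "p4")]
owns = [("s1", "b1"), ("s1", "b2"), ("s2", "b3")]          # student owns book
lectured_by = [("s1", "l1"), ("s1", "l2"), ("s2", "l1")]   # student, lecturer

print(cardinality(has_spouse), cardinality(owns), cardinality(lectured_by))
```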



2-3b. Data Model Differences

Data models differ in how they handle association relationships (between entities) but all handle attribute relationships (relationships describing a single entity) in the same way.

The main data models are:

Relational Model

Hierarchical Model

Network Model

Object Oriented Model

Extended Relational Model

Hierarchical and network models are much older than the relational model ... early 1960s vs. 1972

Market trend towards relational and beyond ... few network / hierarchical left

Hierarchical and network defined by abstraction from implementations whereas relational was defined a priori and thus has a sound mathematical basis

Non-relational are record-at-a-time whereas relational is more abstract

Non-relational are programming systems with navigation and optimisation by end users ... relational systems do their own optimisation

Almost all non-relational systems have been extended to have relational front ends

2-3c. DBMS Examples

Relational DBMS products:

ORACLE, DB2, SQL/DS, INGRES, INFORMIX, Rdb/VMS, SYBASE

Hierarchical:

IMS

Network:

IDMS

Object-oriented:

ONTOS, GemStone, ObjectStore, O2



2-4a. Why use a DBMS ?

There are several reasons for using a DBMS that follow on from each other.

Different models of the same data lead to different organisations.

Relational model is popular because it is abstract and computing evolution has always been towards the more abstract.

2-4b. Specific Reasons for DBMS

Logical organisation gives a clear picture and helps programmers achieve faster development of application programs.

Handles low-level file maintenance.

Yields centralisation of information. This, in turn, is a good thing as:

Redundancy is eliminated

Inconsistency is avoided

Data is shared

Standards are enforced

Security is applied

Integrity is maintained

Requirements are balanced

Yields data independence where data organisation is not built into application programs, for example

Representation of numeric data

Units for numeric data

Data coding

Stored record and stored file structure

DBA can change access structures during the mid-life of the DBMS without affecting DBMS users, except with respect to performance.



2-4c. Why not a DBMS?

High initial cost in hardware perhaps?

Expensive piece of software

Expensive in terms of personnel and training of users

Overhead of providing

Don't have a large volume of data

Concurrent users?

2-5. Three Level Architecture

Functional organisation.

Does not cover many DBMS functions like concurrency, backup, security etc.



2-5-1. The External Level

Users use a language incorporating a data sublanguage for the database consisting of:

Data definition language (DDL)

Data manipulation language (DML)

Individual user's view is an external view ... multiple occurrences of multiple types of external records

Views are defined by an external schema which is defined in DDL

2-5-2. The Conceptual Level

A representation of the entire information content of the database abstracted from physical store.

May be different or similar to external views.

Data as it is ... multiple occurrences of multiple types of conceptual records.

Conceptual schema is defined by conceptual DDL and includes security and integrity constraints not present in the external levels.

No more than a union of individual external schemas + security and integrity.

2-5-3. The Internal Level

Defines types of stored records, indices, how fields are represented, in what sequence, etc.

Defined using an internal DDL.

Programs accessing this layer directly are dangerous because they bypass the security and integrity checks defined at the conceptual level.

Mappings exist between the different levels of the 3LA and the DBA is responsible for correct mapping between the levels.
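A loose analogy for the three levels, using Python's built-in sqlite3 module: a base table with a constraint stands in for the conceptual level, views stand in for external schemas, and an index stands in for an internal-level storage decision. SQLite uses one DDL for all of these rather than a separate DDL per level, and the table, view and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the whole information content, with an integrity constraint.
conn.execute(
    "CREATE TABLE student ("
    " student_no INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL,"
    " fees_owed REAL NOT NULL CHECK (fees_owed >= 0))"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(100, "Ann", "Dublin", 0.0), (200, "Brian", "Cork", 150.0)],
)

# External level: each user group sees its own subset of the data.
conn.execute("CREATE VIEW library_student AS SELECT student_no, name FROM student")
conn.execute("CREATE VIEW finance_student AS SELECT student_no, fees_owed FROM student")

library_before = conn.execute(
    "SELECT * FROM library_student ORDER BY student_no"
).fetchall()

# Internal level: an access structure is added in mid-life of the database.
conn.execute("CREATE INDEX student_city_idx ON student (city)")

library_after = conn.execute(
    "SELECT * FROM library_student ORDER BY student_no"
).fetchall()

# The external view is unaffected by the internal change (data independence).
print(library_before == library_after)
```

Only performance can change when the index is added; the library and finance views return the same rows as before.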



2-6a. The Database Administrator I

DBMS Components

Stored data manager

DDL compiler

Run-time database processor

Query compiler

Precompiler

DML Compiler

Recovery Manager?

Concurrency control manager?

An essential part of any DBMS is the role played by the DBA

Has overall control of the DBMS

Decides information content and logical and conceptual database design/schema

Decides on storage structures and access using DDL

Liaises with users and helps them design their external schemas using DDL

Defines security and integrity checks

Defines backup and recovery strategies

Monitors performance of the DBMS and responds to changing requirements by using load, dump and statistical analysis routines.

2-6b. The Database Administrator II

An important source of information for the DBA is the data dictionary or system catalog for the DBMS which is

System database

Contains data about data (meta-data)

Descriptions of other objects rather than "raw" data

Includes schemas and mappings

Data dictionary can be queried as if it was a database
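As a small illustration of querying the catalog like any other table, the sketch below uses SQLite (via Python's built-in sqlite3 module), whose catalog is a table called sqlite_master. SQLite is substituted here only because it needs no installation; the student table and index are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical objects whose descriptions will appear in the catalog.
conn.execute("CREATE TABLE student (student_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX student_name_idx ON student (name)")

# The catalog holds meta-data: one row per table, index, view, etc.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

for object_type, name, table_name in catalog:
    print(object_type, name, "on", table_name)
```

The rows returned describe the objects in the database, not the student data itself.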



Chapter 3. Storage Structures

Re-cap from previous courses, coverage of basic storage structures.

Internal level of 3LA.

3-1. Why Storage Structures ?

3-2a. Hardware features of disks

3-2b. Disk and file managers

3-2c. Clustering on disk surface

3-3a. Using index files I

3-3b. Using index files II

3-4a. Hashing I

3-4b. Hashing II

3-4c. Hashing III

Sources: Elmasri & Navathe chapters 3 & 4 or any database textbook.

3-1. Why Storage Structures?

Main memory has faster access than disk.

Disk technology has not changed much, though emergence of RAID may change this.

Databases store information on disk rather than main memory.

Task of DBMS is to minimise amount of information to retrieve from disk.

There are many storage structures, similar to existence of many sorting algorithms.

DBMS should support many storage structures and use the most appropriate.

This is at internal level of 3LA, users should not be aware of this.



3-2a. Hardware Features of Disks

Hardware of a disk drive ... do you know the following ...

Disk pack

Track

Platter

Surface

Block/Page

Interblock gap

Sectors

Buffers

Read/Write head

Seek time

Rotational delay/latency

Block transfer time

Bulk transfer rate

If not ... check it out!

Elmasri & Navathe pp 71 - 74
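Three of the terms above combine into a rough estimate of the time to fetch one block: seek time, plus rotational delay, plus block transfer time. The sketch below assumes the usual approximation that the average rotational delay is half a revolution. The function name and the drive figures are made up for illustration and do not describe any real disk.

```python
def average_access_time_ms(seek_ms, rpm, block_bytes, transfer_bytes_per_s):
    """Estimate the average time in milliseconds to read one block."""
    rotation_ms = 60_000 / rpm             # time for one full revolution
    rotational_delay_ms = rotation_ms / 2  # on average, wait half a revolution
    transfer_ms = block_bytes / transfer_bytes_per_s * 1000
    return seek_ms + rotational_delay_ms + transfer_ms


# Hypothetical drive: 10 ms seek, 6000 rpm, 4000-byte blocks, 1,000,000 bytes/s.
estimate = average_access_time_ms(
    seek_ms=10.0, rpm=6000, block_bytes=4000, transfer_bytes_per_s=1_000_000
)
print(round(estimate, 3))
```

With these figures the seek and rotational delay dominate the transfer time, which is why reducing the number of separate block accesses matters so much.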



3-2b. Disk and File Managers

Page/block is unit of transfer between disk and memory.

Disk manager ... OS component managing free space on disk, performs garbage collection and de-fragmentation.

File manager associates file names with sets of blocks/pages ... may be part of OS, or of DBMS.

File manager of OS is not suited to DBMS application.

3-2c. Clustering on Disk Surfaces

Clustering: logically related records physically close together on disk surface.

DBA can vary clusterings in mid-life of database.

Knowledge of how data is to be used is essential to good physical database design.



3-3a. Using Index Files I

Regularly executed query:

" Find all student numbers with city = x "

DBMS organised to perform this well.

Two ways to execute query:

Binary search through index (city) file to find offset in data (student) file.

Sequential search through data (student) file.
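The two strategies can be sketched in memory as follows. The student records are hypothetical, a Python list stands in for the data file (list position playing the role of file offset), and a sorted list of (city, offset) pairs stands in for the index file. A real DBMS would count page I/O operations rather than list accesses.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: (student number, city); list position = offset.
students = [
    (100, "Dublin"),
    (200, "Cork"),
    (300, "Dublin"),
    (400, "Galway"),
    (500, "Cork"),
]


def sequential_search(city):
    """Read every record in the data file and keep the matches."""
    return [number for number, record_city in students if record_city == city]


# Index file: (city, offset) entries kept sorted by city.
index = sorted((city, offset) for offset, (_, city) in enumerate(students))
index_keys = [city for city, _ in index]


def index_search(city):
    """Binary search the index, then fetch only the matching records."""
    low = bisect_left(index_keys, city)
    high = bisect_right(index_keys, city)
    return [students[offset][0] for _, offset in index[low:high]]


print(sequential_search("Dublin"), index_search("Dublin"))
```

Both functions return the same student numbers; the indexed version examines only a logarithmic number of index entries plus the matching records.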



3-3b. Using Index Files II

Create index on primary key or on other field(s) or on combination.

File can have any number of indexes.

Index on field combination not the same as two separate indexes.

Indexes (usually) speed up retrieval but slow down updates.

We count page I/O operations.

B-tree is usually best all-round index file but there are variations [E&N pp 116]

Multi-level indexes [E&N pp 113]

3-4a. Hashing I

Hashing: fast access based on given value.

Records are physically placed at a (disk) location that is a function of a field value.

When storing a record, DBMS computes hash address & tells the file manager where to store the record.

When retrieving, DBMS performs some computation on query to find where the data is stored.

Hashing is only useful for searches that have one equality condition.



Hashing example:

Student number    Mod 13 location ("bucket")
100               9
200               5
300               1
400               10
500               6



3-4b. Hashing II

Hash collisions:

Student number    Mod 13 location
400               10
700               11
1000              12
1200              4
1700              10

Range of values greater than number of locations leads to collisions.

Range of values approaches number of locations.



Hash collisions handled by pointer chain.
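A minimal in-memory sketch of the mod 13 example, with collisions handled by chaining. A Python list per bucket stands in for the pointer chain, the record contents are placeholders, and the function names are assumptions for illustration.

```python
NUM_BUCKETS = 13

# One chain per bucket; colliding records are appended to the same chain.
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_for(student_number):
    """Hash function from the example: student number mod 13."""
    return student_number % NUM_BUCKETS


def store(student_number, record):
    buckets[bucket_for(student_number)].append((student_number, record))


def retrieve(student_number):
    """Go straight to the bucket, then follow the chain to the record."""
    for key, record in buckets[bucket_for(student_number)]:
        if key == student_number:
            return record
    return None


for number in (400, 700, 1000, 1200, 1700):
    store(number, f"student {number}")

# 400 and 1700 both hash to bucket 10, so they share one chain.
print(bucket_for(400), bucket_for(1700), len(buckets[10]))
print(retrieve(1700))
```

Retrieval by equality on the student number is fast, but a range query such as "student numbers between 400 and 800" gains nothing from this layout.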

3-4c. Hashing III

Stored file can have any number of indexes but only one hash.

Works well for single equality predicate only.

Physical sequence on disk does not correspond to any logical organisation leading to high seek times and thrashing.

As file size increases, number of collisions increases.

Works in memory or on disk.

Extendible hashing, multiple hashing, dynamic hashing, linear hashing.



Chapter 4. Entity-Relationship Data Modeling

This chapter presents an overview of entity-relationship data modeling.

4-1. E-R Introduction

4-2a. E-R Definitions I

4-2b. E-R Definitions II

4-2c. E-R Definitions III

4-3a. E-R Notation I

4-3b. E-R Notation II

4-3c. Cardinality Ratios

4-3d. Recursive Relationships

4-3e. Properties of Relationships

4-3f. Ternary Relationships

4-3g. Additional Notation

4-4. E-R Principles

4-5a. E-R Example I

4-5b. E-R Example II

Sources: Elmasri & Navathe chapter 3 or any database textbook.

4-1. E-R Introduction

The E-R model is used to interpret, specify and document requirements for database processing systems, irrespective of the type of DBMS being used.

It is used to draw a formal picture but since its inception in 1976 it has gone through many variations, so there is no standard!



4-2a. E-R Definitions I

Entity: An instance of a physical object in the real world.

Entity Class: A group of objects of the same type.

Attributes (Properties): Entities have attributes or properties that describe their characteristics.

Composite Attribute: An attribute that is composed of several more basic attributes.

Simple Attribute: An attribute which is not divisible.

Single-Valued Attribute: An attribute that has a single value for a particular entity.

Multi-Valued Attribute: An attribute that has a set of values for the same entity.

Value Set: Each simple attribute is associated with a value set (or domain) which specifies the set of values that may be assigned to that attribute for each individual entity.

4-2b. E-R Definitions II

Relationship Class: A relationship class (type) is a set of associations among entity types.

Relationship Instance: An association of entities, i.e. an instance of a relationship type.

Relationships may have properties (attributes).



4-2c. E-R Definitions III

Degree of a Relationship: The degree of a relationship is the number of participating entities.

Recursive Relationship: A relationship between entities of the same class.

Cardinality Ratio of a Relationship: This constraint specifies the number of relationship instances that an entity can participate in (e.g. 1:1, 1:N, N:M).

4-3a. E-R Notation I

Entity Types

Relationship Types

Attributes

Composite Attributes

Multi-valued Attributes

Key Attributes



4-3b. E-R Notation II



4-3c. Cardinality Ratios



4-3d. Recursive Relationships

4-3e. Properties of Relationships



4-3f. Ternary Relationships

4-3g. Additional Notation

Not part of core or lowest common denominator notation ...

Weak entities

ID-dependent entities

Sub- and supertypes

Derived attribute

Total participation

......



4-4. E-R Principles

E-R Principles - why?

Clarify ... structure from an unstructured world.

Several tools automate transformation of E-R diagram to DBMS schema.

IEW, IEF, Accelerator, Design/1, ORACLE CASE*Designer etc.

Start with natural language description and look for nouns (entities) and verbs (relationships).

Art, not science.

4-5a. E-R Example I

Football Club

"A football club has a name and a ground and is made up of players. A player can play for only one club and a manager, represented by his name manages a club. A footballer has a registration number, name and age. A club manager also buys players. Each club plays against each other club in the league and matches have a date, venue and score."



4-5b. E-R Example II

University Database

"A lecturer, identified by his or her number, name and room number, is responsible for organising a number of course modules. Each module has a unique code and also a name and each module can involve a number of lecturers who deliver part of it. A module is composed of a series of lectures and because of economic constraints and common sense, sometimes lectures on a given topic can be part of more than one module. A lecture has a time, room and date and is delivered by a lecturer and a lecturer may deliver more than one lecture. Students, identified by number and name, can attend lectures and a student must be registered for a number of modules. We also store the date on which the student first registered for that module. Finally, a lecturer acts as a tutor for a number of students and each student has only one tutor."



Chapter 5. Relational Model of Data

Crucial part of DBMS is the way the real world is modelled ... the style or feel.

Relational model is (still) the most significant model & most DBMS implementations are relational.

This chapter presents the relational model.

5-1a. Basic Modelling

5-1b. Relational Model Overview

5-2. Relational Tables

5-3a. Relational Model Integrity Basics

5-3b. Relational Model Integrity

5-4a. Relational Algebra Operators

5-4a1. SELECTION operation

5-4a2. PROJECTION operation

5-4a3. PRODUCT operation

5-4a4. UNION operation

5-4a5. INTERSECTION operation

5-4a6. DIFFERENCE operation

5-4a7. JOIN operation

5-4a8. DIVIDE operation

5-4b. Relational Algebra

5-4c. Relational Expressions

5-5. Relational Calculus



5-1a. Basic Modelling

Any data model has:

Form of data representation ... tables or relations

Rules specifying allowable states of data ... integrity conditions

Operators to manipulate data

Comparison to other real-world models:

Integers

Molecular model of solids and liquids

EMR ... wave or particle ?

Like (all ?) models, relational model is a paper model.

Why do we model ?

To understand is the usual motivation but also to reduce or abstract or encapsulate the real world into something manageable. This allows us to form predictions.

"Manageable" can mean to make computable, or not.

Example models: weather, traffic flow, stock market etc. For each it is clear why we want to model & what we do with these models.

Models are not necessarily an exact mapping of the real world, especially if the world is complex.

For databases, the operations are high-level & clear so the relational model can map the real world exactly, at the level we want.

Moving the model to a computer system, not all R.DBMS fully implement the relational model.

For other models, they do not always exactly map the real world ... stock market !



5-1b. Relational Model Overview

Relational database perceived by users as a collection of tables & nothing else.

Three tables named S, P and SP. Corresponding to suppliers, parts and shipments of parts by suppliers.

Also written as:

S(S#, SNAME, STATUS, CITY)

P(P#, PNAME, COLOUR, WEIGHT, CITY)

SP(S#, P#, QTY)

This is a model of a very limited world.

The entire world can be described.

Other related parts of the real world could be included e.g.

MAKES(P#, M#, COST)

M(M#, MNAME, MADDR)

Entire information content of database is represented as data values with no links or pointers or offsets between tables.

All data values are atomic, exactly one value and never a set.




Relation is mathematical term for table.

In the database from earlier, there are three such tables named S, P and SP.

A tuple is a row, an attribute is a column.

In table S there are three tuples and four attributes and we can refer to S, S#, SNAME etc.

Let's work with a more abstract definition of a table for a while.




A domain is a pool of values of the same (data) type from which one or more attributes in one or more tables take their values.

Above we can see that attribute A1 draws values from domain D1, A2 from D2 etc.

If two attributes draw values from the same domain then comparisons between tuples can be made on these attributes.

A relation, R, on domain D1 to DN consists of

A set of attributes A1 to AN such that Ai corresponds to Di.

A set of N-tuples or entries in the relation.

N is the degree and the number of tuples is the cardinality.
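As a small sketch of these terms, the supplier relation S can be written as a list of attribute names plus a set of tuples. The tuple values below are assumptions taken from the query answers and INSERT examples quoted elsewhere in these notes.

```python
# Relation S: attribute names plus a set of 4-tuples (values assumed).
ATTRIBUTES = ("S#", "SNAME", "STATUS", "CITY")
S = {
    ("S1", "Smith", 20, "Paris"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Rome"),
}

degree = len(ATTRIBUTES)   # number of attributes
cardinality = len(S)       # number of tuples

# Inserting a tuple changes the cardinality but not the degree.
S_after_insert = S | {("S4", "Ryan", 10, "London")}
```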

A domain normally draws its values from a data type, analogous to a programming language data type.

In addition, a domain may or may not include the additional value, NULL, as decided at domain definition time.

If included in a domain, the value NULL does not correspond to 0, " " or infinity .... it corresponds to

Not known

Missing

Does not apply

Thus a domain of 16-bit integers that includes NULL has 2^16 + 1 = 65537 unique values, meaning it requires more than 16 bits to store!



During a database lifetime cardinality changes while degree does not.

A domain may appear more than once in a given relation.

Domains may be simple or composite.

Simple : name, age etc.

Composite : date ... number, street, city, zip ... etc.

A relation is a set; no duplicate tuples implying:

Tuples are unordered.

Attributes are unordered and are referenced by name, not position.

There is always at least one way to uniquely address tuples i.e. the combination of all attributes.

Attribute names are unique only within a table and may be re-used in different tables.

Table names are unique.

A primary key is a column or combination of columns whose values are never duplicated, i.e. duplicates are not allowable given the semantics of the table.

This "never" is important: it implies that the PK (primary key) cannot be determined from the data alone.

Besides the default PK (all attributes) there is normally a "smaller" PK.

All attribute values are atomic ... one value at row/column intersection, called normalised or first normal form.

A relational database is a database where data is represented as a collection of time-varying normalised relations of assorted degrees and cardinalities.

Working through the example above ...

Tables

Attributes and tuples

Domains

Primary keys

Degrees and cardinalities.



5-3a. Relational Model Integrity Basics

Integrity is the property of a database state being consistent with some predefined set of rules.

Feeling of correctness with respect to:

Domain values (independent).

Dependencies across tables.

One: Suppose we replace 'S3' in the SP table for P2 shipments with the value 'S4'.

Our shared understanding and interpretation tells us this is incorrect, cannot be, has lost its integrity.



Two: Furthermore, suppose there is a real-life rule that no two suppliers can come from the same city ... current data state is a violation.

The relational model has built-in support for the first kind of rule above, but not the second.

5-3b. Relational Model Integrity

Some definitions:

Candidate Key: An attribute or combination of attributes which is a unique identifier within a table.

Primary Key: One of the candidate keys.

Alternate Key: The candidate key(s) (if any) not chosen as the primary key.

Foreign Key: A (combination of) attribute(s) in one relation whose value(s) are required to match the value(s) of the primary key of another relation.

A foreign key is not necessarily part of the primary key.



Entity Integrity: No attribute forming part of the primary key of a base table is allowed to have NULL values.

Referential Integrity: If a relation R2 includes a foreign key FK matching the primary key PK of some base relation R1, then every value of R2.FK must:

(a) be equal to a value of R1.PK

or

(b) be wholly NULL, i.e. each attribute in R2.FK must be null.

N.B. Cannot legally refer to R1.PK, R2.FK

A base relation corresponds to a real world entity or relationship, not a view.

The two rules refer to database states, not to transactions.

Other semantic rules, like "no two suppliers from the same city", are not in the model.

Most R.DBMS products support stopping updates that would violate these two rules.
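A minimal sketch of a DBMS stopping updates that would violate the two rules, using Python's built-in `sqlite3` module. The column names SNO and PNO stand in for S# and P#, the tables are cut down to a few columns, and the rejected rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE S (SNO TEXT NOT NULL PRIMARY KEY, SNAME TEXT)")
conn.execute(
    "CREATE TABLE SP ("
    " SNO TEXT NOT NULL REFERENCES S(SNO),"
    " PNO TEXT NOT NULL,"
    " QTY INTEGER,"
    " PRIMARY KEY (SNO, PNO))"
)
conn.execute("INSERT INTO S VALUES ('S1', 'Smith')")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Entity integrity: a NULL in the primary key is refused.
entity_violation = rejected("INSERT INTO S VALUES (NULL, 'Nobody')")
# Referential integrity: SP.SNO must match an existing S.SNO.
referential_violation = rejected("INSERT INTO SP VALUES ('S9', 'P1', 100)")
# A legal shipment for an existing supplier is accepted.
valid_insert_rejected = rejected("INSERT INTO SP VALUES ('S1', 'P2', 200)")
```

A rule such as "no two suppliers from the same city" has no counterpart here; it would have to be expressed as an extra constraint or checked by the application.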



5-4a. Relational Algebra Operators

Relational model data manipulation consists of

An assignment operator to "remember" the evaluation of expressions

A set of operators called the relational algebra

A set of alternative operators called the relational or tuple calculus.

Relational algebra has eight operators:

5-4a1. SELECTION operation

5-4a2. PROJECTION operation

5-4a3. PRODUCT operation

5-4a4. UNION operation

5-4a5. INTERSECTION operation

5-4a6. DIFFERENCE operation

5-4a7. JOIN operation

5-4a8. DIVIDE operation

Examine each of them in turn, re-examining them subsequently if necessary, until you get a good grasp of each of them, before proceeding with the course.



5-4a1. The SELECT Operation

Used to select a subset of tuples from a single relation which satisfy a selection condition.

Diagrammatically:

Written as σ_<selection condition>(<relation>) where <selection condition> is a boolean expression and <relation> is a single relation.

Example:

σ_{S.S# = 'S1'}(S)

<selection condition> compares an attribute name with a constant or another attribute name (if drawn from same domain), using {=, <, <=, >, >=, ≠} as comparators, and using boolean connectives if necessary.

SELECT is unary, commutative and applies to each tuple independently.
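A rough sketch of the SELECT operation over the shipments relation SP, with each tuple held as a Python dictionary. The two conditions are chosen purely for illustration; the point is that the order of nesting does not matter and that nesting gives the same result as one SELECT with the conditions ANDed together.

```python
# Shipments relation SP, one dictionary per tuple.
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S1", "P#": "P2", "QTY": 200},
    {"S#": "S1", "P#": "P3", "QTY": 400},
    {"S#": "S2", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 400},
    {"S#": "S3", "P#": "P2", "QTY": 200},
]


def select(relation, condition):
    """Keep the tuples for which the boolean condition holds."""
    return [t for t in relation if condition(t)]


def is_p2(t):
    return t["P#"] == "P2"


def qty_at_least_300(t):
    return t["QTY"] >= 300


# Nested SELECTs in either order ...
a_then_b = select(select(SP, is_p2), qty_at_least_300)
b_then_a = select(select(SP, qty_at_least_300), is_p2)
# ... and a single SELECT with both conditions joined by AND.
combined = select(SP, lambda t: is_p2(t) and qty_at_least_300(t))
```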

A series of nested SELECTs is equivalent to a single SELECT whose selection conditions are joined by AND:

σ_{SP.P# = 'P1'}(σ_{SP.QTY >= 100}(σ_{SP.S# ≠ 'S1'}(SP))) = σ_{(SP.P# = 'P1') AND (SP.QTY >= 100) AND (SP.S# ≠ 'S1')}(SP)



5-4a2. The PROJECT Operation

Used to select a subset of columns from a single relation.

Diagrammatically:

Written as π_<attribute list>(<relation name>) where <attribute list> is a list of attributes in the specified relation and <relation name> is a single relation or an algebraic expression evaluating to a single relation.

Example:

π_{SNAME, STATUS}(S)



If <attribute list> does not include the primary key then duplicates are possible and are removed.

Thus:

π_{CITY}(S) evaluates to ...
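A minimal sketch of PROJECT with duplicate removal. The contents of S are assumed from the answers given elsewhere in these notes.

```python
# Suppliers relation S (values assumed from answers elsewhere in the notes).
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "Paris"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Rome"},
]


def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:  # a relation is a set, so no duplicates
            result.append(row)
    return result


names_and_status = project(S, ["SNAME", "STATUS"])
cities = project(S, ["CITY"])
```

Projecting on CITY leaves two tuples rather than three, because two suppliers share the same city.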



5-4a3. The PRODUCT Operation

One of the standard set theoretic binary operations, the CARTESIAN PRODUCT or CROSS PRODUCT combines tuples from one relation with tuples from another relation, in all possible combinations of ways.

Written as R × S

Thus if R has degree n and cardinality m and S has degree k and cardinality l, then R × S has degree (n + k) and cardinality (m * l).
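The degree and cardinality of a product can be checked with a short sketch. The two small relations below are assumptions made up for illustration.

```python
from itertools import product

# R has degree 2 and cardinality 3; S has degree 3 and cardinality 2.
R_ATTRS = ("S#", "SNAME")
R = [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake")]
S_ATTRS = ("P#", "PNAME", "COLOUR")
S = [("P1", "Nut", "Red"), ("P2", "Bolt", "Green")]

# Every tuple of R is combined with every tuple of S.
RS_ATTRS = R_ATTRS + S_ATTRS
RS = [r + s for r, s in product(R, S)]
```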

Diagrammatically:



5-4a4. The UNION Operation

One of the standard set theoretic binary operations operating on union compatible relations (same degrees and domain matching).

Denoted R1 ∪ R2, the result of this is a relation that includes all tuples in either of R1 or R2 or both.

Duplicate tuples are eliminated.

Union is commutative and associative.

Diagrammatically:



5-4a5. The INTERSECTION Operation


Denoted R1 ∩ R2, the result of this is a relation that includes all tuples in both R1 and R2.

Duplicate tuples are eliminated.

Intersection is commutative and associative.

Diagrammatically:



5-4a6. The DIFFERENCE Operation


Denoted R1 - R2, the result of this is a relation that includes all tuples in R1 but not in R2.

Duplicate tuples are not an issue.

Difference is not commutative:

(R - S) ≠ (S - R)

Diagrammatically:



5-4a7. The JOIN Operation

The JOIN operation is a binary operation which is used to combine tuples from two relations where the tuples are related by virtue of conforming to some join expression.

Diagrammatically:

The JOIN operation is written as:

R ⋈_(join condition) S

If R(A1, A2, ... An) and S(B1, B2, ... Bm) then R ⋈_(join condition) S is a relation with n + m attributes, namely (A1, ... An, B1, ... Bm) in that order.

Tuples in the resulting relation are those which are combinations of a tuple in R and a tuple in S which satisfy the join condition.

JOIN vs. CARTESIAN PRODUCT ?

The <join condition> compares an attribute from one relation with another attribute from the other relation provided they are drawn from the same domain, using {=, <, <=, >, >=, ≠} as comparators, and possibly augmented using boolean connectives if necessary to link more than one such expression.

The most common JOIN uses equality comparators only and is called an equijoin where the result will contain two identical columns. If we remove one of these identical columns we are left with a natural join.

LHS ⋈_(LHS.Attrib2 = MID.Attrib1) MID
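A sketch of JOIN as "the tuples of the product that satisfy the join condition", using cut-down versions of S and SP (the column choice is an assumption for brevity). It also shows the step from equijoin to natural join.

```python
# S with columns (S.S#, SNAME); SP with columns (SP.S#, P#, QTY).
S = [("S1", "Smith"), ("S2", "Jones"), ("S3", "Blake")]
SP = [
    ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
    ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
]


def join(r, s, condition):
    """Theta join: tuples of the product r x s that satisfy condition."""
    return [a + b for a in r for b in s if condition(a, b)]


# Equijoin on S.S# = SP.S#; columns are S.S#, SNAME, SP.S#, P#, QTY.
equijoin = join(S, SP, lambda s, sp: s[0] == sp[0])
# Natural join: drop the second, identical S# column.
natural_join = [row[:2] + row[3:] for row in equijoin]
# A condition that is always true gives back the full Cartesian product.
cartesian = join(S, SP, lambda s, sp: True)
```

This is one way to answer "JOIN vs. CARTESIAN PRODUCT ?": the join keeps 6 of the 18 product tuples.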



5-4a8. The DIVIDE Operation

The DIVIDE or DIVISION operator is another binary operation which can be applied to two relations R and S in the operation R ÷ S but only where the set of attributes in S is a subset of the set of attributes in R.

Formally, R(Z) ÷ S(X) where X ⊆ Z, yields a relation T(Y) where Y (the set of attributes in the resulting T) is Z - X.

For a tuple to appear in the result of a division operation, the values in that tuple must appear in R in combination with every tuple in S.

Thus the divisor (S) should be small, both in degree and cardinality, to avoid the empty resulting relation.

Diagrammatically:



5-4b. Relational Algebra

Of the eight relational operators, the group comprising

{SELECT, PROJECT, PRODUCT, UNION, DIFFERENCE}

or

{σ, π, ×, ∪, -}

are primitive operators in that the other three can be defined in terms of these five.

Thus:

R ∩ S = (R ∪ S) - ((R - S) ∪ (S - R))

R ⋈_<condition> S = σ_<condition>(R × S)

R ÷ S = π_Y(R) - π_Y((S × π_Y(R)) - R)
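The identities for INTERSECTION and DIVIDE can be checked on small sets. In this sketch R is the (S#, P#) projection of the shipments table and the divisor holds parts P1 and P2; A and B are arbitrary sets made up for illustration.

```python
# Intersection written with union and difference only.
A = {1, 2, 3}
B = {2, 3, 4}
intersection = (A | B) - ((A - B) | (B - A))

# Division: which suppliers supply every part listed in S?
R = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"),
    ("S2", "P1"), ("S2", "P2"), ("S3", "P2"),
}
S = {("P1",), ("P2",)}

pi_y = {(s_no,) for (s_no, _) in R}              # projection of R on Y = S#
# Product of pi_y and S, built in (S#, P#) order so it can be compared with R.
all_pairs = {y + x for y in pi_y for x in S}
missing = all_pairs - R                           # combinations R lacks
quotient = pi_y - {(s_no,) for (s_no, _) in missing}

# Cross-check against the definition: y must pair with every tuple of S.
direct = {y for y in pi_y if all(y + x in R for x in S)}
```

Supplier S3 drops out because the pair (S3, P1) is missing from R.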

Why the relational algebra ?

Relational expressions can be constructed as a high level symbolic representation which can be subjected to transformation rules, hence optimisation.

Analogy to integer arithmetic here ...

relations ↔ integers

algebra ↔ +, -, *, ÷

+, - are primitive operators.

Transformation rule:

"... if an expression is the repeated addition of the same number, x, n times, then this is equivalent to multiplying x by (n + 1)."

5-4c. Relational Expressions

Some example relational expressions.

Not as complex as Elmasri & Navathe pp170 !

Ignore syntax and assignment operator.



One: Retrieve names and status of suppliers who supply 300 cases of any part

π_{SNAME, STATUS}(S ⋈_{S.S# = SP.S#} (σ_{QTY = 300}(SP)))

And the answer is ...

Two: Retrieve the colour of parts supplied by either S1 or S2

π_{COLOUR}(P ⋈_{P.P# = SP.P#} (σ_{SP.S# = 'S1' OR SP.S# = 'S2'}(SP)))

... and the answer is ...

Three: Retrieve the name and city of suppliers who supply any kind of part which is either green or made in Paris

π_{SNAME, S.CITY}(S ⋈_{S.S# = SP.S#} ((σ_{COLOUR = 'GREEN' OR CITY = 'PARIS'}(P)) ⋈_{P.P# = SP.P#} SP))

The answer is ...



5-5. Relational Calculus

Calculus is alternative to algebra, specified at the same time but for historical reasons not as important as the algebra.

Calculus vs. algebra ?

Calculus is declarative, one expression specifying retrieval whereas in the algebra we write a formula which is a nested sequence of operations implying an ordering of those operations implying a procedure for evaluating it.

Not so with the calculus where we specify what to retrieve, not how to retrieve it !

Relational calculus and algebra are equivalent in expressive power: any retrieval that can be written in one can be written in the other.



Chapter 6. SQL

This chapter covers SQL.

6-1. SQL Background & Standards

6-2. SQL2 Schemas

6-3. SQL DDL

6-4. SQL SELECT Statement

6-5. SQL INSERT, DELETE and UPDATE

6-6. Non-standard SQL

Sources: Elmasri & Navathe Chapter 7 or any database book.

6-1. SQL Background & Standards

SQL pronounced SEQUEL but named SQL for legal reasons.

Not case-sensitive and may be formatted any way ... convention is to put keywords in uppercase and new clauses on new lines.

SQL defined in 1974 by the IBM group developing SYSTEM R.

Most R.DBMS implementations have an SQL-like interface.

There are several standards for SQL ... coming together at last !

Goal is to have database vendors conform to interface standards, allowing DB applications to operate with multiple products and so increasing competition.

In general in computing, several governments insist on conformance to standards.

The SQL Standards ...

Standardisation effort started in mid-1980s.

SQL-86 is the bare bones standard, defined as the union of common features of most important DBMS.

SQL-89, a superset of SQL-86, added features like default values, check constraints and simple referential integrity.

NIST publishes guidelines such as FIPS 127 test suite for SQL-86 and FIPS 127-1 for SQL89.

Some 200 test cases and passing these puts a product on the validated products list.

SQL-92 aka SQL2 has FIPS 127-2, and is another superset and significantly larger (c. 575 vs. c. 120 pages).

SQL-86 and SQL-89 were just catching up and unifying what was in place ...



SQL2 has features found in existing products (at that time) but also features not in any products, so it is ahead of its time.

SQL2 has SQL89 plus ...

Additional data types like var length chars, bit strings, date and time intervals, etc.

Outer joins

Catalog specifications

Domains

Assertions

Temporary tables

Referential actions

Schema management language

Dynamic SQL

Scrolled cursors

Connections

Information schema tables

SQL3 specification was scheduled for c. 1996 with major extensions on SQL2 in several dimensions like type systems, stored procedures and OO ... but it is delayed.

For non-relational DBMS ... hierarchical and network models have no standard because there are so few systems.

OODBMS situation reminiscent of R.DBMS but OO developers have proposals for OSQL, an attempted migration path from R.DBMS to OODBMS.

SQL standardisation is great ... but too late.

Because conformance to SQL standardisation became in vogue only recently, most vendors have developed extra features, and all are different.

Vendors claim "We support SQL2 ... plus we have all these extra features ...".

Users buy-in and find they soon depend on the non-standard features, so they are hardwired to a particular product ... trapped !



For the purpose of presenting SQL we use SQL2, but not all of SQL2 as it is so enormous and most will be unused anyway.

People will slowly evolve towards SQL2, gradually embracing its features.

At the end we look at some ORACLE-specific enhancements, to give a flavour.

When you start to use SQL, the SQL2 essentials here will be enhanced by 'local' SQL features available from your DBMS.

6-2. SQL2 Schemas

DBMS products normally partition non-overlapping applications in some ad hoc way ... ORACLE uses tablespaces.

This notion is formalised in SQL2 as SCHEMAS.

Gather together tables, views, domains, grants, assertions, indexes and other constructs that belong to the same database application.

Each schema is given a schema name.

CREATE SCHEMA SCHEMANAME AUTHORIZATION USERNAME;

We will ignore schema issues.

6-3. SQL DDL

SQL DDL (Data Definition Language) is:

CREATE TABLE

ALTER TABLE

DROP TABLE

CREATE INDEX

DROP INDEX

The syntax is ... where [] are options and {} are repetitions.

CREATE TABLE tablename (
    colname coltype [ attrib_constraint ]
    {, colname coltype [ attrib_constraint ] }
    [, table_constraint {, table_constraint } ] )

Essentially CREATE TABLE X followed by a series of at least one colname / coltype clauses and then any number of table constraints.


Data types available include:

Integer numeric (INTEGER, SMALLINT)

Real numbers (FLOAT, REAL, DOUBLE PRECISION)

Formatted numbers

Character strings, fixed or varying length

Bit-string (fixed or varying)

Date

Time


CREATE TABLE S (
    S# char(5),
    sname char(20));

CREATE TABLE S (
    S# char(5) NOT NULL,
    sname char(20),
    dno integer,
    PRIMARY KEY (S#),
    FOREIGN KEY (dno) REFERENCES departments(dno));





Referential triggered actions ...

CREATE TABLE S (
    S# char(5) NOT NULL DEFAULT 0,
    sname char(20) NOT NULL,
    dno integer,
    PRIMARY KEY (S#),
    CONSTRAINT deptcons
        FOREIGN KEY (dno) REFERENCES departments(dno)
        ON DELETE set null
        ON UPDATE cascade);
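The effect of the two referential triggered actions can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the column `sno` stands in for S#, the tables are cut down, and the department numbers are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE departments (dno INTEGER PRIMARY KEY)")
conn.execute(
    """CREATE TABLE S (
           sno CHAR(5) NOT NULL PRIMARY KEY,
           sname CHAR(20) NOT NULL,
           dno INTEGER,
           CONSTRAINT deptcons
               FOREIGN KEY (dno) REFERENCES departments(dno)
               ON DELETE SET NULL
               ON UPDATE CASCADE)"""
)
conn.execute("INSERT INTO departments VALUES (90)")
conn.execute("INSERT INTO S VALUES ('S1', 'Smith', 90)")

# ON UPDATE CASCADE: renumbering the department is propagated to S.dno.
conn.execute("UPDATE departments SET dno = 92 WHERE dno = 90")
after_update = conn.execute("SELECT dno FROM S WHERE sno = 'S1'").fetchone()[0]

# ON DELETE SET NULL: deleting the department leaves S.dno as NULL.
conn.execute("DELETE FROM departments WHERE dno = 92")
after_delete = conn.execute("SELECT dno FROM S WHERE sno = 'S1'").fetchone()[0]
```

In both cases the database stays in a state that satisfies referential integrity: the foreign key either matches a primary key value or is NULL.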




Check constraints ...

CHECK (conditional expression)

CREATE TABLE S (
    S# char(5) NOT NULL DEFAULT 0,
    sname char(20) NOT NULL,
    dno integer,
    PRIMARY KEY (S#),
    CONSTRAINT deptcons
        FOREIGN KEY (dno) REFERENCES departments(dno)
        ON DELETE set null
        ON UPDATE cascade,
    CONSTRAINT svalue
        CHECK (S# > 0 and S# < 100));



CREATE DOMAIN domain-name [AS] data-type
    [ DEFAULT definition ]
    [ domain-constraint-definition-list ];

Data-type is one of the built-in scalar data types.

DEFAULT definition is a default value.

Domain constraints to apply to every column using the domain.

CREATE DOMAIN dno_type AS INTEGER
    DEFAULT 99
    CONSTRAINT dno_defn_constraint
        CHECK (VALUE IN (90, 92, 93, 95, 97, 99))
    NOT NULL;

CREATE DOMAIN dname_type AS CHAR(20);
CREATE DOMAIN dsales_type AS NUMERIC(10, 2);

CREATE TABLE departments (
    dno dno_type,
    dname dname_type,
    dsales dsales_type);



SQL2 DOMAINS ...

Syntactic shorthand

No requirement that they be used ... can use system-defined data types

No support for domains on domains

No strong typing or type checking, requirement is only for underlying data types to be the same

No user-defined operations on domains

No subtypes, supertypes or inheritance

No domain of truth values

ALTER TABLE tablename ADD colname coltype;

Add one new (rightmost) column to a table

Define a new default value for an existing column

Delete an existing column's default value

Drop an existing column

Specify a new base table integrity constraint

Delete an existing base table integrity constraint

DROP TABLE tablename;

DROP TABLE s;
DROP TABLE departments;



6-4. SQL SELECT Statement

SQL DML (Data Manipulation Language) has four commands:

SELECT

INSERT

UPDATE

DELETE

Basic format of SELECT statement:

SELECT attributes
FROM table(s)
WHERE condition
GROUP BY attribute(s)
HAVING condition
ORDER BY attribute(s);

Format of presentation is to look at clauses / features individually, using the sample suppliers, parts, shipments database as a worked example.

Not all of our SQL SELECTs have meaningful answers in this database.



Basic SQL SELECT.

Get colour and city for non-Paris parts with weight greater than 10

SELECT P.COLOUR, P.CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10;

Answer:

Red, London
Blue, Rome
Red, London

SELECT removing duplicates.

Get unique colour and city for non-Paris parts with weight greater than 10

SELECT UNIQUE P.COLOUR, P.CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10;

Answer:

Red, London
Blue, Rome

Sorting the output.

Get unique colour and city for non-Paris parts with weight greater than 10, order by colour

SELECT UNIQUE COLOUR, CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10
ORDER BY COLOUR;

Answer:

Blue, Rome
Red, London



SELECTs containing JOINs.

Get the name of suppliers and the name of the parts they supply

SELECT SNAME, PNAME
FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#;

Answer:

Smith, Nut
Smith, Bolt
Smith, Screw
Jones, Nut
Jones, Bolt
Blake, Bolt

Using aliases to resolve ambiguities.

SELECT SUPPLIER.SNAME, PART.PNAME
FROM S SUPPLIER, SP SHIPMENT, P PART
WHERE SUPPLIER.S# = SHIPMENT.S#
AND SHIPMENT.P# = PART.P#;

Specifying JOIN in the FROM clause, to make it easier to comprehend.

Get pairs of city names such that a supplier in the first city supplies a part stored in the second city

SELECT DISTINCT S.CITY, P.CITY
FROM S JOIN SP USING (S#) JOIN P USING (P#);

Answer:

Paris, London
Paris, Paris
Paris, Rome
Rome, Paris



Use of '*'.

SELECT *
FROM S
WHERE STATUS <> 10;

Answer:

S1, Smith, 20, Paris
S3, Blake, 30, Rome

Set Operators ... UNION, etc.

Get supplier and part numbers where shipments are greater than 300 or where the supplier who supplies a part has a status not equal to 10

(SELECT SP.S#, SP.P#
FROM SP
WHERE QTY > 300)
UNION
(SELECT SP.S#, SP.P#
FROM SP, S
WHERE S.S# = SP.S#
AND S.STATUS <> 10);

Answer:

S1, P3
S2, P2
S1, P1
S1, P2
S3, P2

Nested queries ... complete SELECTs within WHERE clauses of another query.

Get supplier names for suppliers who supply part P2

SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
    (SELECT SP.S#
     FROM SP
     WHERE SP.P# = 'P2');

Answer:

Smith
Jones
Blake



Nested queries ... alternative way to formulate a query.



SELECT DISTINCT S.SNAME
FROM S, SP
WHERE SP.P# = 'P2'
AND S.S# = SP.S#;

Nested queries ... relational algebra.

((SP JOIN S) where P# = 'P2') [SNAME]

Nested query ... two levels.

Get supplier names for suppliers who supply at least one red part

SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
    (SELECT SP.S#
     FROM SP
     WHERE SP.P# IN
        (SELECT P.P#
         FROM P
         WHERE P.COLOUR = 'Red'));

Answer:

Smith
Jones

Nested query ... re-phrased as JOINs and in relational algebra.




(((P where colour = 'Red') JOIN SP) [S#] JOIN S) [SNAME]

Nested query ... explicit sets.

SELECT UNIQUE S.SNAME
FROM S
WHERE S# IN ('S1', 'S2');

Answer:

Smith
Jones

Nested queries ... using EXISTS

EXISTS ( SELECT ... FROM ... )

evaluates TRUE iff the embedded SELECT is not empty.


SELECT UNIQUE S.SNAME
FROM S
WHERE EXISTS
    (SELECT *
     FROM SP
     WHERE SP.S# = S.S#
     AND SP.P# = 'P2');



Nested queries ... using NOT EXISTS

NOT EXISTS ( SELECT ... FROM ... )

evaluates TRUE iff the embedded SELECT is empty.

Get supplier names for suppliers who do not supply part P2

SELECT UNIQUE S.SNAME
FROM S
WHERE NOT EXISTS
    (SELECT *
     FROM SP
     WHERE SP.S# = S.S#
     AND SP.P# = 'P2');

Ditto for NOT IN.
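The EXISTS, NOT EXISTS and NOT IN forms can be compared using Python's built-in `sqlite3` module. This is a sketch under assumptions: SNO and PNO stand in for S# and P#, the table contents are taken from the shipments listing and answers elsewhere in these notes, and part P3 is used in some queries because every supplier in this data supplies P2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
    INSERT INTO SP VALUES
        ('S1', 'P1', 300), ('S1', 'P2', 200), ('S1', 'P3', 400),
        ('S2', 'P1', 300), ('S2', 'P2', 400), ('S3', 'P2', 200);
    """
)


def names(sql):
    """Run a one-column query and return its values in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))


supplies_p3 = names(
    "SELECT DISTINCT SNAME FROM S WHERE EXISTS "
    "(SELECT * FROM SP WHERE SP.SNO = S.SNO AND SP.PNO = 'P3')"
)
not_p3_not_exists = names(
    "SELECT DISTINCT SNAME FROM S WHERE NOT EXISTS "
    "(SELECT * FROM SP WHERE SP.SNO = S.SNO AND SP.PNO = 'P3')"
)
not_p3_not_in = names(
    "SELECT DISTINCT SNAME FROM S WHERE SNO NOT IN "
    "(SELECT SNO FROM SP WHERE PNO = 'P3')"
)
# Every supplier in this data supplies P2, so this result is empty.
not_p2 = names(
    "SELECT DISTINCT SNAME FROM S WHERE NOT EXISTS "
    "(SELECT * FROM SP WHERE SP.SNO = S.SNO AND SP.PNO = 'P2')"
)
```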

Nested queries ...

In addition to IN, SQL also has operators to compare a single value to a set of values.

= ANY
= SOME

These return TRUE if the single value equals at least one value in the set of values.

Can also use >, >=, <, <= and <> with ANY or SOME.

The keyword ALL can also be used in nested queries; the comparison returns TRUE only if it holds for every value in the set.

Aggregate functions ... within SQL there are built-in functions COUNT, SUM, MAX, MIN and AVG.

May be used in SELECT or HAVING clauses.

We may include DISTINCT or UNIQUE to remove duplicates before applying the operation (excluding MAX and MIN).

COUNT(*) counts all rows without eliminating duplicates.

NULL values are discarded before applying the operators, except for COUNT(*).

If the argument is an empty set, COUNT returns a value of 0, the others return NULL.
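The NULL and empty-set behaviour of the aggregate functions can be observed with Python's built-in `sqlite3` module. In this sketch SNO and PNO stand in for S# and P#, and the shipment by S4 with an unknown quantity is a hypothetical row added only to show how NULLs are treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO SP VALUES (?, ?, ?)",
    [
        ("S1", "P1", 300),
        ("S2", "P1", 300),
        ("S4", "P1", None),  # hypothetical shipment with a NULL quantity
    ],
)

# COUNT(*) counts rows; the other functions discard the NULL first.
with_null = conn.execute(
    "SELECT COUNT(*), COUNT(QTY), SUM(QTY), AVG(QTY) "
    "FROM SP WHERE PNO = 'P1'"
).fetchone()

# No shipments of P9 exist, so the argument is an empty set.
empty = conn.execute(
    "SELECT COUNT(*), COUNT(QTY), SUM(QTY), MAX(QTY), MIN(QTY), AVG(QTY) "
    "FROM SP WHERE PNO = 'P9'"
).fetchone()
```

Python's `None` is how the driver reports an SQL NULL.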

Get Max and Min quantity of shipments for part P2

SELECT MAX(SP.QTY), MIN(SP.QTY)
FROM SP
WHERE SP.P# = 'P2';

Answer:

400, 200



Sometimes we want to apply aggregate functions to subgroups of tuples ... e.g. avg salary of employees in each department, or number of shipments of each part.

We can group tuples based on having the same value for some attributes (or combination of ) and apply functions to each of the groups.

Done via GROUP BY clause which specifies attributes which should also be in the SELECT clause.

For each part supplied, get the part number and total shipment quantity

SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#;

Answer:

P1, 600
P2, 800
P3, 400

What is done here is that the table (after WHERE clause is evaluated) is re-arranged into GROUPS which share the same value of P#.

So

S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200

is turned into:

S1 P1 300 [P1 values]
S2 P1 300

S1 P2 200 [P2 values]
S2 P2 400
S3 P2 200

S1 P3 400 [P3 values]

Thus:

P1, 600
P2, 800
P3, 400



To apply restrictions so that some groups are eliminated we use the HAVING clause to eliminate groups (not original tuples).

For each part supplied, get the part number and total shipment quantity provided there is more than one shipment of that part

SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#
HAVING COUNT(*) > 1;

S1 P1 300 [P1 values]
S2 P1 300

S1 P2 200 [P2 values]
S2 P2 400
S3 P2 200

Answer:

P1, 600
P2, 800

We can have conditionals (WHERE) applied to eliminate tuples before the GROUP BY and HAVING clauses.

For each part supplied which is not green, get the part number and total shipment quantity provided there is more than one shipment of that part

SELECT SP.P#, SUM(QTY)
FROM SP, P
WHERE P.P# = SP.P#
AND P.COLOUR <> 'Green'
GROUP BY SP.P#
HAVING COUNT(*) > 1;

S1 P1 300 [P1 values]
S2 P1 300

Answer:

P1, 600
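The three grouping queries can be re-run with Python's built-in `sqlite3` module to see WHERE, GROUP BY and HAVING applied in turn. This is a sketch under assumptions: SNO and PNO stand in for S# and P#, and the part colours are inferred from the answers quoted in these notes. ORDER BY is added only to make the row order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE P (PNO TEXT PRIMARY KEY, COLOUR TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO P VALUES ('P1', 'Red'), ('P2', 'Green'), ('P3', 'Blue');
    INSERT INTO SP VALUES
        ('S1', 'P1', 300), ('S1', 'P2', 200), ('S1', 'P3', 400),
        ('S2', 'P1', 300), ('S2', 'P2', 400), ('S3', 'P2', 200);
    """
)

# One result row per group of shipments sharing the same part number.
grouped = conn.execute(
    "SELECT PNO, SUM(QTY) FROM SP GROUP BY PNO ORDER BY PNO"
).fetchall()

# HAVING removes whole groups: P3 has only one shipment.
having = conn.execute(
    "SELECT PNO, SUM(QTY) FROM SP GROUP BY PNO "
    "HAVING COUNT(*) > 1 ORDER BY PNO"
).fetchall()

# WHERE removes individual tuples (the green part) before grouping.
where_then_having = conn.execute(
    "SELECT SP.PNO, SUM(QTY) FROM SP, P "
    "WHERE P.PNO = SP.PNO AND P.COLOUR <> 'Green' "
    "GROUP BY SP.PNO HAVING COUNT(*) > 1 ORDER BY SP.PNO"
).fetchall()
```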



String and substring comparisons in SQL use the LIKE operator where % replaces an arbitrary number of characters and _ replaces a single arbitrary character.

SELECT SNAME
FROM S
WHERE SNAME LIKE '%e%';

Answer:

Jones
Blake

SQL can include simple arithmetic operators in the SELECT clause of SELECT statements.

SELECT S#, QTY*0.90
FROM SP
WHERE P# = 'P1';

Answer:

S1, 270
S2, 270

SELECT statements ... general.

SQL SELECT can have ...

SELECT attributes
FROM table(s)
WHERE condition
GROUP BY attribute(s)
HAVING condition
ORDER BY attribute(s);

but only SELECT and FROM are mandatory.

A query is evaluated by applying the FROM, then WHERE, then GROUP BY, then HAVING , then ORDER BY.

SQL is extremely redundant in that for most queries, even simple ones, there is usually more than one way to formulate them, but they are all correct and efficiency is not a concern as the DBMS does the query optimisation.



6-5. SQL INSERT, DELETE and UPDATE

INSERT

Insert a single tuple into a single relation.

INSERT INTO S
VALUES ('S4', 'Ryan', 10, 'London');

(S4, Ryan, 10, London)

INSERT INTO S (S#, SNAME)
VALUES ('S4', 'Ryan');

(S4, Ryan, NULL, NULL)


INSERT

Insert multiple tuples into a relation as the result of a query.

INSERT INTO S (S#, CITY)
SELECT SP.S#, P.CITY
FROM SP, P
WHERE SP.P# = P.P#
AND QTY = 300;

(S1, NULL, NULL, London)
(S2, NULL, NULL, London)
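The three forms of INSERT can be run with sqlite3 to see where the NULLs come from. Several things here are assumptions made so that the sketch runs cleanly: SNO and PNO replace S# and P#, the part cities are invented (P1 is placed in London to match the result above), the second single-tuple insert uses a made-up key S5 so that it does not clash with S4, and the query result goes into an empty table S_NEW rather than into S itself.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
CREATE TABLE P  (PNO TEXT PRIMARY KEY, PNAME TEXT, CITY TEXT);
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
-- Part cities are assumed for illustration.
INSERT INTO P VALUES ('P1','Nut','London'), ('P2','Bolt','Paris'), ('P3','Screw','Rome');
INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                      ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
-- Empty table shaped like S, so the query result cannot clash with existing keys.
CREATE TABLE S_NEW (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
""")

# Full tuple: every column gets a value.
con.execute("INSERT INTO S VALUES ('S4', 'Ryan', 10, 'London')")
# Column list: the columns not named (STATUS, CITY) are set to NULL.
con.execute("INSERT INTO S (SNO, SNAME) VALUES ('S5', 'Ryan')")
single_rows = con.execute("SELECT * FROM S ORDER BY SNO").fetchall()

# INSERT ... SELECT: one new tuple per row returned by the query.
con.execute("""
    INSERT INTO S_NEW (SNO, CITY)
    SELECT SP.SNO, P.CITY FROM SP, P
    WHERE SP.PNO = P.PNO AND QTY = 300
""")
query_rows = con.execute("SELECT * FROM S_NEW ORDER BY SNO").fetchall()

print(single_rows)
print(query_rows)
```

Python shows SQL NULL as None in the fetched rows.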



DELETE

Delete a single tuple from a single relation:

DELETE FROM S
WHERE S# = 'S1';

Delete a set of tuples from a single relation:

DELETE FROM S
WHERE CITY = 'Paris';

Delete all tuples in a relation:

DELETE FROM S;

Delete as the result of a sub-query:

DELETE FROM S
WHERE S# IN
(SELECT S#
FROM SP
WHERE QTY > 200);

Note: Error here. Dialogue should state that all tuples except those with QTY = 200 are deleted from S.

Delete nothing:

DELETE FROM S
WHERE S# = 'S4';
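The effect of each DELETE can be checked by counting the rows it removes. The sketch below uses sqlite3 and its cursor attribute rowcount; SNO stands in for S#, and fresh_db is a made-up helper that rebuilds the sample tables so each experiment starts from the same state.

```python
import sqlite3


def fresh_db():
    """Build an in-memory copy of S and SP with the sample rows from the notes."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
    CREATE TABLE S  (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
    """)
    return con


con = fresh_db()
# No supplier S4 exists, so nothing is deleted and no error is raised.
deleted_s4 = con.execute("DELETE FROM S WHERE SNO = 'S4'").rowcount
# Sub-query: S1 and S2 each have a shipment with QTY > 200; S3 does not.
deleted_by_subquery = con.execute(
    "DELETE FROM S WHERE SNO IN (SELECT SNO FROM SP WHERE QTY > 200)").rowcount
remaining = [row[0] for row in con.execute("SELECT SNO FROM S")]

# A set of tuples: both Paris suppliers go.
deleted_paris = fresh_db().execute("DELETE FROM S WHERE CITY = 'Paris'").rowcount

print(deleted_s4, deleted_by_subquery, remaining, deleted_paris)
```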

UPDATE

Modify the attribute values of some tuple:

UPDATE S
SET SNAME = 'Murphy',
STATUS = 15
WHERE S# = 'S1';

Thus:

(S1, Smith, 20, Paris) becomes (S1, Murphy, 15, Paris)



UPDATE

Modify the attribute values of some tuples:

UPDATE S
SET SNAME = 'Murphy',
STATUS = 15
WHERE CITY = 'Paris';

Thus:

(S1, Smith, 20, Paris) becomes (S1, Murphy, 15, Paris)
(S2, Jones, 10, Paris) becomes (S2, Murphy, 15, Paris)

UPDATE

Modify nothing!

UPDATE S
SET SNAME = 'Murphy',
STATUS = 15
WHERE CITY = 'Dublin';
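The UPDATE cases can be compared in the same way, again as a sketch with sqlite3 and SNO standing in for S#. An UPDATE whose WHERE clause matches no tuple is not an error; it simply changes zero rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                     ('S3','Blake',30,'Rome');
""")

# Every tuple satisfying the WHERE clause is modified.
updated_paris = con.execute(
    "UPDATE S SET SNAME = 'Murphy', STATUS = 15 WHERE CITY = 'Paris'").rowcount
# No supplier is located in Dublin, so nothing changes.
updated_dublin = con.execute(
    "UPDATE S SET SNAME = 'Murphy', STATUS = 15 WHERE CITY = 'Dublin'").rowcount

rows = con.execute("SELECT * FROM S ORDER BY SNO").fetchall()
print(updated_paris, updated_dublin)
print(rows)
```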



6-6. Non-Standard SQL

Almost all relational DBMS with SQL interfaces provide "extensions" to the standard and don't (yet) implement the full standard SQL2.

As an example, ORACLE 7 has the following extra features:

Each table has pseudo-columns which can be queried but whose values cannot be changed and they include:

ROWID (uniquely identify a row)

ROWNUM (the position of a single row among others selected by a query)

Data types include DATE, LONG (char string up to 2 Gbytes), LONG RAW (binary string up to 2 Gbytes) and RAW (binary up to 2K)

CREATE CLUSTER creates a clustering of database tuples on disk

ALTER CLUSTER to refine storage allocations for a cluster by increasing its disk space, filenames etc.

ALTER TABLESPACE ... by adding or renaming a database file or refining storage limits

ANALYZE ... validates the structure of an index, table or cluster or collects performance statistics for them (percentage distributions etc.)

CREATE CLUSTER ... creates new cluster and specifies the columns which are to be its key, assign disk space, etc.

CREATE CLUSTERED INDEX

CREATE PROFILE ... for a user ... limit resources in terms of CPU usage, number of transactions, connect time, idle time, ...

CREATE SEQUENCE ... creates a new sequence suitable for generation of primary keys ... start with, increment by, max val, min val, ordering ... is this domain definition ?

CREATE TABLE with clusters

As part of table or column integrity constraints, can specify a tablename into which are put rows violating the constraint, and for each store:

(rowid, owner, tabname, constraint)

CREATE TRIGGER

EXPLAIN PLAN to describe each step of the execution plan for an SQL statement and place this description in a PLAN table whose attributes include statement ID, timestamp, operations, etc.

And many others ...



Chapter 8. The System Catalog

This chapter covers the system catalog.

8-1. The System Catalog

8-2. The Informix Catalog

8-3. The ORACLE7 Catalog

Sources: Elmasri & Navathe Chapter 15 has an ER design and a mock-up of a catalog for a relational and for a network DBMS catalog.

E&N also wrestles with the issue of whether a system catalog is a data dictionary … angels on the head of a pin !

8-1. The System Catalog

The system catalog is a part of a (relational) DBMS containing:

Table names

Attribute names and data types

Index names and existence

Table and user level authorisations

View definitions and dependencies

Primary and secondary and foreign keys

Synonym names

All kinds of constraints, at database and table levels

Users, authorisation, names, passwords

Anything about the database or describing the format of the database i.e. "meta data".

In R.DBMS implementations, all this information is implemented itself as a set of database tables which users can see and can query (if you know the format of the catalog).

The catalog tables contain entries for all user tables and also contain entries for themselves!

Different R.DBMS implementations have different implementation approaches for the catalog although SQL3 is attempting some standardisation of catalog formats (implemented ironically via views rather than catalog re-design).

As the catalog is a set of tables the user can see, these tables can be queried directly by end users (SQL SELECT), but INSERT, DELETE and UPDATE commands are not allowed as they could corrupt the database ... but the most frequent accesses are by the DBMS modules themselves ... the query optimiser needs to know the names etc. of tables and attributes, and also needs to know the sizes of tables and the range of values (specificity) of their columns as it decides how to execute a query.

Thus the catalog needs to be designed in the most efficient way possible for accesses (by the DBMS modules) and for updates ... effectively creating a new user table in SQL causes a tuple entry into the system table describing tables and some other tuple modifications / entries also.

As SQL commands are executed, the catalog tables are updated automatically by the DBMS.
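The same behaviour can be observed in a small DBMS. The sketch below uses SQLite, whose catalog is a single table called sqlite_master; that is a much simpler catalog than the ones discussed in this chapter, and it is used here only as a convenient stand-in. The table, index and view names are made up for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")


def catalog_names(con):
    """Query the catalog with an ordinary SELECT."""
    return [row[0] for row in con.execute(
        "SELECT name FROM sqlite_master ORDER BY name")]


before = catalog_names(con)

# Ordinary DDL statements; the DBMS adds the catalog entries itself.
con.execute("CREATE TABLE S (SNO TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT)")
con.execute("CREATE UNIQUE INDEX S_SNO ON S (SNO)")
con.execute("CREATE VIEW PARIS_S AS SELECT SNO, SNAME FROM S WHERE CITY = 'Paris'")
after = catalog_names(con)

# Users may read the catalog but may not write to it directly.
try:
    con.execute("INSERT INTO sqlite_master VALUES ('table', 'X', 'X', 0, '')")
    direct_insert_rejected = False
except sqlite3.Error:
    direct_insert_rejected = True

print(before, after, direct_insert_rejected)
```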

To illustrate the system catalog we use a worked example ... INFORMIX from a few versions ago when it (the DBMS) was simple ... we will work through all tables to illustrate the simplicity and beauty of it ... and to contrast w.r.t. complexity, we will look at ORACLE7 system catalog.

The Elmasri & Navathe book has a phantom catalog given as an ER diagram and as a table.



8-2. The Informix Catalog

INFORMIX had nine tables in its system catalog as follows:

TABLE NAME    DESCRIPTION

systables     A description of all database tables - one tuple per database table
syscolumns    A description of all columns in all tables - one tuple per column per table
sysindexes    Description of all indexes on all tables - one tuple per index
systabauth    Table level privileges for users
syscolauth    Column level privileges for users
sysdepend     How views depend on underlying base tables
syssynonym    List of synonym names for tables, if any created
sysusers      Database level privileges for users
sysviews      Definition of all views

Let's look at some tables in more detail ...

SYSUSERS

username char(8) user login id

usertype char(1) indicates DBA / resource / connect privileges

password char(8) encrypted password

Create unique index users on sysusers(username);

... Guarantees unique usernames



SYSTABLES

tabname char(18) table name

owner char(8) username of table creator

dirpath char(64) directory path for datafile

tabid integer internal number/code for table .. for efficiency

rowsize smallint number of bytes wide

ncols smallint number of columns

nindexes smallint number of indexes

nrows integer number of rows

created date date of creation

version integer table version number

tabtype char(1) table or view

audpath char(64) full pathname for audit file

Create unique index tabname on systables(tabname, owner);
Create unique index tabid on systables(tabid);

... Guarantees unique table names per owner, and unique codes for tables

Notice how columns are made as "narrow" as possible to reduce page I/O.

And so on ...



8-3. The ORACLE7 Catalog

The ORACLE7 catalog is a monster composed of some base tables and a multitude of "data dictionary views". Here it is ... all 170 tables worth !

ALL_CATALOG ALL_COL_COMMENTS ALL_COL_PRIVS ALL_COL_PRIVS_MADE ALL_COL_PRIVS_RECD ALL_CONSTRAINTS ALL_CONS_COLUMNS ALL_DB_LINKS ALL_DEF_AUDIT_OPTS ALL_DEPENDENCIES ALL_ERRORS ALL_INDEXES ALL_IND_COLUMNS ALL_LABELS ALL_MOUNTED_DBS ALL_OBJECTS ALL_SEQUENCES ALL_SNAPSHOTS ALL_SOURCE ALL_SYNONYMS ALL_TABLES ALL_TAB_COLUMNS ALL_TAB_COMMENTS ALL_TAB_PRIVS ALL_TAB_PRIVS_MADE ALL_TAB_PRIVS_RECD ALL_TRIGGERS ALL_USERS ALL_VIEWS AUDIT_ACTIONS CAT CLU CODE_PIECES CODE_SIZE COLS COLUMN_PRIVILEGES DBA_2PC_NEIGHBORS DBA_2PC_PENDING DBA_AUDIT_EXISTS DBA_AUDIT_OBJECT DBA_AUDIT_SESSION DBA_AUDIT_STATEMENT DBA_AUDIT_TRAIL DBA_BLOCKERS DBA_CATALOG DBA_CLUSTERS DBA_CLU_COMMENTS DBA_COL_COMMENTS DBA_COL_PRIVS DBA_CONSTRAINTS DBA_CONS_COLUMNS DBA_DATA_FILES DBA_DB_LINKS DBA_DDL_LOCKS DBA_DEPENDENCIES DBA_DML_LOCKS DBA_ERRORS DBA_EXP_FILES DBA_EXP_OBJECTS DBA_EXP_VERSION

DBA_EXTENTS DBA_FREE_SPACE DBA_INDEXES DBA_IND_COLUMNS DBA_LOCKS DBA_OBJECTS DBA_OBJECT_SIZE DBA_OBJ_AUDIT_OPTS DBA_PRIV_AUDIT_OPTS DBA_PROFILES DBA_ROLE_PRIVS DBA_ROLES DBA_ROLLBACK_SEGS DBA_SEGMENTS DBA_SEQUENCES DBA_SNAPSHOTS DBA_SNAPSHOT_LOGS DBA_SOURCE DBA_STMT_AUDIT_OPTS DBA_SYNONYMS DBA_SYS_PRIVS DBA_TABLES DBA_TABLESPACES DBA_TAB_COLUMNS DBA_TAB_COMMENTS DBA_TAB_PRIVS DBA_TRIGGERS DBA_TS_QUOTAS DBA_USERS DBA_VIEWS DBA_WAITERS DBMS_ALERT_INFO DBMS_LOCK_ALLOCATED DEPTREE DICT DICTIONARY DICT_COLUMNS GLOBAL_NAME IDEPTREE IND INDEX_HISTOGRAM INDEX_STATS LOADER_COL_INFO LOADER_CONSTRAINT_INFO LOADER_INDCOL_INFO LOADER_PARAM_INFO LOADER_TAB_INFO LOADER_TRIGGER_INFO LOADER_IND_INFO OBJ PARSED_PIECES PARSED_SIZE PUBLIC_DEPENDENCY RESOURCE_COST ROLE_ROLE_PRIVS ROLE_SYS_PRIVS ROLE_TAB_PRIVS SEQ SESSION_PRIVS SESSION_ROLES

SOURCE_SIZE STMT_AUDIT_OPTION_MAP SYN SYSTEM_PRIVILEGE_MAP TABLE_PRIVILEGES TABLE_PRIVILEGE_MAP TABS USER_AUDIT_OBJECT USER_AUDIT_SESSION USER_AUDIT_STATEMENT USER_AUDIT_TRAIL USER_CATALOG USER_CLUSTERS USER_CLU_COLUMNS USER_COL_COMMENTS USER_COL_PRIVS USER_COL_PRIVS_MADE USER_COL_PRIVS_RECD USER_CONSTRAINTS USER_CONS_COLUMNS USER_DB_LINKS USER_DEPENDENCIES USER_ERRORS USER_EXTENTS USER_FREE_SPACE USER_INDEXES USER_IND_COLUMNS USER_OBJECTS USER_OBJECT_SIZE USER_OBJ_AUDIT_OPTS USER_RESOURCE_LIMITS USER_ROLE_PRIVS USER_SEGMENTS USER_SEQUENCES USER_SNAPSHOTS USER_SNAPSHOT_LOGS USER_SOURCE USER_SYNONYMS USER_SYS_PRIVS USER_TABLES USER_TABLESPACES USER_TAB_COLUMNS USER_TAB_COMMENTS USER_TAB_PRIVS USER_TAB_PRIVS_MADE USER_TAB_PRIVS_RECD USER_TRIGGERS USER_TS_QUOTAS USER_USERS USER_VIEWS



Chapter 9. Views

This chapter covers views.

9-1. View Definition

9-2. View Examples

Sources: Elmasri & Navathe Chapter 7 (part of SQL) pp 215 - 219

9-1. View Definition

A view is a named, derived table, like a "window".

Base tables are actually stored physically and exist as data on disk but views are virtual data ... they do not exist separately. Their information content is dynamically derived.

A DBMS schema is made up of base tables & views and SQL DML commands are executed on base tables & views.

Data appearing in a view does not exist separately but appears to.

The definition of views is in terms of base tables or in terms of other views and the view definition is stored in the system catalog (check out the system catalog entries for INFORMIX and for ORACLE that we saw earlier).

Views provide data "windows".

A single view may show aggregated (derived) data or actual data as a virtual table.

Views permit controlled access to sensitive data: users are allowed to see only aggregates or summaries (as views), and security privileges are then applied to those views.

Any SQL SELECT can be executed on a view.

UPDATE, DELETE and INSERT commands can be executed on views though these operations can be limited.

Views are dynamic windows, not snapshots, so as data changes, so do views so they are always up to date.

Syntax is:

CREATE VIEW viewname [(colname [,colname]*)]
AS subquery
[WITH CHECK OPTION];

Note that if the colname list is not present, the view inherits the attribute names of the subquery.

The WITH CHECK OPTION (WCO) is needed if the view is updatable and updates that violate the view definition condition are to be rejected … interestingly, such updates are allowed if WCO is not included.



Cannot create an index on a view, cannot use UNION or ORDER BY in the subquery (ORDER BY would not make sense, no UNION is a quirk).

DROP VIEW viewname ;

The above statement drops a view and any other views defined on this view ... cascades.

For executing queries, an R.DBMS will:

Try to combine the view definition and the user’s query into one query if possible for overall query optimisation but this is expensive if the view definition query is complex.

View materialisation is where the DBMS will create temporary tables reflecting view content and immediately usable by other instances of that view.

SELECTs on views are straightforward.

INSERTs put NULLs in base table columns not in the view definition and this is not allowed unless the base table columns allow NULLs.

Column subsets are theoretically updatable iff they contain the primary key.

Cannot update a database through a view if the view definition involves JOINs, GROUP BY, DISTINCT or aggregate functions.

Cannot alter a view ... drop it and create another.

If a column is dropped from a base table which is involved in a view definition then the view is invalidated ... older R.DBMS discovered this only on subsequent access (i.e. no integrity checking).

In summary, views are important:

in formulating difficult queries though this role is underestimated

in allowing partial queries to be re-used

in providing security by hiding data

Note: Many of the entries in the INFORMIX and in the ORACLE system catalogs are actually views on base tables within the system catalog



9-2. View Examples

CREATE VIEW BUSINESS-STUDIES-STUDENTS
AS SELECT S#, SNAME, SCOURSE
FROM S
WHERE SCOURSE = 'BBS'
WITH CHECK OPTION;

S
S# SNAME SCOURSE AGE
1234 Givins BBS 20
2345 Irwin MCA 22
3456 Babb BBS 20
4567 Kenna BBS 21
5678 Cascarino CA 20
6789 Keane CS 22

BUSINESS-STUDIES-STUDENTS
S# SNAME SCOURSE
1234 Givins BBS
3456 Babb BBS
4567 Kenna BBS



INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1111, 'McGrath', 'BBS');




Result: the base table S gains the tuple (1111, McGrath, BBS, NULL) and the view BUSINESS-STUDIES-STUDENTS shows the tuple (1111, McGrath, BBS).




INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1234, 'McKenna', 'BBS');

... violates primary key in the view.






... violates primary key of tuple not in view.



INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1111, 'McGrath', 'CA');

... violates the view definition constraint

... what happens if "WITH CHECK OPTION" is left out ?
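The cases above, including the question about leaving out WITH CHECK OPTION, can be walked through with a small simulation in plain Python. This is a model of the rules as stated in these notes, not the behaviour of any particular DBMS, and the function names are made up for the example. The tuple (2345, 'Hughton', 'BBS') is a hypothetical insert whose key clashes with a student who is not visible in the view.

```python
def make_base():
    """Base table rows (S#, SNAME, SCOURSE, AGE) from the example."""
    return [(1234, "Givins", "BBS", 20), (2345, "Irwin", "MCA", 22),
            (3456, "Babb", "BBS", 20), (4567, "Kenna", "BBS", 21),
            (5678, "Cascarino", "CA", 20), (6789, "Keane", "CS", 22)]


def view(base):
    """BUSINESS-STUDIES-STUDENTS: the BBS rows without the AGE column."""
    return [(sno, sname, course) for sno, sname, course, _age in base
            if course == "BBS"]


def insert_through_view(base, row, check_option=True):
    """Simulate an INSERT on the view and report what happens."""
    sno, sname, course = row
    if any(existing[0] == sno for existing in base):
        return "rejected: duplicate primary key"
    if check_option and course != "BBS":
        return "rejected: violates view definition"
    base.append((sno, sname, course, None))  # AGE is not in the view, so it is NULL
    return "inserted"


base = make_base()
print(insert_through_view(base, (1111, "McGrath", "BBS")))
print(insert_through_view(base, (1234, "McKenna", "BBS")))
print(insert_through_view(base, (2345, "Hughton", "BBS")))
print(insert_through_view(make_base(), (1111, "McGrath", "CA")))

# Without the check option the tuple is stored in the base table
# but cannot be seen through the view.
unchecked = make_base()
print(insert_through_view(unchecked, (1111, "McGrath", "CA"), check_option=False))
print((1111, "McGrath", "CA") in view(unchecked))
```

Under this model the last insert succeeds but the new tuple immediately falls outside the window that the view provides.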



CREATE VIEW SUPPLIER-PART-SMALL-SHIPMENTS
AS SELECT SNAME, PNAME
FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#
AND SP.QTY > 100
WITH CHECK OPTION;

Note : Error in dialogue. Should say greater than, not less than.

SNAME PNAME

Smith Nut

Smith Bolt

Smith Screw

Jones Nut

Jones Bolt

Blake Bolt

Does the "WITH CHECK OPTION" make sense ?



CREATE VIEW PARTS-AMOUNTS
AS SELECT SP.P#, P.PNAME, SP.QTY
FROM P, SP
WHERE P.P# = SP.P#;

Note : Error in dialogue. Should say P.P# = SP.P# as above.

P# PNAME QTY

P1 Nut 300

P2 Bolt 200

P3 Screw 400

P1 Nut 300

P2 Bolt 400

P2 Bolt 200

There are duplicates as far as the query in the view definition is concerned, so they are eliminated (greyed out in diagram).

Syntactically, this query is correct but looking at the supplier-parts table it appears it should contain the total amount of parts shipped by a supplier, not just one shipment, but the example on-screen does not do this.



CREATE VIEW SUMMARY (S#, TOTQTY)
AS SELECT S#, SUM(QTY)
FROM SP
GROUP BY S#;

SP
S# P# QTY
S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200

SUMMARY
S# TOTQTY
S1 900
S2 700
S3 200
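The aggregating view can be reproduced with sqlite3. In this sketch SNO and PNO replace S# and P#, the total column is named TOTAL through a column alias rather than a column list, and the extra shipment (S3, P1, 100) is invented to show that the totals follow the base table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SP (SNO TEXT, PNO TEXT, QTY INTEGER);
INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                      ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
-- One row per supplier, holding the sum of that supplier's shipment quantities.
CREATE VIEW SUMMARY AS
    SELECT SNO, SUM(QTY) AS TOTAL FROM SP GROUP BY SNO;
""")

summary = con.execute("SELECT SNO, TOTAL FROM SUMMARY ORDER BY SNO").fetchall()

# A new shipment changes the derived total without any change to the view.
con.execute("INSERT INTO SP VALUES ('S3', 'P1', 100)")
summary_after = con.execute("SELECT SNO, TOTAL FROM SUMMARY ORDER BY SNO").fetchall()

print(summary)
print(summary_after)
```

Because each row of such a view is derived from several base tuples, there is no sensible way to update the base table through it, which is why views involving GROUP BY or aggregate functions are not updatable.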



Chapter 10. Database Design & Normalisation

This chapter covers database design and normalisation.

10-1. Introduction to Database Design

10-2. 3NF, 2NF and 1NF

10-3. BCNF

10-3-1. BCNF Example 1




10-4. 4NF

10-5. 5NF

10-6. Database Design

Sources: Elmasri & Navathe pp 391 - 445

10-1. Introduction to Database Design

Deciding on a suitable logical structure, or schema, for the data to be implemented is called database design.

Considering supplier parts example (S,P,SP) there is a feeling of correctness.



Normalisation theory is a formalism of simple ideas with a practical application in logical database schema design.

Normalisation theory should allow us to recognise relations with undesirable properties, tell us what is "wrong" & how to "correct" it.

Normalisation theory is built around normal forms - each normal form has a set of satisfiable criteria.

Normal forms exist in a hierarchy:

1NF -> 2NF -> 3NF -> BCNF -> 4NF -> PJ/NF (5NF)

Codd defined 1NF, 2NF, 3NF in 1972. Note: Monologue says 1992, but 1972 is correct.

3NF had inadequacies so it was revised in 1974 by Boyce/Codd (BCNF).

In 1977 Fagin defined 4NF, and in 1979 he defined 5NF.

6NF,7NF ?... dependencies theory suggests there may be higher NFs but not practicable in database environment.

DB designers should aim for higher NFs but this is not law - just recommended as normalisation simply provides guidelines for database design.

There are often good reasons for not using normalisation theory.

In order to describe the various normal forms we must first introduce some definitions:

Functional Dependency

Given relation R, attribute Y of R is functionally dependent on attribute X of R, written R.X -> R.Y (R.X functionally determines R.Y), iff each R.X value has associated with it precisely one R.Y value, where X and/or Y may be composite.

S.SNAME, S.STATUS and S.CITY are each functionally dependent on S.S#

If R.X is a candidate key or if R.X is the primary key, then all R.Y must be functionally dependent on R.X

In SP we have a composite primary key so

SP.(S#,P#) -> SP.QTY

There is no requirement in the definition of functional dependence that R.X be a candidate key, thus:

R.X -> R.Y iff whenever two tuples of R have the same X value, they also have the same Y value.

R.Y is fully functionally dependent on R.X iff it is functionally dependent on R.X and not functionally dependent on any proper subset of R.X.



S.(S#,STATUS) -> S.CITY is true but not full functional dependence as S.S# -> S.CITY

If R.X -> R.Y but not fully then R.X must be composite

A functional dependency diagram is used to represent graphically, full functional dependencies … for example:

Functional dependence is a semantic notion to do with understanding what the data means rather than because of the properties of a particular data set at a given time.
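The definitions can still be exercised against a data set, with one caveat that follows from the point just made: a check against sample data can show that a dependency does not hold, but it cannot prove that one holds in general. The sketch below is in plain Python; the function names are made up, and SNO and PNO stand in for S# and P#.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if tuples that agree on the lhs attributes also agree on the rhs ones."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """The FD holds and no proper, non-empty subset of lhs also determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, subset, rhs):
                return False
    return True


S = [{"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "Paris"},
     {"SNO": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
     {"SNO": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Rome"}]
SP = [{"SNO": s, "PNO": p, "QTY": q} for s, p, q in
      [("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
       ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200)]]

print(fd_holds(S, ["SNO"], ["CITY"]))                   # S# determines CITY
print(fd_holds(S, ["CITY"], ["STATUS"]))                # refuted by the two Paris rows
print(fully_dependent(S, ["SNO", "STATUS"], ["CITY"]))  # not full: S# alone suffices
print(fully_dependent(SP, ["SNO", "PNO"], ["QTY"]))     # full on this data
```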

10-2. 3NF, 2NF and 1NF

Definition 1: 3NF

A relation R is in 3NF iff the nonkey attributes of R are mutually independent and fully dependent on the primary key of R.

Nonkey in this sense means not part of the primary key and mutually independent means none of the attributes are functionally dependent on any others.

P(P#,PNAME,COLOUR,WEIGHT) is 3NF because we can change nonkey attributes independently and all are functionally dependent on P#.

Definition 2: 3NF

A relation R is in 3NF iff each tuple consists of a primary key to identify a real entity plus 0 or more mutually independent attribute values to describe that entity.

R is in 1NF iff all underlying domains are atomic.



In order to show 2NF let us unite S and SP to get:

FIRST(S#,STATUS,CITY,P#,QTY)

We also introduce a new constraint such that STATUS is functionally dependent on CITY, eg

London suppliers have status 10, always
Paris suppliers have status 20, always
Munich suppliers have status 20, always
etc ...

Primary key is (S#,P#) and the functional dependency diagram is ...

Definition 3: 3NF

A relation in 3NF has arrows out of the primary key only.

In FIRST, additional arrows cause trouble as the nonkey attributes are not mutually independent and not all attributes are dependent on the primary key.

What are the difficulties with this relation anyway ?

The problem with FIRST is that it stores redundant information which can lead to update anomalies as follows:

INSERT: Cannot insert the fact that a supplier exists until that supplier actually makes a shipment

DELETE: Deleting the last shipment tuple for a supplier, say S3, also loses the information about which CITY S3 is located in

UPDATE: CITY values occur for each shipment thus an update of CITY is unnecessarily expensive.

One solution is to replace FIRST by:



SECOND(S#,STATUS,CITY) and SP(S#,P#,QTY)

This yields the following FD diagram:

This is appealing as follows:

INSERT: can enter the fact that S5 is in ATHENS without S5 actually having to make a shipment

DELETE: can delete shipment tuples and not lose location information

UPDATE: information appears once only thus updating is more efficient
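The effect of the decomposition can be seen on a small instance. The FIRST tuples below are assumed sample data, chosen so that STATUS depends on CITY as required (Paris 20, London 10); the relations are modelled as Python sets of tuples and the projections are written as set comprehensions.

```python
# FIRST(S#, STATUS, CITY, P#, QTY); supplier cities and statuses are assumed.
FIRST = {
    ("S1", 20, "Paris", "P1", 300), ("S1", 20, "Paris", "P2", 200),
    ("S1", 20, "Paris", "P3", 400),
    ("S2", 10, "London", "P1", 300), ("S2", 10, "London", "P2", 400),
    ("S3", 10, "London", "P2", 200),
}

# Projections: SECOND(S#, STATUS, CITY) and SP(S#, P#, QTY).
SECOND = {(s, status, city) for s, status, city, _p, _q in FIRST}
SP = {(s, p, q) for s, _status, _city, p, q in FIRST}

# Natural join on S#: the decomposition loses no information.
rejoined = {(s, status, city, p, q)
            for s, status, city in SECOND
            for s2, p, q in SP if s == s2}

# Redundancy: FIRST repeats S1's city once per shipment, SECOND stores it once.
s1_city_copies_in_first = sum(1 for t in FIRST if t[0] == "S1")
s1_city_copies_in_second = sum(1 for t in SECOND if t[0] == "S1")

# DELETE anomaly: remove S3's only shipment.
first_after_delete = {t for t in FIRST if not (t[0] == "S3" and t[3] == "P2")}
sp_after_delete = {t for t in SP if not (t[0] == "S3" and t[1] == "P2")}
s3_city_known_in_first = any(t[0] == "S3" for t in first_after_delete)
s3_city_known_in_second = any(t[0] == "S3" for t in SECOND)

print(rejoined == FIRST, s1_city_copies_in_first, s1_city_copies_in_second)
print(s3_city_known_in_first, s3_city_known_in_second)
```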

10-3. BCNF

3NF has the following inadequacies in that it cannot handle cases of relations with:

multiple candidate keys where

candidate keys are composite

candidate keys overlap

The above combination of conditions does not occur very often in practice, but such cases are not contrived and they do exist.

BCNF was defined to address the above and the definition of BCNF is stronger than that of 3NF.

A functional determinant is an attribute on which some other attribute is fully functionally dependent.

Definition of BCNF

A relation R is in BCNF iff every determinant is a candidate key ... not just primary keys!

This is a simpler definition than 3NF with no references to 1NF or 2NF or transitive dependencies.

Now for some confusion ...



Textbooks sometimes differ in their definitions of 3NF and whether a relation in 3NF is also in BCNF !

The precise and exact definitions do not assume that each R has exactly one CK, the PK, as is done in most textbooks, and those definitions are as follows:

2NF == 1NF and each non-prime attribute is FFD on each CK

3NF == 2NF and none of the non-prime attributes are transitively dependent on any CKs

Here non-prime is not part of any candidate key.

However many textbooks simplify the definitions by assuming that each R has one CK which is the PK !

Any given relation can be non-loss decomposed into an equivalent collection of BCNF relations.

FIRST,SECOND are not in BCNF

SP,SC,CS are in BCNF

Let's illustrate BCNF with a set of examples, some in BCNF, some not.




S(S#, SNAME, STATUS, CITY)

S# and SNAME are both candidate keys, i.e. numbers and names of suppliers are both unique.

STATUS and CITY are mutually independent, with the FD diagram ...

S is in BCNF as the only determinants are candidate keys.

In S, candidate keys are atomic and thus non-overlapping.


SSP(S#, SNAME, P#, QTY)

Candidate keys are (S#, P#) and (SNAME, P#), say 1st is primary key with FD diagram ...

Not in BCNF as 2 determinants, S#,SNAME are not candidate keys so the table will contain redundancies and have certain update anomalies.

SSP is in 3NF because that definition does not require an attribute to be fully dependent on the primary key if it itself is a component of some alternate key.

Solution: break SSP into 2 projections either:

SS(S#,Sname) and SP(S#,P#,QTY)

or



SS(S#,Sname) and SP(Sname,P#,QTY)

All of these are in BCNF.


SJT(S, J, T)

Here student S is taught subject J by teacher T with the following constraints:

1. For each subject each student is taught by only 1 teacher
2. Each teacher teaches only 1 subject
3. Each subject is taught by several teachers

This is a bit like secondary school, with the following FD diagram ...

Here we have two overlapping candidate keys (S, J) and (S, T) and SJT is in 3NF but it is not in BCNF so we could get update anomalies caused by T being a determinant but not a CK (Candidate Key).

Solution: replace SJT by 2 projections:

ST(S, T) and TJ(T, J)




EXAM(S, J, P)

Here student S was examined in subject J and achieved rank position P in the class with the constraint that there are no ties for positions.

This yields the following FD diagram ....

Here we have composite and overlapping candidate keys (S, J) and (J, P) but just because we have such a situation does not mean we need to normalise because EXAM is already in BCNF !
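The BCNF test used in these examples can be mechanised once the functional dependencies are written down. The sketch below uses the standard attribute-closure computation and reports every determinant whose closure does not cover the whole relation, that is, every determinant that is not a superkey. The function names are made up, SNO and PNO stand in for S# and P#, and the FD lists are transcribed from the examples above.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Determinants of non-trivial FDs that do not determine every attribute."""
    return [lhs for lhs, rhs in fds
            if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)]


# SSP(S#, SNAME, P#, QTY): S# and SNAME determine each other.
ssp = bcnf_violations(
    ["SNO", "SNAME", "PNO", "QTY"],
    [(("SNO",), ("SNAME",)), (("SNAME",), ("SNO",)),
     (("SNO", "PNO"), ("QTY",)), (("SNAME", "PNO"), ("QTY",))])

# SJT(S, J, T): (S, J) -> T and T -> J.
sjt = bcnf_violations(["S", "J", "T"],
                      [(("S", "J"), ("T",)), (("T",), ("J",))])

# EXAM(S, J, P): (S, J) -> P and (J, P) -> S.
exam = bcnf_violations(["S", "J", "P"],
                       [(("S", "J"), ("P",)), (("J", "P"), ("S",))])

print(ssp)   # the two determinants that are not candidate keys
print(sjt)   # T is a determinant but not a candidate key
print(exam)  # empty: EXAM is already in BCNF
```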



10-4. 4NF

Consider the following ...

course (C) taught by one of a set of teachers (T)

for each course there is a repeating set of recommended textbooks (X)

for each course there may be any numbers of teachers and any numbers of recommended texts

teachers and texts are independent

teachers can be associated with any number of courses

This corresponds closely with a large secondary school with DoE recommended textbooks, teachers doubling up and many teachers.

We could "flatten" this information into a 1NF relation called CTX.

CTX

Course Teacher Textbook

L.C. Math Smith H+M 4


L.C. French Kelly Folens 1

L.C. English Doyle Hamlet

L.C. Math Doyle H+M 4










There are no FDs in data so no basis for decomposition but there is still some redundancy in CTX.

If (C1, T1, X1) and (C1, T2, X2) then there must also be the following tuples present ... (C1, T1, X2) and (C1, T2, X1) !

This is redundancy and thus we can have update anomalies.



For example, if we add (Geography, Ryan, Holland) and (Geography, Scott, Gaines) then we must also add (Geography, Ryan, Gaines) and (Geography, Scott, Holland).

Examining the criteria for normal forms however we find CTX is (trivially) in BCNF as the 3 attributes make up the sole CK !

It would be desirable to decompose CTX into :

CT(Course, Teacher) and CX(Course, Text)

Both of these are in BCNF as both are "all key".









So CTX would be represented as :

CT
Course Teacher
L.C. Math Smith
L.C. French Kelly
L.C. English Doyle
L.C. Math Doyle

CX
Course Text
L.C. Math H+M 4
L.C. Math H+M 5
L.C. French Folens 1
L.C. English Hamlet

This decomposition is based on Fagin's multi-valued dependencies (MVDs).

course ->-> teacher
course ->-> text

A course does not have a single corresponding teacher, it has a well-defined set of teachers and for a course c and text x the set of teachers depends on the value of c, independent of x.

Definition of 4NF

A relation R is in 4NF if it is in BCNF and all MVDs are FDs.

CTX is not in 4NF; CT, CX are in 4NF … 4NF is more desirable as it eliminates redundancies

For R with attributes A, B & C (which may be composite !)

R.A ->-> R.B

holds iff the set of R.B values matching a given (A value, C value) pair in R depends only on the A value and is independent of the C value.

R must have at least 3 attributes, and for R(A, B, C), R.A ->-> R.B holds iff R.A ->-> R.C also holds.

MVDs always go in pairs.

If R.A ->-> R.B | R.C then R can be non-loss decomposed into R1(A, B) and R2(A, C)
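For a three-attribute relation this gives a simple test: the pair of MVDs holds exactly when the relation equals the join of its two projections. The sketch below applies that test to a CTX instance assembled from the course, teacher and text values used in this section, with course names shortened; the helper names are made up for the example.

```python
def project(rel, positions):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in positions) for t in rel}


def join_on_course(ct, cx):
    """Natural join of CT(Course, Teacher) and CX(Course, Text) on Course."""
    return {(c, t, x) for c, t in ct for c2, x in cx if c == c2}


def mvd_holds(ctx):
    """Course ->-> Teacher | Text holds iff CTX equals the join of its projections."""
    return ctx == join_on_course(project(ctx, (0, 1)), project(ctx, (0, 2)))


CTX = {("Math", "Smith", "H+M 4"), ("Math", "Smith", "H+M 5"),
       ("Math", "Doyle", "H+M 4"), ("Math", "Doyle", "H+M 5"),
       ("French", "Kelly", "Folens 1"), ("English", "Doyle", "Hamlet")}

# Adding only two of the four Geography tuples breaks the dependency.
partial = CTX | {("Geography", "Ryan", "Holland"), ("Geography", "Scott", "Gaines")}
missing = join_on_course(project(partial, (0, 1)), project(partial, (0, 2))) - partial

print(mvd_holds(CTX), mvd_holds(partial))
print(sorted(missing))
```

The missing tuples are exactly the two extra Geography tuples that the text says must also be added.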

10-5. 5NF

Some relations cannot be non-loss decomposed by projection into 2 relations but can be non-loss decomposed by projection into 3 or more relations ... called n-decomposable for n > 2.

In reading about 5NF I have never found a non-contrived example to illustrate 5NF because 5NF is more theoretical than real ... anyway here is an example:

SPJ is a relation about suppliers, parts and projects.

1. Smith supplies wrenches
2. wrenches are used in Block2
3. Smith supplies Block2

If 1,2 & 3 hold then Smith supplies wrenches to Block2 also holds as true.

Normally this implication does not hold, but if it does we call it a JOIN dependency: SPJ satisfies a JD over (SP, PJ, JS) and should be decomposed into those three projections, all of which are in 5NF.

SPJ is not in 5NF because it has a join dependency but discovering such JDs is not easy ... this is because FDs and even MVDs have a straightforward real-world interpretation whereas a JD does not.

If R is in 5NF then it is also in 4NF.
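A join dependency can be demonstrated on a tiny instance. The four SPJ tuples below are assumed sample data using codes for suppliers (S1, S2), parts (P1, P2) and projects (J1, J2). Joining any two of the three projections produces a spurious tuple, and only the join of all three gives back SPJ.

```python
# SPJ(supplier, part, project); assumed sample data satisfying the join dependency.
SPJ = {("S1", "P1", "J2"), ("S1", "P2", "J1"),
       ("S2", "P1", "J1"), ("S1", "P1", "J1")}

# The three binary projections.
SP = {(s, p) for s, p, _j in SPJ}
PJ = {(p, j) for _s, p, j in SPJ}
JS = {(j, s) for s, _p, j in SPJ}

# Joining only SP and PJ (on part) is lossy: it invents a tuple.
two_way = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}
spurious = two_way - SPJ

# Joining the result with JS (on project and supplier) removes the spurious tuple.
three_way = {(s, p, j) for s, p, j in two_way if (j, s) in JS}

print(sorted(spurious))
print(three_way == SPJ)
```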



10-6. Database Design

Database design is all about designing a schema of tables which captures all information needs from the portion of the real world being modelled, in such a way that no unnecessary redundancies are stored which could lead to update anomalies.

Reasoning about the normal forms of tables in a schema helps us determine if update anomalies can occur in theory.

Database design is a give-and-take task, fluid, revised continuously as users’ needs change and the information being modelled changes.

A database design is never complete, it is always evolving.

The task of database design is separate from, but related to, the task of the DBA.

If the design of a database has commenced with the construction of an E-R diagram, then this can be used to determine a first version of the relational schema, but only a first version.

Turning E-R entities into relational tables is easy

Turning E-R 1:1 relationships into tables is also easy by storing the PK of one entity as an FK attribute of the other ... which one to embed in the other affects performance of queries, choice of database designer

Turning E-R 1:N relationships into tables is done by placing the PK of the relation representing the parent entity as an FK in the relation representing the child entity ... unlike 1:1, it does matter which key is embedded in which relation.

Turning E-R M:N relationships into tables is done by creating an additional table, an intersection relation, to represent the relationship itself ... i.e. decompose the M:N into two 1:N relationships. The PK of the new relation is the combination of PKs of its "parents".

Representing recursive relationships, which can be 1:1, 1:N or N:M, is done by embedding the key of the relation in itself as an FK (1:1 and 1:N) or by creating an additional table (N:M) ... so the fact that it is recursive is actually not important.
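The M:N rule can be tried out with the supplier-parts example, where SP plays the role of the intersection relation. This is a sketch using sqlite3, with SNO and PNO standing in for S# and P#; the unknown supplier S9 is invented for the example. SQLite only enforces foreign keys when they are switched on with a PRAGMA, which the sketch does first.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # foreign key checks are off by default in SQLite
con.executescript("""
CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT);
CREATE TABLE P (PNO TEXT PRIMARY KEY, PNAME TEXT);
-- Intersection relation: its PK combines the PKs of its two parents,
-- each of which is also an FK, giving two 1:N relationships.
CREATE TABLE SP (
    SNO TEXT REFERENCES S (SNO),
    PNO TEXT REFERENCES P (PNO),
    QTY INTEGER,
    PRIMARY KEY (SNO, PNO)
);
INSERT INTO S VALUES ('S1','Smith'), ('S2','Jones');
INSERT INTO P VALUES ('P1','Nut'), ('P2','Bolt');
INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S2','P1',300);
""")


def try_insert(sno, pno, qty):
    """Return True if the shipment is accepted, False if a constraint rejects it."""
    try:
        con.execute("INSERT INTO SP VALUES (?, ?, ?)", (sno, pno, qty))
        return True
    except sqlite3.IntegrityError:
        return False


unknown_supplier_ok = try_insert("S9", "P1", 100)   # no parent row in S
duplicate_pair_ok = try_insert("S1", "P1", 50)      # (S1, P1) already present
new_pair_ok = try_insert("S2", "P2", 400)           # valid new shipment

print(unknown_supplier_ok, duplicate_pair_ok, new_pair_ok)
```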

Having gone through the effort of an E-R modelling exercise and then the generation of a first approximation at a database schema, this first version may then be refined or de-normalised.

Given R in 1NF and FDs, MVDs and JDs, we systematically reduce R to a collection of smaller relations which are "more desirable", by taking projections in order to eliminate redundancy and the possibility of update anomalies.

But these are guidelines only and don't always have to be followed … often we want to de-normalise a database design.



Chapter 11. Database and the Web

This chapter covers applications of databases in the Internet.

11-1. Background Databases in the Web

11-2. JDBC Introduction

11-3. JDBC Tutorial

11-4. Databases and the Web - the Future

Sources: Elmasri & Navathe (1999) Chapter 27; Campione & Walrath, The Java Tutorial

11-1. Background Databases in the Web

A simple architecture:

Client/server architecture

Information is stored in publicly accessible files on machines called Web servers

Files are encoded in HTML

Files are identified by URLs

Data (files) is communicated using HTTP

A three-tiered architecture:

Client (browser) - middleware (CGI) - backend (database)

A Common Gateway Interface (CGI) acts as the middleware between a client and a database at the back end.

CGI software executes programs/scripts to obtain dynamic information (instead of static file content)

Typical CGI languages

scripts: Perl, Tcl

The main disadvantage of this approach is that for each user request the Web server starts a new process, which, in the case of a database backend, then connects to the database. At the end of the request, the connection is closed and the process terminates.

programs: Java (JDBC)

JDBC (and Java servlets) should provide a more efficient platform, without the need for time-consuming additional processes and database connections.



Database content can be displayed using a Web browser.

The presentation can be formulated in HTML.

Here is the table s from the Supplier/Parts example:

SNO SNAME STATUS CITY

S1 Smith 20 Paris

S2 Jones 10 Paris

S3 Blake 30 Rome

The HTML code is here:

<table align=center border=2 cellpadding=2 bgcolor=white>

<tr bgcolor=grey>

<td>SNO</td>

<td>SNAME</td>

<td>STATUS</td>

<td>CITY</td>

</tr>

<tr>

<td>S1</td>

<td>Smith</td>

<td>20</td>

<td>Paris</td>

</tr>

<tr>

<td>S2</td>

<td>Jones</td>

<td>10</td>

<td>Paris</td>

</tr>

<tr>

<td>S3</td>

<td>Blake</td>

<td>30</td>

<td>Rome</td>

</tr>

</table>
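In a dynamic page this markup is generated from a query result rather than written by hand. The sketch below shows the idea in Python with sqlite3 standing in for the database; it is not the JDBC approach described in this chapter, and the function name rows_to_html is made up. Values are escaped so that data containing characters such as < cannot break the page.

```python
import sqlite3
from html import escape


def rows_to_html(headers, rows):
    """Render column headers and result rows as an HTML table like the one above."""
    lines = ["<table align=center border=2 cellpadding=2 bgcolor=white>",
             "<tr bgcolor=grey>"]
    lines += [f"<td>{escape(str(h))}</td>" for h in headers]
    lines.append("</tr>")
    for row in rows:
        lines.append("<tr>")
        lines += [f"<td>{escape(str(value))}</td>" for value in row]
        lines.append("</tr>")
    lines.append("</table>")
    return "\n".join(lines)


con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE S (SNO TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                     ('S3','Blake',30,'Rome');
""")

cursor = con.execute("SELECT SNO, SNAME, STATUS, CITY FROM S ORDER BY SNO")
headers = [column[0] for column in cursor.description]  # column names of the result
page = rows_to_html(headers, cursor.fetchall())
print(page)
```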



11-2. JDBC Introduction

JDBC is Sun's solution to the inefficiency of CGI-scripts connecting to databases.

JDBC provides facilities (Java JDBC API) to

connect to the database (Java class Connection)

send an SQL statement to the database (Java class Statement)

process a result (Java class ResultSet)

See the Java JDBC API definition for more details.

The Java code is DBMS transparent, which means that any code needed to establish and maintain the connection to the database is hidden.

JDBC drivers, called by methods of the Java classes Connection and Statement, handle the connection management.

JDBC drivers for particular database management systems need to be installed, or a JDBC-ODBC bridge needs to be loaded if the connection to the database is to be made via Microsoft's ODBC mechanism.
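
The three steps (connect, send a statement, process the result) can be sketched compactly as follows. This is only an illustration under assumptions: the class name JdbcSteps and the method firstColumn are made up for this example, the try-with-resources form requires Java 7 or later, and the URL, user name and password are the sample values used in the tutorial in the next section. If no installed driver accepts the URL, DriverManager raises an SQLException, which the sketch reports as text.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class JdbcSteps {
    /** Runs one query and returns the first column of every row, or an error text. */
    static String firstColumn(String url, String user, String password, String sql) {
        StringBuilder out = new StringBuilder();
        // try-with-resources closes ResultSet, Statement and Connection automatically
        try (Connection con = DriverManager.getConnection(url, user, password); // 1. connect
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {                           // 2. send SQL
            while (rs.next()) {                                                 // 3. process result
                out.append(rs.getString(1)).append('\n');
            }
        } catch (SQLException ex) {
            return "SQL error: " + ex.getMessage();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample values from the tutorial; they only work where such a data source exists.
        System.out.println(
            firstColumn("jdbc:odbc:TestDB", "guest", "guest", "SELECT * FROM Authors"));
    }
}
```

Note that the JDBC-ODBC bridge was removed from the JDK with Java SE 8, so on a current Java installation a jdbc:odbc: URL is rejected unless a third-party bridge driver is installed.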

11-3. JDBC Tutorial

This tutorial shows how to connect from a server to a database.

The tutorial is based on Java.

Content:

Database Access

Using the dbWrapper class

Execute the program

Connecting to an Oracle DB

The first three sections describe how to connect to a database running under Windows NT.

The fourth section describes how to connect to an Oracle DB running under Unix.

Database Access

The interface to the database is realised in a class called dbWrapper. The dbWrapper class provides three methods:

Open: opens the connection to the database

Select: executes a query or an update, i.e. executes an SQL statement and prints the result

Close: closes the connection to the database



There is also a private method called printResultSet which prints the result of e.g. select statements. This method is called by the Select method.

import java.sql.*;

The java.sql package provides means to execute queries or updates.

Global variables of the class are

a statement object (which will allow us to pass SQL statements to the database),

an object representing the connection to the database,

a URL string (denoting the URL of the database server),

and strings for the username and password for the database.

class dbWrapper

{

Statement stmt;

Connection con;

String strUrl;

String strUserName;

String strPassword;

The constructor assigns the user name and password. It also constructs the URL string, which starts with a protocol part, here jdbc:odbc:. This means that JDBC (Java DataBase Connectivity) interfaces with Microsoft's ODBC (Open DataBase Connectivity), which then connects to the database server identified by the DSN (Data Source Name) strDSN.

public dbWrapper(String strDSN)

{

strUrl = "jdbc:odbc:" + strDSN;

strUserName = "guest";

strPassword = "guest";

}

A driver is needed which bridges between the Java DB connectivity and Microsoft's ODBC. This driver, provided by Sun (class sun.jdbc.odbc.JdbcOdbcDriver), is loaded first. Then the connection to the database, specified by URL, user name and password, is established. Finally, a statement object is created, which will allow us to pass SQL statements to the database.



public void Open()

{

...

// Load the jdbc-odbc bridge driver

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

// Attempt to connect to a driver.

con = DriverManager.getConnection( strUrl, strUserName,

strPassword );

// Create a Statement object so we can submit

// SQL statements to the driver

stmt = con.createStatement();

...

}

The Select method allows us to execute an SQL SELECT statement. strQuery is a string containing the query. It is executed and a ResultSet is returned. This result set, for a SELECT statement a set of tuples (records), is then processed using the method printResultSet(). Then the result set is closed (i.e. discarded). Note that executeQuery is intended for statements that return a result set; updates are normally submitted with the executeUpdate method of Statement instead.

public void Select( String strQuery )

{

...

// Submit a query, creating a ResultSet object

ResultSet rs = stmt.executeQuery( strQuery );

// Display all columns and rows from the result set

printResultSet (rs);

rs.close();

...

}
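
If one method is to handle both queries and updates, the JDBC Statement interface also offers the general execute method, which reports whether the statement produced a result set. The following is a minimal sketch of that idea and not part of the dbWrapper class. The class name SqlRunner and the method run are assumptions made for this example; execute, getResultSet and getUpdateCount are standard methods of java.sql.Statement.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class SqlRunner {
    /** Executes any SQL statement and describes the outcome as text. */
    static String run(Statement stmt, String sql) {
        try {
            // execute() returns true if the first result is a ResultSet (e.g. SELECT)
            if (stmt.execute(sql)) {
                int rows = 0;
                try (ResultSet rs = stmt.getResultSet()) {
                    while (rs.next()) {
                        rows++;
                    }
                }
                return rows + " row(s) returned";
            }
            // otherwise the statement was an update (INSERT, UPDATE, DELETE, DDL)
            return stmt.getUpdateCount() + " row(s) affected";
        } catch (SQLException ex) {
            return "SQL error: " + ex.getMessage();
        }
    }
}
```

A call such as SqlRunner.run(stmt, "SELECT * FROM Authors") would use a Statement object created as in the Open method above.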



Close closes the connection to the database.

public void Close()

{

...

stmt.close();

con.close();

...

}

The method printResultSet prints the result set which has been returned by the statement execution. For each record in the result set (obtained by rs.next()), all attribute values are printed in a for-loop.

private static void printResultSet(ResultSet rs) throws SQLException

{

int numCols = rs.getMetaData().getColumnCount();

while ( rs.next() )

{

for (int i=1; i<=numCols; i++)

{

System.out.print(rs.getString(i) + " | " );

}

System.out.println();

}

}

}

Using the dbWrapper class

Set up ODBC under Windows NT:

To open the database connection, you have to create a dbWrapper object.

The parameter is a DSN (data source name). This name has to be defined on your machine. For this, start the ODBC program (Settings -> Control Panel -> ODBC) under Windows NT.



Define a user DSN connecting to the database management system of your choice. Click on 'Add' and select the appropriate driver (e.g. for MS SQL Server or MS Access); for this application, select SQL Server.

Choose a name (it should be 'TestDB' if you want to use this program here). Enter the name (or IP address) of the server on which your database is running (it should be 'gobi' if you want to connect to an MS SQL Server running in the School of Computer Applications). Choose the database you want to access (i.e. enter its name) using the Options menu. Sometimes leaving the field empty (i.e. using the default) will work.

If you want to connect to MS Access on your local machine, you have to choose the corresponding driver, give it a name, and your machine as the server.

Suppose the DSN you have defined is "TestDB", then the following establishes the connection.

dbWrapper myDB = new dbWrapper("TestDB");

myDB.Open();

There is a book-example in the DB. A sample query which could be executed is "SELECT * FROM Authors":

String strSQLQuery = "SELECT * FROM Authors";

// Select prints the result itself and returns nothing
myDB.Select(strSQLQuery);

Execute the program

Here is the full source code of the dbWrapper class:

// dbWrapper Class

import java.sql.*;

class dbWrapper

{

Statement stmt;

Connection con;

String strUrl;

String strUserName;

String strPassword;



public dbWrapper(String strDSN)

{

// the DSN for the Db connection

strUrl = "jdbc:odbc:" + strDSN;

strUserName = "guest";

strPassword = "guest";

}

public void Open()

{

try

{

// Load the jdbc-odbc bridge driver

Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");

// Attempt to connect to a driver.

con = DriverManager.getConnection( strUrl, strUserName,

strPassword );

// Create a Statement object so we can submit

// SQL statements to the driver

stmt = con.createStatement();

}

catch (SQLException ex)

{

while (ex != null)

{

System.out.println("SQL Exception: " + ex.getMessage() );

ex = ex.getNextException();

}

}

catch (java.lang.Exception ex)

{

ex.printStackTrace();

}

}



public void Select( String strQuery )

{

try

{

// Submit a query, creating a ResultSet object

ResultSet rs = stmt.executeQuery( strQuery );

// Display all columns and rows from the result set

printResultSet(rs);

rs.close();

}

catch (SQLException ex)

{

while (ex != null)

{

System.out.println ("SQL Exception: " + ex.getMessage () );

ex = ex.getNextException ();

}

}

}

public void Close()

{

try

{

stmt.close();

con.close();

}

catch (SQLException ex)

{

while (ex != null)

{

System.out.println("SQL Exception: " + ex.getMessage () );

ex = ex.getNextException();

}

}

}



private static void printResultSet(ResultSet rs) throws SQLException

{

int numCols = rs.getMetaData().getColumnCount();

while ( rs.next() )

{

for (int i=1; i<=numCols; i++)

{

System.out.print(rs.getString(i) + " | " );

}

System.out.println();

}

}

}

Connecting to an Oracle DB

This section explains how to connect to an Oracle database running under Unix.

The differences from the previous program are marginal.

Two changes are needed:

Use a different URL, user name and password

strUrl = "jdbc:oracle:thin:@pisang:1521:car";

strUserName = "testStudent";

strPassword = "test";

and load a different driver

// Register Driver

DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());

instead of the Class.forName("sun.jdbc.odbc.JdbcOdbcDriver")-call. Here, we need a genuine Oracle driver instead of Sun's JDBC-ODBC bridge driver.



You can make one more change. The parameter of the dbWrapper constructor is not needed. This information is only important for setting up a connection under Windows. So, you can remove the parameter (or ignore it).
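
One way to keep a single wrapper for both kinds of database is to pass the connection settings in from outside. The following is a minimal sketch, assuming the Oracle URL above follows the usual thin-driver pattern host:port:SID (here host pisang, port 1521, SID car). The class DbSettings and its factory methods are hypothetical names introduced for this example.

```java
class DbSettings {
    final String url;
    final String user;
    final String password;

    DbSettings(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Settings for an ODBC data source, with the guest account used in the tutorial. */
    static DbSettings odbc(String dsn) {
        return new DbSettings("jdbc:odbc:" + dsn, "guest", "guest");
    }

    /** Settings for the Oracle thin driver, with the test account used in these notes. */
    static DbSettings oracleThin(String host, int port, String sid) {
        return new DbSettings(
            "jdbc:oracle:thin:@" + host + ":" + port + ":" + sid, "testStudent", "test");
    }

    public static void main(String[] args) {
        System.out.println(odbc("TestDB").url);
        System.out.println(oracleThin("pisang", 1521, "car").url);
    }
}
```

A dbWrapper constructor could then take a DbSettings object instead of a DSN string and copy its three fields.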

If the connection is successfully established, you can query the tables of the Supplier/Parts database (S, SP, P) and of the Elmasri/Navathe Company database (Employee, Department, etc.).

11-4. Databases and the Web - the Future

Electronic Commerce:

Database support is increasingly important for the emerging Electronic Commerce technologies.

Both business-to-consumer (B2C) and business-to-business (B2B) eCommerce rely on data managed using database management systems, which can be accessed via the Internet.

In the future, we expect to see:

the convergence of Web and object technologies, e.g. the Document Object Model (DOM), which allows us to see documents as objects.

new languages more powerful than HTML, e.g. XML (the eXtensible Markup Language), which allows us to define documents in a presentation-independent way and to exchange data independently of the system used to store the data.

Both developments will have an effect on what kind of data is stored in the databases and how it is stored.

