CA218 Introduction to Databases Copyright © 1996-2001 Alan Smeaton, Claus Pahl
Chapter 1. Information Systems
1a. Information Systems Introduction
1b. Information Systems & DBMS
Chapter 2. Database Overview
2-1-1a. Database Components
2-1-1b. DBMS Data
2-1-2. DBMS Hardware
2-1-3. DBMS Software
2-1-4a. DBMS Users I
2-1-4b. DBMS Users II
2-2. What Data for a DBMS?
2-3a. Models of Data
2-3b. Data Model Differences
2-3c. DBMS Examples
2-4a. Why use a DBMS?
2-4b. Specific Reasons for DBMS
2-4c. Why not a DBMS?
2-5. Three Level Architecture
2-5-1. The External Level
2-5-2. The Conceptual Level
2-5-3. The Internal Level
2-6a. The Database Administrator I
2-6b. The Database Administrator II
Chapter 3. Storage Structures
3-1. Why Storage Structures?
3-2a. Hardware Features of Disks
3-2b. Disk and File Managers
3-2c. Clustering on Disk Surfaces
3-3a. Using Index Files I
3-3b. Using Index Files II
3-4a. Hashing I
3-4b. Hashing II
3-4c. Hashing III
Chapter 4. Entity-Relationship Data Modeling
4-1. E-R Introduction
4-2a. E-R Definitions I
4-2b. E-R Definitions II
4-2c. E-R Definitions III
4-3a. E-R Notation I
4-3b. E-R Notation II
4-3c. Cardinality Ratios
4-3d. Recursive Relationships
4-3e. Properties of Relationships
4-3f. Ternary Relationships
4-3g. Additional Notation
4-4. E-R Principles
4-5a. E-R Example I
4-5b. E-R Example II
Chapter 5. Relational Model of Data
5-1a. Basic Modelling
5-1b. Relational Model Overview
5-2. Relational Tables
5-3a. Relational Model Integrity Basics
5-3b. Relational Model Integrity
5-4a. Relational Algebra Operators
5-4a1. The SELECT Operation
5-4a2. The PROJECT Operation
5-4a3. The PRODUCT Operation
5-4a4. The UNION Operation
5-4a5. The INTERSECTION Operation
5-4a6. The DIFFERENCE Operation
5-4a7. The JOIN Operation
5-4a8. The DIVIDE Operation
5-4b. Relational Algebra
5-4c. Relational Expressions
5-5. Relational Calculus
Chapter 6. SQL
6-1. SQL Background & Standards
6-2. SQL2 Schemas
6-3. SQL DDL
6-4. SQL SELECT Statement
6-5. SQL INSERT, DELETE and UPDATE
6-6. Non-Standard SQL
Chapter 8. The System Catalog
8-1. The System Catalog
8-2. The Informix Catalog
8-3. The ORACLE7 Catalog
Chapter 9. Views
9-1. View Definition
9-2. View Examples
Chapter 10. Database Design & Normalisation
10-1. Introduction to Database Design
10-2. 3NF, 2NF and 1NF
10-3. BCNF
10-3-1. BCNF Example 1
10-3-2. BCNF Example 2
10-3-3. BCNF Example 3
10-3-4. BCNF Example 4
10-4. 4NF
10-5. 5NF
10-6. Database Design
Chapter 11. Databases and the Internet
11-1. Introduction
11-2. JDBC Introduction
11-3. JDBC Tutorial
11-4. Databases and the Web - the Future
Chapter 1. Information Systems
This introductory chapter describes the role that a Database Management System (DBMS) plays in relation to other information systems.
1a. Information Systems Introduction
1b. Information Systems and DBMS
Sources: Elmasri & Navathe pp 1-6
1a. Information Systems Introduction
"A computer-based [1] information system retrieves [2] information [3] from its database [4] in response to a user's query [5]."
The bracketed numbers refer to the notes below.
1. Manual vs. computer-based
2. Retrieve, store, modify, delete ... always 4 DML commands
3. Computerised information could be ...
structured numeric/alpha
free text
voice
image
rules
others ...
4. Database is a repository which is big and organised
5. User query:
Precise or vague information need
Expressed precisely or vaguely
Interactive or batch execution / retrieval
Seeking specific information or aggregate
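The four operations named in note 2 (retrieve, store, modify, delete) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `student` table and its rows are made up for the example, and SQLite is used simply because it ships with Python.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")

# Store: INSERT adds new rows.
conn.execute("INSERT INTO student (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO student (id, name) VALUES (2, 'Bob')")

# Modify: UPDATE changes existing rows.
conn.execute("UPDATE student SET name = 'Robert' WHERE id = 2")

# Delete: DELETE removes rows.
conn.execute("DELETE FROM student WHERE id = 1")

# Retrieve: SELECT reads rows back.
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)
```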
1b. Information Systems & DBMS
So where does a DBMS fit in? A DBMS is characterised by:
Interactive or host query
Unambiguous statement
Precise query
Retrieved information is specifically stored or aggregated
Well-structured information, text and multimedia are stored as bit strings
Query is Boolean combination of predicates
Exact matching
Formal schema
DBMS also provides security, data independence, persistence, concurrency, recovery and backup
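A query that is a Boolean combination of predicates with exact matching might look like the following sketch. The `book` table, its columns and its rows are assumptions invented for the illustration, and SQLite (via Python's sqlite3 module) stands in for whichever DBMS is actually used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO book VALUES (?, ?, ?)",
    [
        ("Databases", "Smith", 1994),
        ("Networks", "Smith", 1989),
        ("Databases", "Jones", 1999),
    ],
)

# Three predicates combined with AND / OR. A row either satisfies the
# whole expression exactly or it is not returned; there is no ranking.
query = (
    "SELECT title, author, year FROM book "
    "WHERE title = 'Databases' AND (author = 'Smith' OR year > 1995) "
    "ORDER BY year"
)
matches = conn.execute(query).fetchall()
print(matches)
```

The "Networks" row is rejected because it fails the first predicate, even though it satisfies one of the others.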
Chapter 2. Database Overview
This chapter presents an overview of databases, and is composed of six sections:
2-1-1. Database Components
2-1-2. DBMS Hardware
2-1-3. DBMS Software
2-1-4a. DBMS Users I
2-1-4b. DBMS Users II
2-2. What data for a DBMS ?
2-3a. Models of Data
2-3b. Data Model Differences
2-3c. DBMS Examples
2-4a. Why use a DBMS ?
2-4b. Specific Reasons for DBMS
2-4c. Why not a DBMS ?
2-5. Three Level Architecture
2-5-1. The External Level
2-5-2. The Conceptual Level
2-5-3. The Internal Level
2-6a. The Database Administrator I
2-6b. The Database Administrator II
Sources: Any database textbook overview / introduction.
2-1-1a. Database Components
What makes up a DBMS?
DBMS stores, maintains and provides access to data.
In this overview of DBMS components we look at:
Data
Hardware
Software
Users
2-1-1b. DBMS Data
Range of machine sizes from PC to mainframe, isolated or networked.
DBMS runs on entire range of platforms.
Single and multi-user, shared access, maintaining integrity of data.
Users are concerned with overlapping subsets of the total data, meaning that the data is perceived by different users in different ways.
Look at the total DCU data ... consists of ...
Students have views consisting of ...
Library has views consisting of ...
Finance has views consisting of ...
Inherent feature of DBMS data is that it is shared.
2-1-2. DBMS Hardware
Conventional machines vs. specialist database machines.
Mostly general purpose machines with DBMS as a conventional software application.
Accelerated chips have been proposed but not commercially successful.
Database machines do exist ... expensive, limited market.
2-1-3. DBMS Software
DBMS is an application program sitting between user and data.
DBMS handles all interactions between the two.
DBMS shields users from each other and from unauthorised access.
2-1-4a. DBMS Users I
Actors - Design, use and administer a DBMS
Database Administrators (DBAs)
DB Designers
End Users
Casual, occasional
Naive, canned transactions
Sophisticated
Stand-alone
System analysts and application programmers
Workers - Design, develop and operate DBMS software
DBMS designers and implementers
Tool developers
Operators and maintenance personnel
2-1-4b. DBMS Users II
Application programmers write COBOL, PASCAL, C, PL/1 or C++ programs with embedded DBMS commands, running online or in batch. Programs are usually precompiled, which allows dynamic querying of the DBMS at runtime.
End users use an interactive query language like SQL, possibly working in a bulletproof, controlled GUI environment (INFORMIX) or using a command line interface (ORACLE) ... same commands as application programmers.
Database Administrator (DBA) ... system manager for database application.
2-2. What Data for a DBMS?
" DBMS used by any reasonably self-contained organisation, commercial, scientific or technical, from a single individual to a large corporation, who want to manage a large volume of information ".
Dublin City University
Students
Lecturers
Courses
Books
Schools
Faculties
Lectures
All these are entities or distinguishable objects with properties in the real world.
We also have relationships between real world entities:
Schools make-up faculties
Schools have students
Schools have lecturers
Students attend lectures given-by lecturers
Lectures are-part-of courses
Students borrow books
Lecturers borrow books
Lecturers recommend books
Courses can-be-composed-of-other courses
Features of real world relationships are ...
Bi-directional relationships
Most are binary, some are ternary (beware of the connection trap here ... 3 x binary relationships does not equal one ternary)
Entity types may be linked in more than one way
Relationships are part of the data set
Relationship set is not exhaustive
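The connection trap mentioned above can be made concrete with a small sketch. It assumes a hypothetical ternary relationship "lecturer recommends book for course" (not one of the relationships listed in these notes) with made-up labels, splits it into three binary relationships, and then tries to rebuild it.

```python
# Hypothetical ternary facts: (lecturer, book, course).
ternary = {
    ("L1", "B1", "C1"),
    ("L1", "B2", "C2"),
    ("L2", "B1", "C2"),
}

# The three binary relationships obtained by dropping one entity each time.
lecturer_book = {(lec, book) for lec, book, course in ternary}
book_course = {(book, course) for lec, book, course in ternary}
lecturer_course = {(lec, course) for lec, book, course in ternary}

# Rebuild triples that are consistent with all three binary relationships.
rejoined = {
    (lec, book, course)
    for (lec, book) in lecturer_book
    for (book2, course) in book_course
    if book2 == book and (lec, course) in lecturer_course
}

# Triples that the binary relationships imply but that were never stated.
spurious = rejoined - ternary
print(spurious)
```

The rebuilt set contains a triple that was never in the ternary relationship, which is why three binary relationships cannot in general replace one ternary relationship.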
2-3a. Models of Data
In order to turn a nebulous picture of data into something structured for a DBMS we use a data model to organise information.
A data model is many things including:
A set of guidelines for representing the logical organisation of data
A pattern according to which data and relationships can be organised
An underlying mathematical formalism for building logical data organisations
Data models define logical units or entity types and relationships between those units.
In modeling the real world, a relationship is defined as a named, ordered list of entity types. Relationships can be classified by how many entities of one type are associated with how many entities of another type.
1:1 is the simplest but rarest e.g. person has_spouse person
N:1 or 1:N is many-to-one (or one-to-many), also called functional, e.g. student owns book
N:M is a many-to-many e.g. students are-lectured-by lecturers
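The three classes above can be illustrated by a short sketch that inspects a set of relationship instances. The function name `cardinality` and the sample data are assumptions made for the example. Note also that in a real design the cardinality is a rule stated in the schema; it is not something inferred from whatever instances happen to exist, so this is only a way of seeing what the ratios mean.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify a binary relationship given as (left, right) instance pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    # Does some left entity relate to more than one right entity?
    left_has_many = any(len(s) > 1 for s in rights_per_left.values())
    # Does some right entity relate to more than one left entity?
    right_has_many = any(len(s) > 1 for s in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "N:M"
    if left_has_many:
        return "1:N"
    if right_has_many:
        return "N:1"
    return "1:1"


has_spouse = [("ann", "bob"), ("carol", "dave")]
owns = [("student1", "book1"), ("student1", "book2"), ("student2", "book3")]
lectured_by = [("student1", "lec1"), ("student1", "lec2"), ("student2", "lec1")]

print(cardinality(has_spouse), cardinality(owns), cardinality(lectured_by))
```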
2-3b. Data Model Differences
Data models differ in how they handle association relationships (between entities) but all handle attribute relationships (relationships describing a single entity) in the same way.
The main data models are:
Relational Model
Hierarchical Model
Network Model
Object Oriented Model
Extended Relational Model
Hierarchical and network are much older than relational ... early 1960's vs. 1970
Market trend towards relational and beyond ... few network / hierarchical left
Hierarchical and network defined by abstraction from implementations whereas relational was defined a priori and thus has a sound mathematical basis
Non-relational are record-at-a-time whereas relational is more abstract
Non-relational are programming systems with navigation and optimisation by end users ... relational systems do their own optimisation
Almost all non-relational systems have been extended to have relational front ends
2-3c. DBMS Examples
Relational DBMS products:
ORACLE, DB2, SQL/DS, INGRES, INFORMIX, Rdb/VMS, SYBASE
Hierarchical:
IMS
Network:
IDMS
Object-oriented:
ONTOS, GemStone, ObjectStore, O2
2-4a. Why use a DBMS ?
There are several reasons for using a DBMS that follow on from each other.
Different models of the same data give different organisations.
Relational model is popular because it is abstract and computing evolution has always been towards the more abstract.
2-4b. Specific Reasons for DBMS
Logical organisation gives a clear picture and helps programmers achieve faster development of application programs.
Handles low-level file maintenance.
Yields centralisation of information. This, in turn is a good thing as:
Redundancy is eliminated
Inconsistency is avoided
Data is shared
Standards are enforced
Security is applied
Integrity is maintained
Requirements are balanced
Yields data independence where data organisation is not built into application programs, for example
Representation of numeric data
Units for numeric data
Data coding
Stored record and stored file structure
DBA can change access structures during the mid-life of the DBMS without affecting DBMS users, except with respect to performance.
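As a rough illustration of this kind of data independence, the sketch below runs the same query before and after an index is added. The `student` table, the index name `student_school_idx` and the data are assumptions for the example, and SQLite stands in for a full DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER, name TEXT, school TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Ann", "Computing"), (2, "Bob", "Physics"), (3, "Cat", "Computing")],
)

# The application's query names only logical things: table and columns.
query = "SELECT name FROM student WHERE school = 'Computing' ORDER BY name"
before = conn.execute(query).fetchall()

# The DBA changes an access structure; the application is not touched.
conn.execute("CREATE INDEX student_school_idx ON student (school)")
after = conn.execute(query).fetchall()

print(before == after)
```

The query text and its result are unchanged; only the way the DBMS finds the rows may differ, which is the performance effect the note refers to.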
2-4c. Why not a DBMS?
High initial cost in hardware perhaps?
Expensive piece of software
Expensive in terms of personnel and training of users
Overhead of providing
Don't have a large volume of data
Concurrent users?
2-5. Three Level Architecture
Functional organisation.
Does not cover many DBMS functions like concurrency, backup, security etc.
2-5-1. The External Level
Users use a language incorporating a data sublanguage for the database consisting of:
Data definition language (DDL)
Data manipulation language (DML)
Individual user's view is an external view ... multiple occurrences of multiple types of external records
Views are defined by an external schema which is defined in DDL
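One common way to realise external views in a relational system is with SQL views. The sketch below is an assumption-laden illustration only: the `student` table, its columns and the two view names are invented, loosely following the Library and Finance views mentioned in 2-1-1b, and SQLite is used because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: one table holding everything known about a student.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT, fees_owed REAL, books_on_loan INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [(1, "Ann", 0.0, 2), (2, "Bob", 150.0, 0)],
)

# External level: each user group gets its own view, defined in DDL.
conn.execute(
    "CREATE VIEW library_view AS SELECT id, name, books_on_loan FROM student"
)
conn.execute(
    "CREATE VIEW finance_view AS SELECT id, name, fees_owed FROM student"
)

library_rows = conn.execute("SELECT * FROM library_view ORDER BY id").fetchall()
finance_rows = conn.execute("SELECT * FROM finance_view ORDER BY id").fetchall()
print(library_rows)
print(finance_rows)
```

Both views draw on the same stored data, but each exposes only the part its users need.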
2-5-2. The Conceptual Level
A representation of the entire information content of the database abstracted from physical store.
May be different or similar to external views.
Data as it is ... multiple occurrences of multiple types of conceptual records.
Conceptual schema is defined by conceptual DDL and includes security and integrity constraints not present in the external levels.
No more than a union of individual external schemas + security and integrity.
2-5-3. The Internal Level
Defines types of stored records, indices, how fields are represented, in what sequence, etc.
Defined using an internal DDL.
Programs accessing this layer directly are dangerous because they bypass the security and integrity checks defined at the conceptual level.
Mappings exist between the different levels of the 3LA and the DBA is responsible for correct mapping between the levels.
2-6a. The Database Administrator I
DBMS Components
Stored data manager
DDL compiler
Run-time database processor
Query compiler
Precompiler
DML Compiler
Recovery Manager?
Concurrency control manager?
An essential part of any DBMS is the role played by the DBA
Has overall control of the DBMS
Decides information content and logical and conceptual database design/schema
Decides on storage structures and access using DDL
Liaises with users and helps them design their external schemas using DDL
Defines security and integrity checks
Defines backup and recovery strategies
Monitors performance of the DBMS and responds to changing requirements by using load, dump and statistical analysis routines.
2-6b. The Database Administrator II
An important source of information for the DBA is the data dictionary or system catalog for the DBMS which is
System database
Contains data about data (meta-data)
Descriptions of other objects rather than "raw" data
Includes schemas and mappings
Data dictionary can be queried as if it was a database
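Querying the catalog like an ordinary table can be sketched as follows. The example uses SQLite, whose catalog is the `sqlite_master` table; catalog table names differ from product to product, so this shows the idea only, and the two user tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# The catalog is queried with an ordinary SELECT, like any other table.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
print(catalog)
```

The rows returned describe the tables themselves (meta-data), not the students or courses stored in them.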
Chapter 3. Storage Structures
Re-cap from previous courses, coverage of basic storage structures.
Internal level of 3LA.
3-1. Why Storage Structures ?
3-2a. Hardware features of disks
3-2b. Disk and file managers
3-2c. Clustering on disk surface
3-3a. Using index files I
3-3b. Using index files II
3-4a. Hashing I
3-4b. Hashing II
3-4c. Hashing III
Sources: Elmasri & Navathe chapters 3 & 4 or any database textbook.
3-1. Why Storage Structures?
Main memory has faster access than disk.
Disk technology has not changed much, though emergence of RAID may change this.
Databases store information on disk rather than main memory.
Task of DBMS is to minimise amount of information to retrieve from disk.
There are many storage structures, similar to existence of many sorting algorithms.
DBMS should support many storage structures and use the most appropriate.
This is at internal level of 3LA, users should not be aware of this.
3-2a. Hardware Features of Disks
Hardware of a disk drive ... do you know the following ...
Disk pack
Track
Platter
Surface
Block/Page
Interblock gap
Sectors
Buffers
Read/Write head
Seek time
Rotational delay/latency
Block transfer time
Bulk transfer rate
If not ... check it out!
Elmasri & Navathe pp 71 - 74
3-2b. Disk and File Managers
Page/block is unit of transfer between disk and memory.
Disk manager ... OS component managing free space on disk, performs garbage collection and de-fragmentation.
File manager associates file names with sets of blocks/pages ... may be part of OS, or of DBMS.
File manager of OS is not suited to DBMS application.
3-2c. Clustering on Disk Surfaces
Clustering: logically related records physically close together on disk surface.
DBA can vary clusterings in mid-life of database.
Knowledge of how data is to be used is essential to good physical database design.
3-3a. Using Index Files I
Regularly executed query:
" Find all student numbers with city = x "
DBMS organised to perform this well.
Two ways to execute query:
Binary search through index (city) file to find offset in data (student) file.
Sequential search through data (student) file.
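The two ways of executing the query can be sketched as follows. This is an illustration only: the student records, the in-memory lists standing in for the data and index files, and the function names are all assumptions.

```python
from bisect import bisect_left

# Hypothetical data (student) file: (student_no, city) records in arrival order.
students = [(101, "Dublin"), (102, "Cork"), (103, "Galway"), (104, "Cork"), (105, "Dublin")]

# Index file: (city, offset) entries kept sorted by city.
city_index = sorted((city, offset) for offset, (_, city) in enumerate(students))


def find_sequential(city):
    """Sequential search: look at every record in the data file."""
    return [number for number, c in students if c == city]


def find_with_index(city):
    """Binary search the index for the first matching entry, then follow offsets."""
    pos = bisect_left(city_index, (city,))
    result = []
    while pos < len(city_index) and city_index[pos][0] == city:
        offset = city_index[pos][1]
        result.append(students[offset][0])
        pos += 1
    return result


print(find_sequential("Cork"), find_with_index("Cork"))
```

Both functions return the same student numbers; the indexed version touches only the matching index entries and records instead of the whole data file.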
3-3b. Using Index Files II
Create index on primary key or on other field(s) or on combination.
File can have any number of indexes.
Index on field combination not the same as two separate indexes.
Indexes (usually) speed up retrieval but slow down updates.
We count page I/O operations.
B-tree is usually best all-round index file but there are variations [E&N pp 116]
Multi-level indexes [E&N pp 113]
3-4a. Hashing I
Hashing: fast access based on given value.
Records physically placed at (disk) location, function of field value.
When storing a record, DBMS computes hash address & tells the file manager where to store the record.
When retrieving, DBMS performs some computation on query to find where the data is stored.
Hashing is only useful for searches that have one equality condition.
Hashing example:
Student number    Mod 13 Location/"Bucket"
100               9
200               5
300               1
400               10
500               6
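The bucket computation in this example can be reproduced with a few lines of Python. The function name `bucket` and the constant `NUM_BUCKETS` are illustrative choices, not part of the notes.

```python
NUM_BUCKETS = 13  # number of storage locations used in the example


def bucket(student_number):
    """Hash address: the student number modulo the number of buckets."""
    return student_number % NUM_BUCKETS


addresses = {n: bucket(n) for n in (100, 200, 300, 400, 500)}
print(addresses)
```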
3-4b. Hashing II
Hash collisions:
Student number    Mod 13 Location
400               10
700               11
1000              12
1200              4
1700              10
Range of values greater than number of locations implies collisions.
Collisions become rarer as the range of values approaches the number of locations.
Hash collisions handled by pointer chain.
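A rough sketch of collision handling by chaining, using the student numbers from the collision table. A Python list per bucket stands in for the pointer chain on disk, and the class name `ChainedHashFile` and the record strings are hypothetical.

```python
NUM_BUCKETS = 13


class ChainedHashFile:
    """Each bucket holds a chain of (key, record) pairs that hash to it."""

    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def store(self, student_number, record):
        # Colliding records are simply appended to the chain for that bucket.
        self.buckets[student_number % NUM_BUCKETS].append((student_number, record))

    def fetch(self, student_number):
        # Hash to the bucket, then walk its chain looking for the key.
        for key, record in self.buckets[student_number % NUM_BUCKETS]:
            if key == student_number:
                return record
        return None


hash_file = ChainedHashFile()
for n in (400, 700, 1000, 1200, 1700):
    hash_file.store(n, f"record for student {n}")

print(hash_file.buckets[10])  # 400 and 1700 share bucket 10
```

Retrieval of 1700 costs one extra step along the chain, which is why long chains degrade hashing performance.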
3-4c. Hashing III
Stored file can have any number of indexes but only one hash.
Works well for single equality predicate only.
Physical sequence on disk does not correspond to any logical organisation leading to high seek times and thrashing.
As file size increases, number of collisions increases.
Works in memory or on disk.
Extendible hashing, multiple hashing, dynamic hashing, linear hashing.
Chapter 4. Entity-Relationship Data Modeling
This chapter presents an overview of entity-relationship data modeling.
4-1. E-R Introduction
4-2a. E-R Definitions I
4-2b. E-R Definitions II
4-2c. E-R Definitions III
4-3a. E-R Notation I
4-3b. E-R Notation II
4-3c. Cardinality Ratios
4-3d. Recursive Relationships
4-3e. Properties of Relationships
4-3f. Ternary Relationships
4-3g. Additional Notation
4-4. E-R Principles
4-5a. E-R Example I
4-5b. E-R Example II
Sources: Elmasri & Navathe chapter 3 or any database textbook.
4-1. E-R Introduction
The E-R model is used to interpret, specify and document requirements for database processing systems, irrespective of the type of DBMS being used.
It is used to draw a formal picture but since its inception in 1976 it has gone through many variations, so there is no standard!
4-2a. E-R Definitions I
Entity: An instance of a physical object in the real world.
Entity Class: A group of objects of the same type.
Attributes (Properties): Entities have attributes or properties that describe their characteristics.
Composite Attribute: An attribute that is composed of several more basic attributes.
Simple Attribute: An attribute which is not divisible.
Single-Valued Attribute: An attribute that has a single value for a particular entity.
Multi-Valued Attribute: An attribute that has a set of values for the same entity.
Value Set: Each simple attribute is associated with a value set (or domain) which specifies the set of values that may be assigned to that attribute for each individual entity.
4-2b. E-R Definitions II
Relationship Class: A relationship class (type) is a set of associations among entity types.
Relationship Instance: An association of entities i.e. an instance of a relationship type.
Relationships may have properties (attributes).
4-2c. E-R Definitions III
Degree of a Relationship: The degree of a relationship is the number of participating entities.
Recursive Relationship: A relationship between entities of the same class.
Cardinality Ratio of a Relationship: This constraint specifies the number of relationship instances that an entity can participate in (e.g. 1:1, 1:N, N:M).
4-3a. E-R Notation I
Entity Types
Relationship Types
Attributes
Composite Attributes
Multi-valued Attributes
Key Attributes
4-3b. E-R Notation II
4-3c. Cardinality Ratios
4-3d. Recursive Relationships
4-3e. Properties of Relationships
4-3f. Ternary Relationships
4-3g. Additional Notation
Not part of core or lowest common denominator notation ...
Weak entities
ID-dependent entities
Sub- and super- types
Derived attribute
Total participation
......
4-4. E-R Principles
E-R Principles - why?
Clarify ... structure from an unstructured world.
Several tools automate transformation of E-R diagram to DBMS schema.
IEW, IEF, Accelerator, Design/1, ORACLE CASE*Designer etc.
Start with natural language description and look for nouns (entities) and verbs (relationships).
Art, not science.
4-5a. E-R Example I
Football Club
"A football club has a name and a ground and is made up of players. A player can play for only one club and a manager, represented by his name manages a club. A footballer has a registration number, name and age. A club manager also buys players. Each club plays against each other club in the league and matches have a date, venue and score."
4-5b. E-R Example II
University Database
"A lecturer, identified by his or her number, name and room number, is responsible for organising a number of course modules. Each module has a unique code and also a name and each module can involve a number of lecturers who deliver part of it. A module is composed of a series of lectures and because of economic constraints and common sense, sometimes lectures on a given topic can be part of more than one module. A lecture has a time, room and date and is delivered by a lecturer and a lecturer may deliver more than one lecture. Students, identified by number and name, can attend lectures and a student must be registered for a number of modules. We also store the date on which the student first registered for that module. Finally, a lecturer acts as a tutor for a number of students and each student has only one tutor."
Chapter 5. Relational Model of Data
Crucial part of DBMS is the way the real world is modelled ... the style or feel.
Relational model is (still) the most significant model & most DBMS implementations are relational.
This chapter presents the relational model.
5-1a. Basic Modelling
5-1b. Relational Model Overview
5-2. Relational Tables
5-3a. Relational Model Integrity Basics
5-3b. Relational Model Integrity
5-4a. Relational Algebra Operators
5-4a1. SELECTION operation
5-4a2. PROJECTION operation
5-4a3. PRODUCT operation
5-4a4. UNION operation
5-4a5. INTERSECTION operation
5-4a6. DIFFERENCE operation
5-4a7. JOIN operation
5-4a8. DIVIDE operation
5-4b. Relational Algebra
5-4c. Relational Expressions
5-5. Relational Calculus
5-1a. Basic Modelling
Any data model has:
Form of data representation ... tables or relations
Rules specifying allowable states of data ... integrity conditions
Operators to manipulate data
Comparison to other real-world models:
Integers
Molecular model of solids and liquids
EMR ... wave or particle ?
Like (all ?) models, relational model is a paper model.
Why do we model ?
To understand is the usual motivation but also to reduce or abstract or encapsulate the real world into something manageable. This allows us to form predictions.
"Manageable" can mean to make computable, or not.
Example models: weather, traffic flow, stock market etc. For each it is clear why we want to model & what we do with these models.
Models are not necessarily an exact mapping of the real world, especially if the world is complex.
For databases, the operations are high-level & clear so the relational model can map the real world exactly, at the level we want.
Moving the model to a computer system, not all R.DBMS fully implement the relational model.
For other models, they do not always exactly map the real world ... stock market !
5-1b. Relational Model Overview
Relational database perceived by users as a collection of tables & nothing else.
Three tables named S, P and SP, corresponding to suppliers, parts and shipments of parts by suppliers.
Also written as:
S(S#, SNAME, STATUS, CITY)
P(P#, PNAME, COLOUR, WEIGHT, CITY)
SP(S#, P#, QTY)
This is a model of a very limited world.
The entire world can be described.
Other related parts of the real world could be included e.g.
MAKES(P#, M#, COST)
M(M#, MNAME, MADDR)
Entire information content of database is represented as data values with no links or pointers or offsets between tables.
All data values are atomic, exactly one value and never a set.
5-2. Relational Tables
Relation is mathematical term for table.
In the database from earlier, there are three such tables named S, P and SP.
A tuple is a row, an attribute is a column.
In table S there are three tuples and four attributes and we can refer to S, S#, SNAME etc.
Let's work with a more abstract definition of a table for a while.
A domain is a pool of values of the same (data) type from which one or more attributes in one or more tables take their values.
Above we can see that attribute A1 draws values from domain D1, A2 from D2 etc.
If two attributes draw values from the same domain then comparisons between tuples can be made on these attributes.
A relation, R, on domain D1 to DN consists of
A set of attributes A1 to AN such that Ai corresponds to Di.
A set of N-tuples or entries in the relation.
N is the degree and the number of tuples is the cardinality.
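A small sketch of these definitions, modelling a relation as a tuple of attribute names plus a Python set of tuples. The supplier values are invented for illustration and are not taken from the course tables.

```python
# Hypothetical supplier relation S(S#, SNAME, STATUS, CITY).
attributes = ("S#", "SNAME", "STATUS", "CITY")
tuples = {
    ("S1", "Smith", 20, "London"),
    ("S2", "Jones", 10, "Paris"),
    ("S3", "Blake", 30, "Paris"),
}

degree = len(attributes)   # number of attributes, N
cardinality = len(tuples)  # number of tuples currently in the relation

# A relation is a set, so adding an existing tuple again changes nothing.
tuples.add(("S1", "Smith", 20, "London"))

print(degree, cardinality, len(tuples))
```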
A domain normally draws its values from a data type, analogous to a programming language data type.
In addition, a domain may or may not include the additional value, NULL, as decided at domain definition time.
If included in a domain, the value NULL does not correspond to 0, " " or infinity .... it corresponds to
Not known
Missing
Does not apply
Thus a domain of 16 bit integers that includes NULL has a set of 2^16 + 1 = 65537 unique values, meaning it requires more than 16 bits to store !
During a database lifetime cardinality changes while degree does not.
A domain may appear more than once in a given relation.
Domains may be simple or composite.
Simple : name, age etc.
Composite : date ... number, street, city, zip ... etc.
A relation is a set; no duplicate tuples implying:
Tuples are unordered.
Attributes are unordered and are referenced by name, not position.
There is always at least one way to uniquely address tuples i.e. the combination of all attributes.
Attribute names are unique only within a table and may be re-used in different tables.
Table names are unique.
A primary key is a column or combination of columns whose values (or combinations of values) are never duplicated, i.e. duplicates are not allowable given the semantics of the table.
This "never" is important and implies that the PK (primary key) cannot be determined from the data alone.
Besides the default PK (all attributes) there is normally a "smaller" PK.
All attribute values are atomic ... one value at row/column intersection, called normalised or first normal form.
A relational database is a database where data is represented as a collection of time-varying normalised relations of assorted degrees and cardinalities.
Working through the example above ...
Tables
Attributes and tuples
Domains
Primary keys
Degrees and cardinalities.
5-3a. Relational Model Integrity Basics
Integrity is the property of a database state being consistent with some predefined set of rules.
Feeling of correctness with respect to:
Domain values (independent).
Dependencies across tables.
One: Suppose we replace 'S3' in the SP table for P2 shipments with the value 'S4'.
Our shared understanding and interpretation tells us this is incorrect, cannot be, has lost its integrity.
Two: Furthermore, suppose there is a real-life rule that no two suppliers can come from the same city ... current data state is a violation.
The relational model has built-in support for the first kind of rule above, but not the second.
5-3b. Relational Model Integrity
Some definitions:
Candidate Key: An attribute or combination of attributes which is a unique identifier within a table.
Primary Key: One of the candidate keys.
Alternate Key: The candidate key(s) (if any) not chosen as the primary key.
Foreign Key: A (combination of) attribute(s) in one relation whose value(s) are required to equal value(s) of the primary key of another relation.
A foreign key is not necessarily part of the primary key.
Entity Integrity: No attribute forming part of the primary key of a base table is allowed to have NULL values.
Referential Integrity: If a relation R2 includes a foreign key FK matching the primary key PK of some base relation R1, then every value of R2.FK must:
(a) be equal to a value of R1.PK
or
(b) be wholly NULL, i.e. each attribute in R2.FK must be null.
N.B. Cannot legally refer to R1.PK, R2.FK
A base relation corresponds to a real world entity or relationship, not a view.
The two rules refer to database states, not to transactions.
Other semantic rules, like "no two suppliers from the same city", are not in the model.
Most R.DBMS products support stopping updates that would violate these two rules.
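As a hedged illustration of a product stopping updates that would violate the two rules, the sketch below uses SQLite via Python's `sqlite3` module. SQLite is an assumption chosen because it ships with Python; the column names `SNO` and `PNO` stand in for S# and P#, and the helper `rejected` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE S (SNO TEXT PRIMARY KEY NOT NULL, SNAME TEXT)")
conn.execute("CREATE TABLE SP (SNO TEXT REFERENCES S(SNO), PNO TEXT, QTY INTEGER)")
conn.execute("INSERT INTO S VALUES ('S1', 'Smith')")
conn.execute("INSERT INTO SP VALUES ('S1', 'P1', 300)")  # allowed: S1 exists in S


def rejected(statement):
    """Return True if the DBMS refuses the update with an integrity error."""
    try:
        conn.execute(statement)
        return False
    except sqlite3.IntegrityError:
        return True


# Referential integrity: S4 is not a primary key value in S.
dangling_fk_rejected = rejected("INSERT INTO SP VALUES ('S4', 'P2', 200)")
# Entity integrity: a NULL primary key value is refused.
null_pk_rejected = rejected("INSERT INTO S VALUES (NULL, 'Nobody')")

print(dangling_fk_rejected, null_pk_rejected)
```

A semantic rule such as "no two suppliers from the same city" would need an extra constraint or application code; it does not follow from the key declarations alone.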
5-4a. Relational Algebra Operators
Relational model data manipulation consists of
An assignment operator to "remember" the evaluation of expressions
A set of operators called the relational algebra
A set of alternative operators called the relational or tuple calculus.
Relational algebra has eight operators:
5-4a1. SELECTION operation
5-4a2. PROJECTION operation
5-4a3. PRODUCT operation
5-4a4. UNION operation
5-4a5. INTERSECTION operation
5-4a6. DIFFERENCE operation
5-4a7. JOIN operation
5-4a8. DIVIDE operation
Examine each of them in turn, re-examining them subsequently if necessary, until you get a good grasp of each of them, before proceeding with the course.
5-4a1. The SELECT Operation
Used to select a subset of tuples from a single relation which satisfy a selection condition.
Diagrammatically:
Written as σ <selection condition> ( <relation> ) where <selection condition> is a boolean expression and <relation> is a single relation.
Example:
σ S.S# = S1 (S)
<selection condition> compares an attribute name with a constant or another attribute name (if drawn from same domain), using {=, <, <=, >, >=, ≠} as comparators, and using boolean connectives if necessary.
SELECT is unary, commutative and applies to each tuple independently.
A series of nested SELECTs is equivalent to a single SELECT with conjoined selection conditions:
σ SP.P# = P1 (σ SP.QTY >= 100 (σ SP.S# ≠ S1 (SP))) = σ (SP.P# = P1) AND (SP.QTY >= 100) AND (SP.S# ≠ S1) (SP)
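The equivalence of nested SELECTs and a single SELECT with a combined condition can be checked on a toy relation. The `select` helper, the list-of-dicts representation and the shipment values are assumptions made for this sketch.

```python
def select(relation, condition):
    """SELECT: keep the tuples (dicts) of `relation` that satisfy `condition`."""
    return [t for t in relation if condition(t)]


# Hypothetical shipments relation SP(S#, P#, QTY).
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P1", "QTY": 100},
    {"S#": "S2", "P#": "P2", "QTY": 400},
    {"S#": "S3", "P#": "P1", "QTY": 50},
]

# Three nested SELECTs, innermost applied first.
nested = select(
    select(select(SP, lambda t: t["S#"] != "S1"), lambda t: t["QTY"] >= 100),
    lambda t: t["P#"] == "P1",
)

# One SELECT with the three conditions joined by AND.
combined = select(
    SP, lambda t: t["P#"] == "P1" and t["QTY"] >= 100 and t["S#"] != "S1"
)

print(nested == combined, combined)
```

Because each tuple is tested independently, the order in which the three conditions are applied does not affect the result.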
5-4a2. The PROJECT Operation
Used to select a subset of columns from a single relation.
Diagrammatically:
Written as π <attribute list> ( <relation name> ) where <attribute list> is a list of attributes in the specified relation and <relation name> is a single relation or an algebraic expression evaluating to a single relation.
Example:
π SNAME, STATUS (S)
If <attribute list> does not include the primary key then duplicates are possible and are removed.
Thus:
π CITY (S) evaluates to ...
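A minimal sketch of PROJECT with duplicate removal. The `project` helper and the supplier values are hypothetical, so the printed result is the answer for this invented data only.

```python
def project(relation, attribute_list):
    """PROJECT: keep only the named attributes and drop duplicate rows."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attribute_list)
        if row not in seen:  # duplicates appear once the key is projected away
            seen.add(row)
            result.append(dict(zip(attribute_list, row)))
    return result


# Hypothetical supplier relation S(S#, SNAME, STATUS, CITY).
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]

cities = project(S, ["CITY"])
print(cities)
```

Two suppliers share Paris, so the projection on CITY has fewer tuples than S itself.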
5-4a3. The PRODUCT Operation
One of the standard set theoretic binary operations, the CARTESIAN PRODUCT or CROSS PRODUCT combines tuples from one relation with tuples from another relation, in all possible combinations of ways.
Written as R × S
Thus if R has degree n and cardinality m and S has degree k and cardinality l, then R × S has degree (n + k) and cardinality (m * l).
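The degree and cardinality of a product can be checked on two small relations. The helper name `times` and the sample values are assumptions for this sketch; a relation is held as a tuple of attribute names plus a set of tuples.

```python
from itertools import product as cartesian


def times(r_attrs, r_tuples, s_attrs, s_tuples):
    """PRODUCT: pair every tuple of R with every tuple of S."""
    attrs = r_attrs + s_attrs
    tuples = {r + s for r, s in cartesian(r_tuples, s_tuples)}
    return attrs, tuples


# R has degree 2 and cardinality 3; S has degree 1 and cardinality 2.
r_attrs, r_tuples = ("A", "B"), {(1, 2), (3, 4), (5, 6)}
s_attrs, s_tuples = ("C",), {("x",), ("y",)}

attrs, tuples = times(r_attrs, r_tuples, s_attrs, s_tuples)
print(len(attrs), len(tuples))  # degree 2 + 1, cardinality 3 * 2
```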
Diagrammatically:
5-4a4. The UNION Operation
One of the standard set theoretic binary operations operating on union compatible relations (same degrees and domain matching).
Denoted R1 ∪ R2, the result of this is a relation that includes all tuples in either of R1 or R2 or both.
Duplicate tuples are eliminated.
Union is commutative and associative.
Diagrammatically:
5-4a5. The INTERSECTION Operation
One of the standard set theoretic binary operations operating on union compatible relations (same degrees and domain matching).
Denoted R1 ∩ R2, the result of this is a relation that includes all tuples in both R1 and R2.
Duplicate tuples are eliminated.
Intersection is commutative and associative.
Diagrammatically:
5-4a6. The DIFFERENCE Operation
One of the standard set theoretic binary operations operating on union compatible relations (same degrees and domain matching).
Denoted R1 - R2, the result of this is a relation that includes all tuples in R1 but not in R2.
Duplicate tuples are not an issue.
Difference is not commutative:
(R - S) ≠ (S - R)
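UNION, INTERSECTION and DIFFERENCE behave like the corresponding operations on Python sets, which gives a quick way to experiment. The two union compatible relations below are invented for illustration.

```python
# Two union compatible relations over (S#, CITY); values are hypothetical.
R = {("S1", "London"), ("S2", "Paris")}
S = {("S2", "Paris"), ("S3", "Athens")}

union = R | S          # tuples in R or S or both, duplicates eliminated
intersection = R & S   # tuples in both R and S
r_minus_s = R - S      # tuples in R but not in S
s_minus_r = S - R      # tuples in S but not in R

print(len(union), intersection, r_minus_s, s_minus_r)
```

The last two results differ, which is the non-commutativity of DIFFERENCE in concrete form.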
Diagrammatically:
5-4a7. The JOIN Operation
The JOIN operation is a binary operation which is used to combine tuples from two relations where the tuples are related by virtue of conforming to some join expression.
Diagrammatically:
The JOIN operation is written as:
R ⋈ (join condition) S
If R(A1, A2, ... An) and S(B1, B2, ... Bm) then R ⋈ (join condition) S is a relation with n + m attributes, namely (A1, ... An, B1, ... Bm) in that order.
Tuples in the resulting relation are those which are combinations of a tuple in R and a tuple in S which satisfy the join condition.
JOIN vs. CARTESIAN PRODUCT ?
The <join condition> compares an attribute from one relation with another attribute from the other relation provided they are drawn from the same domain, using {=, <, <=, >, >=, ≠} as comparators, and possibly augmented using boolean connectives if necessary to link more than one such expression.
The most common JOIN uses equality comparators only and is called an equijoin where the result will contain two identical columns. If we remove one of these identical columns we are left with a natural join.
LHS ⋈ (LHS.Attrib2 = MID.Attrib1) MID
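A rough sketch of an equijoin and the natural join derived from it. Attribute names are qualified with the relation name so that the two S# columns stay distinct; the `theta_join` helper and the data are assumptions.

```python
def theta_join(r, s, condition):
    """JOIN: combine each pair of tuples from r and s that satisfies `condition`."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]


# Hypothetical relations with qualified attribute names.
S = [
    {"S.S#": "S1", "S.SNAME": "Smith"},
    {"S.S#": "S2", "S.SNAME": "Jones"},
]
SP = [
    {"SP.S#": "S1", "SP.P#": "P1"},
    {"SP.S#": "S1", "SP.P#": "P2"},
    {"SP.S#": "S3", "SP.P#": "P1"},
]

# Equijoin: the result carries two identical columns, S.S# and SP.S#.
equijoin = theta_join(S, SP, lambda a, b: a["S.S#"] == b["SP.S#"])

# Natural join: remove one of the identical columns.
natural = [{k: v for k, v in t.items() if k != "SP.S#"} for t in equijoin]

print(len(equijoin), natural)
```

The product of S and SP would have 2 * 3 = 6 tuples; the join keeps only the 2 that satisfy the condition, which answers the JOIN vs. CARTESIAN PRODUCT question for this data.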
5-4a8. The DIVIDE Operation
The DIVIDE or DIVISION operator is another binary operation which can be applied to two relations R and S in the operation R ÷ S but only where the set of attributes in S is a subset of the set of attributes in R.
Formally, R(Z) ÷ S(X) where X ⊆ Z, yields a relation T(Y) where Y (the set of attributes in the resulting T) is Z - X.
For a tuple to appear in the result of a division operation, the values in that tuple must appear in R in combination with every tuple in S.
Thus the divisor (S) should be small, both in degree and cardinality, to avoid the empty resulting relation.
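Division is easiest to see on a two-attribute dividend. In this sketch R is a set of (supplier, part) pairs and S is a set of parts; the function name `divide` and the values are hypothetical.

```python
def divide(r, s):
    """R(Y, X) ÷ S(X): the y values paired in r with every x in s."""
    candidates = {y for y, _ in r}
    return {y for y in candidates if all((y, x) in r for x in s)}


# Hypothetical shipments as (S#, P#) pairs.
SP = {("S1", "P1"), ("S1", "P2"), ("S2", "P1"), ("S3", "P2")}

suppliers_of_both = divide(SP, {"P1", "P2"})
suppliers_of_three = divide(SP, {"P1", "P2", "P3"})

print(suppliers_of_both, suppliers_of_three)
```

Enlarging the divisor to three parts empties the result for this data, which illustrates why a large divisor tends to give an empty relation.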
Diagrammatically:
5-4b. Relational Algebra
Of the eight relational operators, the group comprising
{SELECT, PROJECT, PRODUCT, UNION, DIFFERENCE}
or
{σ, π, ×, ∪, -}
are primitive operators in that the other three can be defined in terms of these five.
Thus:
R ∩ S = (R ∪ S) - ((R - S) ∪ (S - R))
R ⋈ <condition> S = σ <condition> (R × S)
R ÷ S = π Y (R) - π Y ((S × π Y (R)) - R)
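The definitions of INTERSECTION and DIVIDE in terms of the primitive operators can be checked numerically with Python sets. The relations below are invented; R is held as (y, x) pairs so that Y and X are single attributes.

```python
# INTERSECTION from UNION and DIFFERENCE:
# A ∩ B = (A ∪ B) - ((A - B) ∪ (B - A))
A = {1, 2, 3, 4}
B = {3, 4, 5}
derived_intersection = (A | B) - ((A - B) | (B - A))

# DIVIDE from PROJECT, PRODUCT and DIFFERENCE:
# R ÷ S = πY(R) - πY((S × πY(R)) - R)
R = {("S1", "P1"), ("S1", "P2"), ("S2", "P1")}   # R(Y, X)
S = {"P1", "P2"}                                   # S(X)

proj_y = {y for y, _ in R}                         # πY(R)
all_pairs = {(y, x) for y in proj_y for x in S}    # S × πY(R), written as (y, x)
missing = all_pairs - R                            # pairings that R lacks
quotient = proj_y - {y for y, _ in missing}        # drop any y with a missing pairing

print(derived_intersection, quotient)
```

S2 lacks the pairing with P2, so only S1 survives the final difference.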
Why the relational algebra ?
Relational expressions can be constructed as a high level symbolic representation which can be subjected to transformation rules, hence optimisation.
Analogy to integer arithmetic here ...
relations ↔ integers
algebra ↔ +, -, *, ÷
+, - are primitive operators.
Transformation rule:
"... if an expression is the repeated addition of the same number, x, n times, then this is equivalent to multiplying x by (n + 1)."
5-4c. Relational Expressions
Some example relational expressions.
Not as complex as Elmasri & Navathe pp170 !
Ignore syntax and assignment operator.
One: Retrieve names and status of suppliers who supply 300 cases of any part
π SNAME, STATUS (S ⋈ S.S# = SP.S# (σ QTY = 300 (SP)))
And the answer is ...
Two: Retrieve the colour of parts supplied by either S1 or S2
π COLOUR (P ⋈ P.P# = SP.P# (σ (SP.S# = S1 OR SP.S# = S2) (SP)))
... and the answer is ...
Three: Retrieve the name and city of suppliers who supply any kind of part which is either green or made in Paris
π SNAME, S.CITY (S ⋈ S.S# = SP.S# ((σ (COLOUR = GREEN OR CITY = PARIS) (P)) ⋈ P.P# = SP.P# SP))
The answer is ...
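Query One can be evaluated step by step as a select, then a join, then a project. The supplier and shipment values here are invented, so the printed answer applies only to this hypothetical data and not to the course tables.

```python
# Hypothetical S(S#, SNAME, STATUS, CITY) and SP(S#, P#, QTY).
S = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Paris"},
]
SP = [
    {"S#": "S1", "P#": "P1", "QTY": 300},
    {"S#": "S2", "P#": "P2", "QTY": 200},
    {"S#": "S3", "P#": "P2", "QTY": 300},
]

# Step 1, SELECT: shipments with QTY = 300.
big_shipments = [t for t in SP if t["QTY"] == 300]

# Step 2, JOIN: match suppliers to those shipments on S#.
joined = [{**s, **sp} for s in S for sp in big_shipments if s["S#"] == sp["S#"]]

# Step 3, PROJECT: keep SNAME and STATUS, removing duplicates via a set.
answer = sorted({(t["SNAME"], t["STATUS"]) for t in joined})

print(answer)
```

Applying the selection before the join keeps the intermediate result small, which is the kind of transformation an optimiser can make on algebraic expressions.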
5-5. Relational Calculus
Calculus is alternative to algebra, specified at the same time but for historical reasons not as important as the algebra.
Calculus vs. algebra ?
Calculus is declarative, one expression specifying retrieval whereas in the algebra we write a formula which is a nested sequence of operations implying an ordering of those operations implying a procedure for evaluating it.
Not so with the calculus where we specify what to retrieve, not how to retrieve it !
Relational calculus and algebra are equivalent in expressive power: any retrieval that can be expressed in one can be expressed in the other.
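As an illustration (not taken from the notes), the query "retrieve names and status of suppliers who supply 300 cases of any part" might be written in tuple relational calculus as follows. The tuple variables t and u are arbitrary names chosen for this sketch. Note that the expression says nothing about whether to select before joining; it only states the condition the result must satisfy.

```latex
\{\, t.\text{SNAME},\ t.\text{STATUS} \;\mid\; S(t) \,\wedge\, \exists u \,\bigl( SP(u) \,\wedge\, u.S\# = t.S\# \,\wedge\, u.\text{QTY} = 300 \bigr) \,\}
```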
Chapter 6. SQL
This chapter covers SQL.
6-1. SQL Background & Standards
6-2. SQL2 Schemas
6-3. SQL DDL
6-4. SQL SELECT Statement
6-5. SQL INSERT, DELETE and UPDATE
6-6. Non-standard SQL
Sources: Elmasri & Navathe Chapter 7 or any database book.
6-1. SQL Background & Standards
SQL pronounced SEQUEL but named SQL for legal reasons.
Not case-sensitive and may be formatted any way ... convention is to put keywords in uppercase and new clauses on new lines.
SQL defined in 1974 by the IBM group developing SYSTEM R.
Most R.DBMS implementations have an SQL-like interface.
There are several standards for SQL ... coming together at last !
Goal is to have database vendors conform to interface standards, allowing DB applications to operate with multiple products, and hence increased competition.
In general in computing, several governments insist on conformance to standards.
The SQL Standards ...
Standardisation effort started in mid-1980s.
SQL-86 is the bare bones standard, defined as the union of common features of most important DBMS.
SQL-89, a superset of SQL-86, added features like default values, check constraints and simple referential integrity.
NIST publishes guidelines such as the FIPS 127 test suite for SQL-86 and FIPS 127-1 for SQL-89.
Some 200 test cases and passing these puts a product on the validated products list.
SQL-92 aka SQL2 has FIPS 127-2, and is another superset and significantly larger (c. 575 vs. c. 120 pages).
SQL-86 and SQL-89 were just catching up and unifying what was in place ...
SQL2 has features found in existing products (at that time) but also features not in any products, so it is ahead of its time.
SQL2 has SQL89 plus ...
Additional data types like var length chars, bit strings, date and time intervals, etc.
Outer joins
Catalog specifications
Domains
Assertions
Temporary tables
Referential actions
Schema management language
Dynamic SQL
Scrolled cursors
Connections
Information schema tables
SQL3 specification was scheduled for c. 1996 with major extensions on SQL2 in several dimensions like type systems, stored procedures and OO ... but it is delayed.
For non-relational DBMS ... hierarchical and network models have no standard because there are so few systems.
OODBMS situation reminiscent of R.DBMS but OO developers have proposals for OSQL, an attempted migration path from R.DBMS to OODBMS.
SQL standardisation is great ... but too late.
Because conformance to SQL standardisation became in vogue only recently, most vendors have developed extra features, and all are different.
Vendors claim "We support SQL2 ... plus we have all these extra features ...".
Users buy-in and find they soon depend on the non-standard features, so they are hardwired to a particular product ... trapped !
For the purpose of presenting SQL we use SQL2, but not all of SQL2 as it is so enormous and most will be unused anyway.
People will slowly evolve towards SQL2, gradually embracing its features.
At the end we look at some ORACLE-specific enhancements, to give a flavour.
When you start to use SQL, the SQL2 essentials here will be enhanced by 'local' SQL features available from your DBMS.
6-2. SQL2 Schemas
DBMS products normally partition non-overlapping applications in some ad hoc way ... ORACLE uses tablespaces.
This notion is formalised in SQL2 as SCHEMAS.
Gather together tables, views, domains, grants, assertions, indexes and other constructs that belong to the same database application.
Each schema is given a schema name.
CREATE SCHEMA schemaname AUTHORIZATION username;
We will ignore schema issues.
6-3. SQL DDL
SQL DDL (Data Definition Language) is:
CREATE TABLE
ALTER TABLE
DROP TABLE
CREATE INDEX
DROP INDEX
The syntax is ... where [] are options and {} are repetitions.
CREATE TABLE tablename (
    colname coltype [ attrib_constraint ]
    {, colname coltype [ attrib_constraint ] }
    [, table_constraint {, table_constraint } ]
)
Essentially CREATE TABLE X followed by a series of at least one colname / coltype clauses and then any number of table constraints.
Data types available include:
Integer numeric (INTEGER, SMALLINT)
Real numbers (FLOAT, REAL, DOUBLE PRECISION)
Formatted numbers
Character strings, fixed or varying length
Bit-string (fixed or varying)
Date
Time
CREATE TABLE S (
    S# char(5),
    sname char(20)
);
CREATE TABLE S (
    S# char(5) NOT NULL,
    sname char(20),
    dno integer,
    PRIMARY KEY (S#),
    FOREIGN KEY (dno) REFERENCES departments(dno)
);
Referential triggered actions ...
CREATE TABLE S (
    S# char(5) NOT NULL DEFAULT 0,
    sname char(20) NOT NULL,
    dno integer,
    PRIMARY KEY (S#),
    CONSTRAINT deptcons
        FOREIGN KEY (dno) REFERENCES departments(dno)
        ON DELETE SET NULL
        ON UPDATE CASCADE
);
Check constraints ...
CHECK (conditional expression)
CREATE TABLE S (
    S# char(5) NOT NULL DEFAULT 0,
    sname char(20) NOT NULL,
    dno integer,
    PRIMARY KEY (S#),
    CONSTRAINT deptcons
        FOREIGN KEY (dno) REFERENCES departments(dno)
        ON DELETE SET NULL
        ON UPDATE CASCADE,
    CONSTRAINT svalue
        CHECK (S# > 0 AND S# < 100)
);
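The effect of referential triggered actions and check constraints can be observed directly. The sketch below uses SQLite through Python's sqlite3 module rather than a full SQL2 system; the table name staff, the column sno and the sample rows are hypothetical stand-ins chosen so that the numeric CHECK is meaningful. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch for FK enforcement
con.executescript("""
    CREATE TABLE departments (dno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE staff (                      -- hypothetical stand-in for table S
        sno   INTEGER NOT NULL,
        sname CHAR(20) NOT NULL,
        dno   INTEGER,
        PRIMARY KEY (sno),
        CONSTRAINT deptcons FOREIGN KEY (dno) REFERENCES departments (dno)
            ON DELETE SET NULL ON UPDATE CASCADE,
        CONSTRAINT svalue CHECK (sno > 0 AND sno < 100)
    );
    INSERT INTO departments VALUES (90, 'Sales'), (92, 'Research');
    INSERT INTO staff VALUES (1, 'Smith', 90), (2, 'Jones', 92);
""")

# ON UPDATE CASCADE: renumbering department 90 is propagated to staff.
con.execute("UPDATE departments SET dno = 95 WHERE dno = 90")
# ON DELETE SET NULL: deleting department 92 nulls the referencing dno.
con.execute("DELETE FROM departments WHERE dno = 92")
rows = con.execute("SELECT sno, dno FROM staff ORDER BY sno").fetchall()

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

check_violation = rejected("INSERT INTO staff VALUES (100, 'Blake', 95)")  # sno out of range
fk_violation = rejected("INSERT INTO staff VALUES (3, 'Blake', 77)")       # no department 77

print(rows, check_violation, fk_violation)
```

The point of the design is that the rules live in the schema, so every application that touches the tables gets the same behaviour without repeating the logic.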
CREATE DOMAIN domain-name [AS] data-type
    [ DEFAULT definition ]
    [ domain-constraint-definition-list ];
Data-type is one of the built-in scalar data types.
DEFAULT definition is a default value.
Domain constraints to apply to every column using the domain.
CREATE DOMAIN dno_type AS INTEGER
    DEFAULT 99
    CONSTRAINT dno_defn_constraint
        CHECK (VALUE IN (90, 92, 93, 95, 97, 99))
    NOT NULL;
CREATE DOMAIN dname_type AS CHAR(20);
CREATE DOMAIN dsales_type AS NUMERIC(10, 2);
CREATE TABLE departments (
    dno dno_type,
    dname dname_type,
    dsales dsales_type
);
SQL2 DOMAINS ...
Syntactic shorthand
No requirement that they be used ... can use system-defined data types
No support for domains on domains
No strong typing or type checking, requirement is only for underlying data types to be the same
No user-defined operations on domains
No subtypes, supertypes or inheritance
No domain of truth values
ALTER TABLE tablename ADD colname coltype;
Add one new (rightmost) column to a table
Define a new default value for an existing column
Delete an existing column's default value
Drop an existing column
Specify a new base table integrity constraint
Delete an existing base table integrity constraint
DROP TABLE tablename;
DROP TABLE s;
DROP TABLE departments;
6-4. SQL SELECT Statement
SQL DML (Data Manipulation Language) has four commands:
SELECT
INSERT
UPDATE
DELETE
Basic format of SELECT statement:
SELECT attributes
FROM table(s)
WHERE condition
GROUP BY attribute(s)
HAVING condition
ORDER BY attribute(s);
Format of presentation is to look at clauses / features individually, using the sample suppliers, parts, shipments database as a worked example.
Not all of our SQL SELECTs have meaningful answers in this database.
Basic SQL SELECT.
Get colour and city for non-Paris parts with weight greater than 10
SELECT P.COLOUR, P.CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10;
Answer:
Red, London
Blue, Rome
Red, London
SELECT removing duplicates.
Get unique colour and city for non-Paris parts with weight greater than 10
SELECT DISTINCT P.COLOUR, P.CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10;
Answer:
Red, London
Blue, Rome
Sorting the output.
Get unique colour and city for non-Paris parts with weight greater than 10, order by colour
SELECT DISTINCT COLOUR, CITY
FROM P
WHERE P.CITY <> 'Paris'
AND P.WEIGHT > 10
ORDER BY COLOUR;
Answer:
Blue, Rome
Red, London
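The three variants can be compared side by side. This is a small sketch using Python's sqlite3 module; the rows of P, and in particular the WEIGHT values and the fourth part P4, are assumptions chosen to be consistent with the answers above, since the parts table itself is not reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed rows for P (weights and P4 are illustrative, not from the notes).
con.executescript("""
    CREATE TABLE P ("P#" TEXT, PNAME TEXT, COLOUR TEXT, WEIGHT INTEGER, CITY TEXT);
    INSERT INTO P VALUES ('P1','Nut','Red',12,'London'), ('P2','Bolt','Green',17,'Paris'),
                         ('P3','Screw','Blue',17,'Rome'), ('P4','Screw','Red',14,'London');
""")

where = "FROM P WHERE CITY <> 'Paris' AND WEIGHT > 10"

# Plain SELECT keeps duplicate (colour, city) pairs.
plain = con.execute(f"SELECT COLOUR, CITY {where}").fetchall()
# DISTINCT removes them.
distinct = con.execute(f"SELECT DISTINCT COLOUR, CITY {where}").fetchall()
# ORDER BY fixes the order of the output rows.
ordered = con.execute(f"SELECT DISTINCT COLOUR, CITY {where} ORDER BY COLOUR").fetchall()

print(len(plain), len(distinct), ordered)
```

Without ORDER BY the row order is up to the DBMS, which is why only the ordered query is suitable for comparing output position by position.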
SELECTs containing JOINs.
Get the name of suppliers and the name of the parts they supply
SELECT SNAME, PNAME
FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#;
Answer:
Smith, Nut
Smith, Bolt
Smith, Screw
Jones, Nut
Jones, Bolt
Blake, Bolt
Using aliases to resolve ambiguities.
SELECT SNAME, PNAME
FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#;
SELECT SUPPLIER.SNAME, PART.PNAME
FROM S SUPPLIER, SP SHIPMENT, P PART
WHERE SUPPLIER.S# = SHIPMENT.S#
AND SHIPMENT.P# = PART.P#;
Specifying JOIN in the FROM clause, to make it easier to comprehend.
Get pairs of city names such that a supplier in the first city supplies a part stored in the second city
SELECT DISTINCT S.CITY, P.CITY
FROM S JOIN SP USING (S#) JOIN P USING (P#);
Answer:
Paris, London
Paris, Paris
Paris, Rome
Rome, Paris
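Both styles of join can be tried out as follows. This is a sketch with Python's sqlite3 module; the sample rows are assumptions reconstructed from the answers in these notes, and the column names S# and P# are written as delimited identifiers ("S#", "P#") because SQLite does not accept # in a plain identifier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed sample data, reconstructed from the answers shown in the notes.
con.executescript("""
    CREATE TABLE S  ("S#" TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE P  ("P#" TEXT, PNAME TEXT, COLOUR TEXT, CITY TEXT);
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO P VALUES ('P1','Nut','Red','London'), ('P2','Bolt','Green','Paris'),
                         ('P3','Screw','Blue','Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

# Join conditions in the WHERE clause, with aliases for the three tables.
names = con.execute("""
    SELECT SUPPLIER.SNAME, PART.PNAME
    FROM S SUPPLIER, SP SHIPMENT, P PART
    WHERE SUPPLIER."S#" = SHIPMENT."S#"
    AND SHIPMENT."P#" = PART."P#"
""").fetchall()

# Join specified in the FROM clause; CITY must be qualified as it occurs in S and P.
pairs = con.execute("""
    SELECT DISTINCT S.CITY, P.CITY
    FROM S JOIN SP USING ("S#") JOIN P USING ("P#")
    ORDER BY S.CITY, P.CITY
""").fetchall()

print(sorted(names))
print(pairs)
```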
Use of '*'.
SELECT *
FROM S
WHERE STATUS <> 10;
Answer:
S1, Smith, 20, Paris
S3, Blake, 30, Rome
Set Operators ... UNION, etc.
Get supplier and part numbers where shipments are greater than 300 or where the supplier who supplies a part has a status not equal to 10
(SELECT SP.S#, SP.P#
FROM SP
WHERE QTY > 300)
UNION
(SELECT SP.S#, SP.P#
FROM SP, S
WHERE S.S# = SP.S#
AND S.STATUS <> 10);
Answer (UNION removes duplicate rows, so S1, P3 appears only once):
S1, P1
S1, P2
S1, P3
S2, P2
S3, P2
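The duplicate-removing behaviour of UNION can be seen by comparing it with UNION ALL. The sketch below uses Python's sqlite3 module with sample rows assumed from the answers in these notes; SQLite does not accept parentheses around the operands of UNION, so they are omitted, and the second operand reads from S because the condition refers to S.STATUS.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed sample data, reconstructed from the answers shown in the notes.
con.executescript("""
    CREATE TABLE S  ("S#" TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

big = 'SELECT SP."S#", SP."P#" FROM SP WHERE QTY > 300'
status = ('SELECT SP."S#", SP."P#" FROM SP, S '
          'WHERE S."S#" = SP."S#" AND S.STATUS <> 10')

# UNION behaves like set union: (S1, P3) satisfies both operands but is returned once.
union = con.execute(f"{big} UNION {status} ORDER BY 1, 2").fetchall()
# UNION ALL keeps every row from both operands.
union_all = con.execute(f"{big} UNION ALL {status} ORDER BY 1, 2").fetchall()

print(union)
print(len(union_all))
```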
Nested queries ... complete SELECTs within WHERE clauses of another query.
Get supplier names for suppliers who supply part P2
SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
    (SELECT SP.S#
     FROM SP
     WHERE SP.P# = 'P2');
Answer:
Smith
Jones
Blake
Nested queries ... alternative way to formulate a query.
Get supplier names for suppliers who supply part P2
SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
    (SELECT SP.S#
     FROM SP
     WHERE SP.P# = 'P2');
SELECT DISTINCT S.SNAME
FROM S, SP
WHERE SP.P# = 'P2'
AND S.S# = SP.S#;
Nested queries ... relational algebra.
The same query (supplier names for suppliers who supply part P2) in relational algebra:
((SP JOIN S) where P# = 'P2') [SNAME]
Nested query ... two levels.
Get supplier names for suppliers who supply at least one red part
SELECT DISTINCT S.SNAME
FROM S
WHERE S.S# IN
    (SELECT SP.S#
     FROM SP
     WHERE SP.P# IN
        (SELECT P.P#
         FROM P
         WHERE P.COLOUR = 'Red'));
Answer:
Smith
Jones
Nested query ... re-phrased as JOINs and in relational algebra.
(((P where colour = 'Red') JOIN SP) [S#] JOIN S) [SNAME]
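The equivalence of the nested formulation and a join formulation can be checked on sample data. The following sketch uses Python's sqlite3 module; the rows are assumptions reconstructed from the answers in these notes, and the join version is one possible re-phrasing rather than the only one.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed sample data, reconstructed from the answers shown in the notes.
con.executescript("""
    CREATE TABLE S  ("S#" TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE P  ("P#" TEXT, PNAME TEXT, COLOUR TEXT, CITY TEXT);
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO P VALUES ('P1','Nut','Red','London'), ('P2','Bolt','Green','Paris'),
                         ('P3','Screw','Blue','Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

# Two levels of nesting: red parts -> suppliers of those parts -> their names.
nested = con.execute("""
    SELECT DISTINCT S.SNAME FROM S
    WHERE S."S#" IN
        (SELECT SP."S#" FROM SP
         WHERE SP."P#" IN
            (SELECT P."P#" FROM P WHERE P.COLOUR = 'Red'))
    ORDER BY S.SNAME
""").fetchall()

# The same retrieval as a three-way join.
joined = con.execute("""
    SELECT DISTINCT S.SNAME
    FROM S, SP, P
    WHERE S."S#" = SP."S#"
    AND SP."P#" = P."P#"
    AND P.COLOUR = 'Red'
    ORDER BY S.SNAME
""").fetchall()

print(nested, joined)
```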
Nested query ... explicit sets.
SELECT DISTINCT S.SNAME
FROM S
WHERE S# IN ('S1', 'S2');
Answer:
Smith
Jones
Nested queries ... using EXISTS
EXISTS ( SELECT ... FROM ... )
evaluates TRUE iff the embedded SELECT is not empty.
Get supplier names for suppliers who supply part P2
SELECT DISTINCT S.SNAME
FROM S
WHERE EXISTS
    (SELECT *
     FROM SP
     WHERE SP.S# = S.S#
     AND SP.P# = 'P2');
Nested queries ... using NOT EXISTS
NOT EXISTS ( SELECT ... FROM ... )
evaluates TRUE iff the embedded SELECT is empty.
Get supplier names for suppliers who do not supply part P2
SELECT DISTINCT S.SNAME
FROM S
WHERE NOT EXISTS
    (SELECT *
     FROM SP
     WHERE SP.S# = S.S#
     AND SP.P# = 'P2');
Ditto for NOT IN.
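A short experiment shows EXISTS, NOT EXISTS and NOT IN side by side. This sketch uses Python's sqlite3 module; the helper function suppliers and the sample rows are assumptions made for illustration (the rows are reconstructed from answers in these notes).

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed sample data, reconstructed from the answers shown in the notes.
con.executescript("""
    CREATE TABLE S  ("S#" TEXT, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

def suppliers(part, negate=False):
    """Names of suppliers who do (or, with negate=True, do not) supply `part`."""
    op = "NOT EXISTS" if negate else "EXISTS"
    sql = f'''
        SELECT DISTINCT S.SNAME FROM S
        WHERE {op} (SELECT * FROM SP WHERE SP."S#" = S."S#" AND SP."P#" = ?)
        ORDER BY S.SNAME
    '''
    return [name for (name,) in con.execute(sql, (part,))]

supply_p2 = suppliers("P2")            # the correlated sub-query is non-empty for all three
not_p2 = suppliers("P2", negate=True)  # nobody fails to supply P2 in this data
not_p1 = suppliers("P1", negate=True)  # only Blake does not supply P1

# The same negative query phrased with NOT IN.
not_in_p1 = [n for (n,) in con.execute("""
    SELECT DISTINCT S.SNAME FROM S
    WHERE S."S#" NOT IN (SELECT SP."S#" FROM SP WHERE SP."P#" = 'P1')
""")]

print(supply_p2, not_p2, not_p1, not_in_p1)
```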
Nested queries ...
In addition to IN, SQL also has operators to compare a single value to a set of values.
= ANY
= SOME
These return TRUE if the single value equals at least one value in the set of values.
Can also use >, >=, <, <= and <> with ANY or SOME.
The keyword ALL can also be used in nested queries; the comparison returns TRUE only if it holds for every value in the set of values.
Aggregate functions ... within SQL there are built-in functions COUNT, SUM, MAX, MIN and AVG.
May be used in SELECT or HAVING clauses.
We may include DISTINCT to remove duplicates before applying the operation (this makes no difference to MAX and MIN).
COUNT(*) counts all rows without eliminating duplicates.
NULL values are discarded before applying the operators, except for COUNT(*).
If the argument is an empty set, COUNT returns a value of 0, the others return NULL.
Get Max and Min quantity of shipments for part P2
SELECT MAX(SP.QTY), MIN(SP.QTY)
FROM SP
WHERE SP.P# = 'P2';
Answer:
400, 200
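The rules for duplicates, NULLs and empty sets are easy to confirm experimentally. The sketch below uses Python's sqlite3 module with a hypothetical one-column table T; the table and its values are made up purely for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical table: four rows, one of them NULL, one value duplicated.
con.executescript("""
    CREATE TABLE T (X INTEGER);
    INSERT INTO T VALUES (10), (NULL), (10), (40);
""")

# COUNT(*) counts rows; the other aggregates discard the NULL first.
# DISTINCT additionally removes the duplicate 10 before aggregating.
summary = con.execute("""
    SELECT COUNT(*), COUNT(X), COUNT(DISTINCT X),
           SUM(X), SUM(DISTINCT X), MAX(X), MIN(X)
    FROM T
""").fetchone()

# No row satisfies X > 100: COUNT gives 0, the other aggregates give NULL (None).
empty = con.execute("""
    SELECT COUNT(*), COUNT(X), SUM(X), MAX(X), AVG(X)
    FROM T WHERE X > 100
""").fetchone()

print(summary)
print(empty)
```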
Sometimes we want to apply aggregate functions to subgroups of tuples ... e.g. avg salary of employees in each department, or number of shipments of each part.
We can group tuples based on having the same value for some attributes (or combination of ) and apply functions to each of the groups.
Done via the GROUP BY clause, which specifies the grouping attributes; any non-aggregated attribute in the SELECT clause must also appear in the GROUP BY clause.
For each part supplied, get the part number and total shipment quantity
SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#;
Answer:
P1, 600
P2, 800
P3, 400
For each part supplied, get the part number and total shipment quantity
What is done here is that the table (after the WHERE clause is evaluated) is re-arranged into GROUPS which share the same value of P#.
So
S1 P1 300
S1 P2 200
S1 P3 400
S2 P1 300
S2 P2 400
S3 P2 200
is turned into:
S1 P1 300 [P1 values]
S2 P1 300

S1 P2 200 [P2 values]
S2 P2 400
S3 P2 200

S1 P3 400 [P3 values]
Thus:
SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#;
Answer:
P1, 600
P2, 800
P3, 400
To apply restrictions so that some groups are eliminated we use the HAVING clause to eliminate groups (not original tuples).
For each part supplied, get the part number and total shipment quantity provided there is more than one shipment of that part
SELECT P#, SUM(QTY)
FROM SP
GROUP BY P#
HAVING COUNT(*) > 1;
S1 P1 300 [P1 values]
S2 P1 300

S1 P2 200 [P2 values]
S2 P2 400
S3 P2 200

S1 P3 400 [P3 values]
Answer:
P1, 600
P2, 800
We can have conditionals (WHERE) applied to eliminate tuples before the GROUP BY and HAVING clauses.
For each part supplied which is not green, get the part number and total shipment quantity provided there is more than one shipment of that part
SELECT SP.P#, SUM(QTY)
FROM SP, P
WHERE P.P# = SP.P#
AND P.COLOUR <> 'Green'
GROUP BY SP.P#
HAVING COUNT(*) > 1;
S1 P1 300 [P1 values]
S2 P1 300

S1 P3 400 [P3 values]
Answer:
P1, 600
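The interplay of WHERE, GROUP BY and HAVING can be reproduced with the shipment rows listed above. This is a sketch using Python's sqlite3 module; the colours in P are assumptions inferred from the answers in these notes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P  ("P#" TEXT, COLOUR TEXT);   -- colours are assumed
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO P VALUES ('P1','Red'), ('P2','Green'), ('P3','Blue');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

# One output row per group of shipments sharing the same P#.
totals = con.execute("""
    SELECT "P#", SUM(QTY) FROM SP
    GROUP BY "P#" ORDER BY "P#"
""").fetchall()

# HAVING discards whole groups: P3 has only one shipment.
multi = con.execute("""
    SELECT "P#", SUM(QTY) FROM SP
    GROUP BY "P#" HAVING COUNT(*) > 1 ORDER BY "P#"
""").fetchall()

# WHERE discards tuples (all shipments of the green part) before grouping.
not_green = con.execute("""
    SELECT SP."P#", SUM(QTY)
    FROM SP, P
    WHERE P."P#" = SP."P#" AND P.COLOUR <> 'Green'
    GROUP BY SP."P#" HAVING COUNT(*) > 1
    ORDER BY SP."P#"
""").fetchall()

print(totals, multi, not_green, sep="\n")
```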
String and substring comparisons in SQL use the LIKE operator where % replaces an arbitrary number of characters and _ replaces a single arbitrary character.
SELECT SNAME
FROM S
WHERE SNAME LIKE '%e%';
Answer:
Jones
Blake
SQL can include simple arithmetic operators in the SELECT clause of SELECT statements.
SELECT S#, QTY*0.90
FROM SP
WHERE P# = 'P1';
Answer:
S1, 270
S2, 270
SELECT statements ... general.
SQL SELECT can have ...
SELECT attributes
FROM table(s)
WHERE condition
GROUP BY attribute(s)
HAVING condition
ORDER BY attribute(s);
but only SELECT and FROM are mandatory.
A query is evaluated, conceptually, by applying the FROM, then WHERE, then GROUP BY, then HAVING, then the SELECT list, then ORDER BY.
SQL is extremely redundant in that for most queries, even simple ones, there is usually more than one way to formulate them. They are all correct, and in principle efficiency should not be a concern because the DBMS does the query optimisation.
6-5. SQL INSERT, DELETE and UPDATE
INSERT
Insert a single tuple into a single relation.
INSERT INTO S
VALUES ('S4', 'Ryan', 10, 'London');
(S4, Ryan, 10, London)
INSERT INTO S (S#, SNAME)
VALUES ('S4', 'Ryan');
(S4, Ryan, NULL, NULL)
INSERT
Insert multiple tuples into a relation as the result of a query.
INSERT INTO S (S#, CITY)
SELECT SP.S#, P.CITY
FROM SP, P
WHERE SP.P# = P.P#
AND QTY = 300;
(S1, NULL, NULL, London)
(S2, NULL, NULL, London)
DELETE
Delete a single tuple from a single relation:
DELETE FROM S
WHERE S# = 'S1';
Delete a set of tuples from a single relation:
DELETE FROM S
WHERE CITY = 'Paris';
Delete all tuples in a relation:
DELETE FROM S;
Delete as the result of a sub-query:
DELETE FROM S
WHERE S# IN
    (SELECT S#
     FROM SP
     WHERE QTY > 200);
Note: this deletes from S every supplier with at least one shipment of QTY > 200 (S1 and S2); S3, whose only shipment has QTY = 200, is not deleted.
Delete nothing:
DELETE FROM S
WHERE S# = 'S4';
UPDATE
Modify the attribute values of some tuple:
UPDATE S
SET SNAME = 'Murphy',
    STATUS = 15
WHERE S# = 'S1';
Thus:
(S1, Smith, 20, Paris) becomes (S1, Murphy, 15, Paris)
UPDATE
Modify the attribute values of some tuples:
UPDATE S
SET SNAME = 'Murphy',
    STATUS = 15
WHERE CITY = 'Paris';
Thus:
(S1, Smith, 20, Paris) becomes (S1, Murphy, 15, Paris)
(S2, Jones, 10, Paris) becomes (S2, Murphy, 15, Paris)
UPDATE
Modify nothing !
UPDATE S
SET SNAME = 'Murphy',
    STATUS = 15
WHERE CITY = 'Dublin';
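How many tuples each statement touches can be observed from a host program. The following is a sketch using Python's sqlite3 module, where the cursor's rowcount attribute reports the number of rows changed; the sample rows are assumptions reconstructed from these notes, and S9 is a hypothetical supplier number that matches nothing.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Assumed sample data, reconstructed from the answers shown in the notes.
con.executescript("""
    CREATE TABLE S  ("S#" TEXT PRIMARY KEY, SNAME TEXT, STATUS INTEGER, CITY TEXT);
    CREATE TABLE SP ("S#" TEXT, "P#" TEXT, QTY INTEGER);
    INSERT INTO S VALUES ('S1','Smith',20,'Paris'), ('S2','Jones',10,'Paris'),
                         ('S3','Blake',30,'Rome');
    INSERT INTO SP VALUES ('S1','P1',300), ('S1','P2',200), ('S1','P3',400),
                          ('S2','P1',300), ('S2','P2',400), ('S3','P2',200);
""")

# INSERT naming only some columns: the others are set to NULL.
ins = con.execute("""INSERT INTO S ("S#", SNAME) VALUES ('S4', 'Ryan')""").rowcount

# UPDATE of a set of tuples, then an UPDATE whose condition matches nothing.
upd_paris = con.execute(
    "UPDATE S SET SNAME = 'Murphy', STATUS = 15 WHERE CITY = 'Paris'").rowcount
upd_dublin = con.execute(
    "UPDATE S SET SNAME = 'Murphy', STATUS = 15 WHERE CITY = 'Dublin'").rowcount

# DELETE driven by a sub-query, then a DELETE that matches nothing.
del_sub = con.execute(
    """DELETE FROM S WHERE "S#" IN (SELECT "S#" FROM SP WHERE QTY > 200)""").rowcount
del_none = con.execute("""DELETE FROM S WHERE "S#" = 'S9'""").rowcount

remaining = con.execute("""SELECT * FROM S ORDER BY "S#" """).fetchall()
print(ins, upd_paris, upd_dublin, del_sub, del_none)
print(remaining)
```

Statements that modify nothing are not errors: they succeed and simply report zero affected rows.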
6-6. Non-Standard SQL
Almost all relational DBMS with SQL interfaces provide "extensions" to the standard and don't (yet) implement the full standard SQL2.
As an example, ORACLE 7 has the following extra features:
Each table has pseudo-columns which can be queried but whose values cannot be changed and they include:
ROWID (uniquely identify a row)
ROWNUM (the position of a single row among others selected by a query)
Data types include DATE, LONG (char string up to 2 Gbytes), LONG RAW (binary string up to 2 Gbytes) and RAW (binary up to 2K)
CREATE CLUSTER creates a clustering of database tuples on disk
ALTER CLUSTER to refine storage allocations for a cluster by increasing its disk space, filenames etc.
ALTER TABLESPACE ... by adding or renaming a database file or refining storage limits
ANALYZE ... validates the structure of an index, table or cluster or collects performance statistics for them (percentage distributions etc.)
CREATE CLUSTER ... creates new cluster and specifies the columns which are to be its key, assign disk space, etc.
CREATE CLUSTERED INDEX
CREATE PROFILE ... for a user ... limit resources in terms of CPU usage, number of transactions, connect time, idle time, ...
CREATE SEQUENCE ... creates a new sequence suitable for generation of primary keys ... start with, increment by, max val, min val, ordering ... is this domain definition ?
CREATE TABLE with clusters
As part of table or column integrity constraints, can specify a tablename into which are put rows violating the constraint, and for each store:
(rowid, owner, tabname, constraint)
CREATE TRIGGER
EXPLAIN PLAN to describe each step of the execution plan for an SQL statement and place this description in a PLAN table whose attributes include statement ID, timestamp, operations, etc.
And many others ...
Chapter 8. The System Catalog
This chapter covers the system catalog.
8-1. The System Catalog
8-2. The Informix Catalog
8-3. The ORACLE7 Catalog
Sources: Elmasri & Navathe Chapter 15 has an ER design and a mock-up of a catalog for a relational and for a network DBMS catalog.
E&N also wrestles with the issue of whether a system catalog is a data dictionary … angels on the head of a pin !
8-1. The System Catalog
The system catalog is a part of a (relational) DBMS containing:
Table names
Attribute names and data types
Index names and existence
Table and user level authorisations
View definitions and dependencies
Primary and secondary and foreign keys
Synonym names
All kind of constraints, database and table levels
Users, authorisation, names, passwords
Anything about the database or describing the format of the database i.e. "meta data".
In R.DBMS implementations, all this information is itself stored as a set of database tables which users can see and can query (if you know the format of the catalog).
The catalog tables contain entries for all user tables and also contain entries for themselves !
Different R.DBMS implementations have different implementation approaches for the catalog although SQL3 is attempting some standardisation of catalog formats (implemented ironically via views rather than catalog re-design).
As the catalog is a set of tables the user can see, these tables can be queried directly by end users (SQL SELECT), but INSERT, DELETE and UPDATE commands are not allowed as they could corrupt the database ... but the most frequent accesses are by the DBMS modules themselves ... the query optimiser needs to know the names etc. of tables and attributes, and also needs to know the sizes of tables and the range of values (specificity) of columns as it decides how to execute a query.
Thus the catalog needs to be designed in the most efficient way possible for accesses (by the DBMS modules) and for updates ... effectively creating a new user table in SQL causes a tuple entry into the system table describing tables and some other tuple modifications / entries also.
As SQL commands are executed, the catalog tables are updated automatically by the DBMS.
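The same behaviour can be seen in miniature in SQLite, which is used here only as a convenient stand-in and is not one of the systems discussed in these notes. Its catalog is a single table called sqlite_master; the table and index names in the sketch are made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# A fresh database has an empty catalog.
before = con.execute("SELECT COUNT(*) FROM sqlite_master").fetchone()[0]

# DDL statements cause the DBMS to add catalog entries automatically.
con.execute("CREATE TABLE S (sno TEXT, sname TEXT)")
con.execute("CREATE UNIQUE INDEX sname_idx ON S (sname)")

# The catalog is queried with an ordinary SELECT.
entries = con.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name").fetchall()

# Column descriptions are exposed through a SQLite-specific pragma.
columns = [(row[1], row[2]) for row in con.execute("PRAGMA table_info(S)")]

# Direct modification of the catalog by a user is refused.
try:
    con.execute("INSERT INTO sqlite_master (type, name) VALUES ('table', 'fake')")
    catalog_writable = True
except sqlite3.Error:
    catalog_writable = False

print(before, entries, columns, catalog_writable)
```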
To illustrate the system catalog we use a worked example ... INFORMIX from a few versions ago when it (the DBMS) was simple ... we will work through all tables to illustrate the simplicity and beauty of it ... and to contrast w.r.t. complexity, we will look at ORACLE7 system catalog.
The Elmasri & Navathe book has a phantom catalog given as an ER diagram and as a table.
8-2. The Informix Catalog
INFORMIX had nine tables in its system catalog as follows:
TABLE NAME    DESCRIPTION
systables     A description of all database tables - One tuple per database table
syscolumns    A description of all columns in all tables - One tuple per column per table
sysindexes    Description of all indexes on all tables - One tuple per index
systabauth    Table level privileges for users
syscolauth    Column level privileges for users
sysdepend     How views depend on underlying base tables
syssynonym    List of synonym names for tables, if any created
sysusers      Database level privileges for users
sysviews      Definition of all views
Let's look at some tables in more detail ...
SYSUSERS
username char(8) user login id
usertype char(1) indicates DBA / resource / connect privileges
password char(8) encrypted password
Create unique index users on sysusers(username);
... Guarantees unique usernames
SYSTABLES
tabname char(18) table name
owner char(8) username of table creator
dirpath char(64) directory path for datafile
tabid integer internal number/code for table .. for efficiency
rowsize smallint number of bytes wide
ncols smallint number of columns
nindexes smallint number of indexes
nrows integer number of rows
created date date of creation
version integer table version number
tabtype char(1) table or view
audpath char(64) full pathname for audit file
Create unique index tabname on systables(tabname, owner);
Create unique index tabid on systables(tabid);
... Guarantees unique table names per owner, and unique codes for tables
Notice how columns are made as "narrow" as possible to reduce page I/O.
And so on ...
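The point that a catalog is itself a set of queryable tables can be tried with any SQL engine. The sketch below uses SQLite through Python's standard sqlite3 module as a stand-in, not INFORMIX; SQLite keeps its catalog in one table, sqlite_master, which roughly plays the role of systables and sysindexes together. The table parts and the index parts_pno are invented for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in engine (an assumption; the notes describe INFORMIX).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (pno TEXT, pname TEXT)")    # hypothetical table
conn.execute("CREATE UNIQUE INDEX parts_pno ON parts(pno)")  # hypothetical index

# The catalog is read with an ordinary SELECT, like any other table.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY type, name"
).fetchall()
for row in catalog:
    print(row)
```

Each CREATE statement adds one row to the catalog, which is the same bookkeeping that the "one tuple per table" and "one tuple per index" descriptions refer to.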
8-3. The ORACLE7 Catalog
The ORACLE7 catalog is a monster composed of some base tables and a multitude of "data dictionary views". Here it is ... all 170 tables worth !
ALL_CATALOG ALL_COL_COMMENTS ALL_COL_PRIVS ALL_COL_PRIVS_MADE ALL_COL_PRIVS_RECD ALL_CONSTRAINTS ALL_CONS_COLUMNS ALL_DB_LINKS ALL_DEF_AUDIT_OPTS ALL_DEPENDENCIES ALL_ERRORS ALL_INDEXES ALL_IND_COLUMNS ALL_LABELS ALL_MOUNTED_DBS ALL_OBJECTS ALL_SEQUENCES ALL_SNAPSHOTS ALL_SOURCE ALL_SYNONYMS ALL_TABLES ALL_TAB_COLUMNS ALL_TAB_COMMENTS ALL_TAB_PRIVS ALL_TAB_PRIVS_MADE ALL_TAB_PRIVS_RECD ALL_TRIGGERS ALL_USERS ALL_VIEWS AUDIT_ACTIONS CAT CLU CODE_PIECES CODE_SIZE COLS COLUMN_PRIVILEGES DBA_2PC_NEIGHBORS DBA_2PC_PENDING DBA_AUDIT_EXISTS DBA_AUDIT_OBJECT DBA_AUDIT_SESSION DBA_AUDIT_STATEMENT DBA_AUDIT_TRAIL DBA_BLOCKERS DBA_CATALOG DBA_CLUSTERS DBA_CLU_COMMENTS DBA_COL_COMMENTS DBA_COL_PRIVS DBA_CONSTRAINTS DBA_CONS_COLUMNS DBA_DATA_FILES DBA_DB_LINKS DBA_DDL_LOCKS DBA_DEPENDENCIES DBA_DML_LOCKS DBA_ERRORS DBA_EXP_FILES DBA_EXP_OBJECTS DBA_EXP_VERSION
DBA_EXTENTS DBA_FREE_SPACE DBA_INDEXES DBA_IND_COLUMNS DBA_LOCKS DBA_OBJECTS DBA_OBJECT_SIZE DBA_OBJ_AUDIT_OPTS DBA_PRIV_AUDIT_OPTS DBA_PROFILES DBA_ROLE_PRIVS DBA_ROLES DBA_ROLLBACK_SEGS DBA_SEGMENTS DBA_SEQUENCES DBA_SNAPSHOTS DBA_SNAPSHOT_LOGS DBA_SOURCE DBA_STMT_AUDIT_OPTS DBA_SYNONYMS DBA_SYS_PRIVS DBA_TABLES DBA_TABLESPACES DBA_TAB_COLUMNS DBA_TAB_COMMENTS DBA_TAB_PRIVS DBA_TRIGGERS DBA_TS_QUOTAS DBA_USERS DBA_VIEWS DBA_WAITERS DBMS_ALERT_INFO DBMS_LOCK_ALLOCATED DEPTREE DICT DICTIONARY DICT_COLUMNS GLOBAL_NAME IDEPTREE IND INDEX_HISTOGRAM INDEX_STATS LOADER_COL_INFO LOADER_CONSTRAINT_INFO LOADER_INDCOL_INFO LOADER_PARAM_INFO LOADER_TAB_INFO LOADER_TRIGGER_INFO LOADER_IND_INFO OBJ PARSED_PIECES PARSED_SIZE PUBLIC_DEPENDENCY RESOURCE_COST ROLE_ROLE_PRIVS ROLE_SYS_PRIVS ROLE_TAB_PRIVS SEQ SESSION_PRIVS SESSION_ROLES
SOURCE_SIZE STMT_AUDIT_OPTION_MAP SYN SYSTEM_PRIVILEGE_MAP TABLE_PRIVILEGES TABLE_PRIVILEGE_MAP TABS USER_AUDIT_OBJECT USER_AUDIT_SESSION USER_AUDIT_STATEMENT USER_AUDIT_TRAIL USER_CATALOG USER_CLUSTERS USER_CLU_COLUMNS USER_COL_COMMENTS USER_COL_PRIVS USER_COL_PRIVS_MADE USER_COL_PRIVS_RECD USER_CONSTRAINTS USER_CONS_COLUMNS USER_DB_LINKS USER_DEPENDENCIES USER_ERRORS USER_EXTENTS USER_FREE_SPACE USER_INDEXES USER_IND_COLUMNS USER_OBJECTS USER_OBJECT_SIZE USER_OBJ_AUDIT_OPTS USER_RESOURCE_LIMITS USER_ROLE_PRIVS USER_SEGMENTS USER_SEQUENCES USER_SNAPSHOTS USER_SNAPSHOT_LOGS USER_SOURCE USER_SYNONYMS USER_SYS_PRIVS USER_TABLES USER_TABLESPACES USER_TAB_COLUMNS USER_TAB_COMMENTS USER_TAB_PRIVS USER_TAB_PRIVS_MADE USER_TAB_PRIVS_RECD USER_TRIGGERS USER_TS_QUOTAS USER_USERS USER_VIEWS
Chapter 9. Views
This chapter covers views.
9-1. View Definition
9-2. View Examples
Sources: Elmasri & Navathe Chapter 7 (part of SQL) pp 215 - 219
9-1. View Definition
A view is a named, derived table, like a "window".
Base tables are actually stored physically and exist as data on disk but views are virtual data ... they do not exist separately. Their information content is dynamically derived.
A DBMS schema is made up of base tables & views and SQL DML commands are executed on base tables & views.
Data appearing in a view does not exist separately but appears to.
The definition of views is in terms of base tables or in terms of other views and the view definition is stored in the system catalog (check out the system catalog entries for INFORMIX and for ORACLE that we saw earlier).
Views provide data "windows".
A single view may show aggregated (derived) data or actual data as a virtual table.
Views permit controlled access to sensitive data: users are allowed to see only aggregates or summaries (defined as views), and security privileges are then applied to those views.
Any SQL SELECT can be executed on a view.
UPDATE, DELETE and INSERT commands can also be executed on views, though these operations are subject to restrictions.
Views are dynamic windows, not snapshots: as the underlying data changes, so do the views, which are therefore always up to date.
Syntax is:
CREATE VIEW viewname [(colname [, colname]*)]
AS subquery
[WITH CHECK OPTION];
Note that if the colname list is not present, the view inherits the attribute names of the subquery.
WITH CHECK OPTION (WCO) is relevant only if the view is updatable: with it, updates that violate the view definition condition are rejected … interestingly, they are allowed if WCO is not included.
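The "dynamic window, not snapshot" behaviour is easy to observe. The following is a minimal sketch using SQLite through Python's sqlite3 module as a stand-in engine; the names s, sno, sname, scourse, age and bbs_students are assumptions chosen to be legal SQLite identifiers, and WITH CHECK OPTION is left out because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE s (sno INTEGER PRIMARY KEY, sname TEXT, scourse TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO s VALUES (?, ?, ?, ?)",
    [(1234, "Givins", "BBS", 20), (2345, "Irwin", "MCA", 22), (3456, "Babb", "BBS", 20)],
)
# Only the view definition is stored; no rows are copied.
conn.execute(
    "CREATE VIEW bbs_students AS "
    "SELECT sno, sname, scourse FROM s WHERE scourse = 'BBS'"
)

before = conn.execute("SELECT sno FROM bbs_students ORDER BY sno").fetchall()
# Change the base table only; the view is not touched.
conn.execute("INSERT INTO s VALUES (4567, 'Kenna', 'BBS', 21)")
after = conn.execute("SELECT sno FROM bbs_students ORDER BY sno").fetchall()
print(before)
print(after)
```

The second query sees the new row without the view being redefined or refreshed, because the view's content is derived from the base table each time it is queried.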
You cannot create an index on a view, and you cannot use UNION or ORDER BY in the subquery (ORDER BY would not make sense; the ban on UNION is a quirk).
DROP VIEW viewname ;
The above statement drops a view and any other views defined on this view ... cascades.
To execute a query on a view, an RDBMS will use one of two strategies:
Query modification: try to combine the view definition and the user’s query into one query, if possible, for overall query optimisation, but this is expensive if the view definition query is complex.
View materialisation: create a temporary table reflecting the view content, which is then immediately usable by other queries on that view.
SELECTs on views are straightforward.
INSERTs put NULLs in base table columns not in the view definition and this is not allowed unless the base table columns allow NULLs.
Column subsets are theoretically updatable iff they contain the primary key.
Cannot update a database through a view if the view definition involves JOINs, GROUP BY, DISTINCT or aggregate functions.
Cannot alter a view ... drop it and create another.
If a column is dropped from a base table which is involved in a view definition then the view is invalidated ... older R.DBMS discovered this only on subsequent access (i.e. no integrity checking).
In summary, views are important:
in formulating difficult queries though this role is underestimated
in allowing partial queries to be re-used
in providing security by hiding data
Note: Many of the entries in the INFORMIX and in the ORACLE system catalogs are actually views on base tables within the system catalog
9-2. View Examples
CREATE VIEW BUSINESS-STUDIES-STUDENTS
AS SELECT S#, SNAME, SCOURSE
FROM S
WHERE SCOURSE = 'BBS'
WITH CHECK OPTION;
S                                   BUSINESS-STUDIES-STUDENTS
S#    SNAME      SCOURSE  AGE       S#    SNAME   SCOURSE
1234  Givins     BBS      20        1234  Givins  BBS
2345  Irwin      MCA      22        3456  Babb    BBS
3456  Babb       BBS      20        4567  Kenna   BBS
4567  Kenna      BBS      21
5678  Cascarino  CA       20
6789  Keane      CS       22
INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1111, 'McGrath', 'BBS');
After this insert, the base table gains a row with a NULL age and the new row is visible through the view:
S                                   BUSINESS-STUDIES-STUDENTS
S#    SNAME      SCOURSE  AGE       S#    SNAME    SCOURSE
1234  Givins     BBS      20        1234  Givins   BBS
2345  Irwin      MCA      22        3456  Babb     BBS
3456  Babb       BBS      20        4567  Kenna    BBS
4567  Kenna      BBS      21        1111  McGrath  BBS
5678  Cascarino  CA       20
6789  Keane      CS       22
1111  McGrath    BBS      NULL
Starting again from the original contents of S:
INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1234, 'McKenna', 'BBS');
... violates primary key in the view.
INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (2345, 'McGrath', 'BBS');
... violates primary key of tuple not in view.
INSERT INTO BUSINESS-STUDIES-STUDENTS
VALUES (1111, 'McGrath', 'CA');
... violates the view definition constraint
... what happens if "WITH CHECK OPTION" is left out ?
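The four inserts above, and the closing question, can be traced with a small model. This is a plain-Python simulation of the rules as stated in these notes, not the behaviour of any particular DBMS; the function insert_through_view and its status strings are hypothetical.

```python
# Base table S keyed by S#; each value is (SNAME, SCOURSE, AGE).
S = {
    1234: ("Givins", "BBS", 20),
    2345: ("Irwin", "MCA", 22),
    3456: ("Babb", "BBS", 20),
    4567: ("Kenna", "BBS", 21),
    5678: ("Cascarino", "CA", 20),
    6789: ("Keane", "CS", 22),
}


def view(base):
    """Rows visible through BUSINESS-STUDIES-STUDENTS (S#, SNAME, SCOURSE)."""
    return {k: (n, c) for k, (n, c, _age) in base.items() if c == "BBS"}


def insert_through_view(base, sno, sname, scourse, with_check_option=True):
    """Hypothetical model of INSERT on the view; returns a status string."""
    if sno in base:  # the key may clash with a row that the view does not show
        return "rejected: primary key violation"
    if with_check_option and scourse != "BBS":
        return "rejected: violates view definition"
    base[sno] = (sname, scourse, None)  # AGE is not in the view, so it becomes NULL
    return "inserted"


# Each insert is tried against a fresh copy of the original table.
results = [
    insert_through_view(dict(S), 1111, "McGrath", "BBS"),
    insert_through_view(dict(S), 1234, "McKenna", "BBS"),
    insert_through_view(dict(S), 2345, "McGrath", "BBS"),
    insert_through_view(dict(S), 1111, "McGrath", "CA"),
]
print(results)

# Without the check option the last row is accepted but cannot be seen through the view.
no_wco = dict(S)
status = insert_through_view(no_wco, 1111, "McGrath", "CA", with_check_option=False)
print(status, 1111 in no_wco, 1111 in view(no_wco))
```

Under this model, leaving out WITH CHECK OPTION lets a user insert a row through the view that immediately disappears from the view, which is the situation the option exists to prevent.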
CREATE VIEW SUPPLIER-PART-SMALL-SHIPMENTS
AS SELECT SNAME, PNAME
FROM S, SP, P
WHERE S.S# = SP.S#
AND SP.P# = P.P#
AND SP.QTY > 100
WITH CHECK OPTION;
Note : Error in dialogue. Should say greater than, not less than.
SNAME PNAME
Smith Nut
Smith Bolt
Smith Screw
Jones Nut
Jones Bolt
Blake Bolt
Does the "WITH CHECK OPTION" make sense ?
CREATE VIEW PARTS-AMOUNTS
AS SELECT SP.P#, P.PNAME, SP.QTY
FROM P, SP
WHERE P.P# = SP.P#;
Note : Error in dialogue. Should say P.P# = SP.P# as above.
P# PNAME QTY
P1 Nut 300
P2 Bolt 200
P3 Screw 400
P1 Nut 300
P2 Bolt 400
P2 Bolt 200
There are duplicates as far as the query in the view definition is concerned, so they are eliminated (greyed out in diagram).
Syntactically, this query is correct but looking at the supplier-parts table it appears it should contain the total amount of parts shipped by a supplier, not just one shipment, but the example on-screen does not do this.
CREATE VIEW SUMMARY (S#, TOTQRY)
AS SELECT S#, SUM (QTY)
FROM SP
GROUP BY S#;
SP                   SUMMARY
S#   P#   QTY        S#   TOTQRY
S1   P1   300        S1   900
S1   P2   200        S2   700
S1   P3   400        S3   200
S2   P1   300
S2   P2   400
S3   P2   200
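The SUMMARY view can be reproduced to check the totals. This is a sketch using SQLite via Python's sqlite3 module; the column names sno, pno, qty and totqty are assumptions (S# and P# are not legal unquoted identifiers in SQLite), and the view's column name is given with AS rather than a column list, which has the same effect here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sp (sno TEXT, pno TEXT, qty INTEGER, PRIMARY KEY (sno, pno))")
conn.executemany(
    "INSERT INTO sp VALUES (?, ?, ?)",
    [
        ("S1", "P1", 300), ("S1", "P2", 200), ("S1", "P3", 400),
        ("S2", "P1", 300), ("S2", "P2", 400), ("S3", "P2", 200),
    ],
)
# Aggregate view: one row per supplier with the total quantity shipped.
conn.execute(
    "CREATE VIEW summary AS SELECT sno, SUM(qty) AS totqty FROM sp GROUP BY sno"
)
totals = conn.execute("SELECT sno, totqty FROM summary ORDER BY sno").fetchall()
print(totals)
```

A row such as (S1, 900) does not correspond to any single row of SP, which is why a view defined with GROUP BY or an aggregate function cannot be used to update the database.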
Chapter 10. Database Design & Normalisation
This chapter covers database design and normalisation.
10-1. Introduction to Database Design
10-2. 3NF, 2NF and 1NF
10-3. BCNF
10-3-1. BCNF Example 1
10-3-2. BCNF Example 2
10-3-3. BCNF Example 3
10-3-4. BCNF Example 4
10-4. 4NF
10-5. 5NF
10-6. Database Design
Sources: Elmasri & Navathe pp 391 - 445
10-1. Introduction to Database Design
An important part of building a database is deciding on a suitable logical structure, or schema, to implement ... this task is called database design.
Considering the supplier-parts example (S, P, SP), there is a feeling of correctness about its structure.
Normalisation theory is a formalism of simple ideas with a practical application in logical database schema design.
Normalisation theory should allow us to recognise relations with undesirable properties, tell us what is "wrong" & how to "correct" it.
Normalisation theory is built around normal forms - each normal form has a set of satisfiable criteria.
Normal forms exist in a hierarchy:
1NF -> 2NF -> 3NF -> BCNF -> 4NF -> PJ/NF (5NF)
Codd defined 1NF, 2NF, 3NF in 1972. Note: Monologue says 1992, but 1972 is correct.
3NF had inadequacies so it was revised in 1974 by Boyce/Codd (BCNF).
Fagin defined 4NF in 1977 and 5NF in 1979.
6NF, 7NF? ... dependency theory suggests there may be higher NFs, but they are not practicable in a database environment.
DB designers should aim for higher NFs but this is not law - just recommended, as normalisation simply provides guidelines for database design.
There are often good reasons for not using normalisation theory.
In order to describe the various normal forms we must first introduce some definitions:
Functional Dependency
Given relation R, attribute Y of R is functionally dependent on attribute X of R, written R.X -> R.Y (R.X functionally determines R.Y), iff each R.X value has associated with it precisely one R.Y value, where X and/or Y may be composite.
S.SNAME, S.STATUS and S.CITY are each functionally dependent on S.S#
If R.X is a candidate key or if R.X is the primary key, then all R.Y must be functionally dependent on R.X
In SP we have a composite primary key so
SP.(S#,P#) -> SP.QTY
There is no requirement in the definition of functional dependence that R.X be a candidate key, thus:
R.X -> R.Y iff whenever two tuples of R have the same X value, they also have the same Y value.
R.Y is fully functionally dependent on R.X iff it is functionally dependent on R.X and not functionally dependent on any proper subset of R.X
S.(S#,STATUS) -> S.CITY is true but not full functional dependence as S.S# -> S.CITY
If R.X -> R.Y but not fully then R.X must be composite
A functional dependency diagram is used to represent graphically, full functional dependencies … for example:
Functional dependence is a semantic notion to do with understanding what the data means rather than because of the properties of a particular data set at a given time.
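The definitions above can be turned into a check on a given set of tuples, with one caveat that follows from the last point: a data set can only refute a functional dependency, never prove it. The sketch below is illustrative; the function names holds and fully_dependent are assumptions, and the three supplier rows are sample data.

```python
def holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs (the FD is not refuted)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and still fails after dropping any one attribute of lhs."""
    if not holds(rows, lhs, rhs):
        return False
    if len(lhs) == 1:
        return True
    # If any proper subset determined rhs, so would some subset with one attribute dropped.
    return not any(holds(rows, [a for a in lhs if a != drop], rhs) for drop in lhs)


rows = [
    {"S#": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "Paris"},
    {"S#": "S2", "SNAME": "Jones", "STATUS": 10, "CITY": "Paris"},
    {"S#": "S3", "SNAME": "Blake", "STATUS": 30, "CITY": "Rome"},
]
print(holds(rows, ["S#"], ["CITY"]))
print(holds(rows, ["S#", "STATUS"], ["CITY"]))
print(fully_dependent(rows, ["S#", "STATUS"], ["CITY"]))
print(holds(rows, ["CITY"], ["STATUS"]))
```

Note that holds(rows, ["STATUS"], ["CITY"]) is also true for these three rows, purely by accident of the sample; whether STATUS really determines CITY is a question about what the data means, not about this particular data set.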
10-2. 3NF, 2NF and 1NF
Definition 1: 3NF
A relation R is in 3NF iff the nonkey attributes of R are mutually independent and fully dependent on the primary key of R
Nonkey in this sense means not part of the primary key and mutually independent means none of the attributes are functionally dependent on any others.
P(P#,PNAME,COLOUR,WEIGHT) is 3NF because we can change nonkey attributes independently and all are functionally dependent on P#.
Definition 2: 3NF
A relation R is in 3NF iff each tuple consists of a primary key to identify a real entity plus 0 or more mutually independent attribute values to describe that entity.
R is in 1NF iff all underlying domains are atomic.
In order to show 2NF let us unite S and SP to get:
FIRST(S#,STATUS,CITY,P#,QTY)
We also introduce a new constraint such that STATUS is functionally dependent on CITY, eg
London suppliers have status 10, always
Paris suppliers have status 20, always
Munich suppliers have status 20, always
etc ...
Primary key is (S#,P#) and the functional dependency diagram is ...
Definition 3: 3NF
A relation in 3NF has arrows out of the primary key only.
In FIRST, additional arrows cause trouble as the nonkey attributes are not mutually independent and not all attributes are dependent on the primary key.
What are the difficulties with this relation anyway ?
The problem with FIRST is that it stores redundant information which can lead to update anomalies as follows:
INSERT: Cannot insert the fact that a supplier exists until that supplier actually makes a shipment
DELETE: Deleting the last (S#,P#) tuple for a supplier, say S3, also loses the information about which CITY S3 is located in
UPDATE: CITY values occur for each shipment thus an update of CITY is unnecessarily expensive.
One solution is to replace FIRST by:
SECOND(S#,STATUS,CITY) and SP(S#,P#,QTY)
This yields the following FD diagram:
This is appealing as follows:
INSERT: can enter the fact that S5 is in ATHENS without S5 actually having to make a shipment
DELETE: can delete shipment tuples and not lose location information
UPDATE: information appears once only thus updating is more efficient
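The replacement of FIRST by SECOND and SP can be checked on sample data. The rows below are illustrative only (they follow the "London is 10, Paris is 20" rule but are not taken from the notes), and the status 30 for Athens is an assumed value; relations are modelled as Python sets of tuples.

```python
# FIRST(S#, STATUS, CITY, P#, QTY): illustrative rows, not taken from the notes.
FIRST = {
    ("S1", 10, "London", "P1", 300),
    ("S1", 10, "London", "P2", 200),
    ("S1", 10, "London", "P3", 400),
    ("S2", 20, "Paris", "P1", 300),
    ("S2", 20, "Paris", "P2", 400),
    ("S3", 20, "Paris", "P2", 200),
}

# Projections: using sets removes the repeated (S#, STATUS, CITY) combinations.
SECOND = {(s, status, city) for (s, status, city, _p, _q) in FIRST}
SP = {(s, p, q) for (s, _status, _city, p, q) in FIRST}

# The natural join on S# rebuilds FIRST exactly, so no information was lost.
rejoined = {
    (s, status, city, p, q)
    for (s, status, city) in SECOND
    for (s2, p, q) in SP
    if s == s2
}
print(rejoined == FIRST)
print(len(FIRST), len(SECOND), len(SP))

# INSERT anomaly removed: S5 in Athens can be recorded before any shipment exists.
SECOND.add(("S5", 30, "Athens"))
print(len(SECOND))
```

Each supplier's city and status are now stored once in SECOND instead of once per shipment, which is where the cheaper UPDATE and the safe DELETE come from.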
10-3. BCNF
3NF has the following inadequacies in that it cannot handle cases of relations with:
multiple candidate keys where
candidate keys are composite
candidate keys overlap
The above combination of circumstances does not occur very often in practice, but such cases are not contrived and they do exist.
BCNF was defined to address the above and the definition of BCNF is stronger than that of 3NF.
A functional determinant is an attribute on which some other attribute is fully functionally dependent.
Definition of BCNF
A relation R is in BCNF iff every determinant is a candidate key ... not just primary keys!
This is a simpler definition than 3NF with no references to 1NF or 2NF or transitive dependencies.
Now for some confusion ...
Textbooks sometimes differ in their definitions of 3NF and whether a relation in 3NF is also in BCNF !
The precise and exact definitions do not assume that each R has exactly one CK, the PK, as is done in most textbooks, and those definitions are as follows:
2NF == 1NF and each non-prime attribute is FFD on each CK
3NF == 2NF and none of the non-prime attributes are transitively dependent on any CKs
Here non-prime means not part of any candidate key.
However many textbooks simplify the definitions by assuming that each R has one CK which is the PK !
Any given relation can be non-loss decomposed into an equivalent collection of BCNF relations.
FIRST, SECOND are not in BCNF
SP, SC, CS are in BCNF
Let's illustrate BCNF with a set of examples, some in BCNF, some not.
10-3-1. BCNF Example 1
S(S#, SNAME, STATUS, CITY)
S# and SNAME are both candidate keys, i.e. numbers and names of suppliers are both unique.
STATUS and CITY are mutually independent, with the FD diagram ...
S is in BCNF as the only determinants are candidate keys.
In S, candidate keys are atomic and thus non-overlapping.
10-3-2. BCNF Example 2
SSP(S#, SNAME, P#, QTY)
Candidate keys are (S#, P#) and (SNAME, P#), say 1st is primary key with FD diagram ...
SSP is not in BCNF as 2 determinants, S# and SNAME, are not candidate keys, so the table will contain redundancies and have certain update anomalies.
SSP is in 3NF because that definition does not require an attribute to be fully dependent on the primary key if it itself is a component of some alternate key.
Solution: break SSP into 2 projections either:
SS(S#,Sname) and SP(S#,P#,QTY)
or
SS(S#,Sname) and SP(Sname,P#,QTY)
All of these are in BCNF.
10-3-3. BCNF Example 3
SJT(S, J, T)
Here student S is taught subject J by teacher T with the following constraints:
1. For each subject each student is taught by only 1 teacher
2. Each teacher teaches only 1 subject
3. Each subject is taught by several teachers
This is a bit like secondary school, with the following FD diagram ...
Here we have two overlapping candidate keys (S, J) and (S, T) and SJT is in 3NF but it is not in BCNF so we could get update anomalies caused by T being a determinant but not a CK (Candidate Key).
Solution: replace SJT by 2 projections:
ST(S, T) and TJ(T, J)
10-3-4. BCNF Example 4
EXAM(S, J, P)
Here student S was examined in subject J and achieved rank position P in the class with the constraint that there are no ties for positions.
This yields the following FD diagram ....
Here we have composite and overlapping candidate keys (S, J) and (J, P) but just because we have such a situation does not mean we need to normalise because EXAM is already in BCNF !
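The claims made for SJT and EXAM can be checked mechanically from their functional dependencies. The sketch below uses the standard attribute-closure computation; the function names and the encoding of each attribute as a single letter are assumptions made for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if closure(c, fds) == set(schema) and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(schema)
    ]


def fd(lhs, rhs):
    """Helper: fd("SJ", "T") stands for (S, J) -> T."""
    return (frozenset(lhs), frozenset(rhs))


sjt_fds = [fd("SJ", "T"), fd("T", "J")]    # Example 3
exam_fds = [fd("SJ", "P"), fd("JP", "S")]  # Example 4

print(candidate_keys("SJT", sjt_fds), bcnf_violations("SJT", sjt_fds))
print(candidate_keys("SJP", exam_fds), bcnf_violations("SJP", exam_fds))
```

For SJT the only violation reported is T -> J, the determinant that is not a candidate key; for EXAM the list of violations is empty even though its candidate keys are composite and overlapping.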
10-4. 4NF
Consider the following ...
course (C) taught by one of a set of teachers (T)
for each course there is a repeating set of recommended textbooks (X)
for each course there may be any numbers of teachers and any numbers of recommended texts
teachers and texts are independent
teachers can be associated with any number of courses
This corresponds closely with a large secondary school with DoE recommended textbooks, teachers doubling up and many teachers.
We could "flatten" this information into a 1NF relation called CTX.
CTX
Course        Teacher  Textbook
L.C. Math     Smith    H+M 4
L.C. Math     Smith    H+M 5
L.C. French   Kelly    Folens 1
L.C. English  Doyle    Hamlet
L.C. Math     Doyle    H+M 4
L.C. Math     Doyle    H+M 5
There are no FDs in the data, so there is no basis for decomposition using FDs, but there is still some redundancy in CTX.
If (C1, T1, X1) and (C1, T2, X2) are present, then the following tuples must also be present ... (C1, T1, X2) and (C1, T2, X1) !
This is redundancy and thus we can have update anomalies.
For example, if we add (Geography, Ryan, Holland) and (Geography, Scott, Gaines) then we must also add (Geography, Ryan, Gaines) and (Geography, Scott, Holland).
Examining the criteria for normal forms however we find CTX is (trivially) in BCNF as the 3 attributes make up the sole CK !
It would be desirable to decompose CTX into :
CT(Course, Teacher) and CX(Course, Text)
Both of these are in BCNF as both are "all key".
So CTX would be represented as :
CT                       CX
Course        Teacher    Course        Text
L.C. Math     Smith      L.C. Math     H+M 4
L.C. French   Kelly      L.C. Math     H+M 5
L.C. English  Doyle      L.C. French   Folens 1
L.C. Math     Doyle      L.C. English  Hamlet
This decomposition is based on Fagin's multi-valued dependencies (MVDs).
course ->-> teacher
course ->-> text
A course does not have a single corresponding teacher, it has a well-defined set of teachers and for a course c and text x the set of teachers depends on the value of c, independent of x.
Definition of 4NF
A relation R is in 4NF if it is in BCNF and all MVDs are FDs.
CTX is not in 4NF; CT, CX are in 4NF … 4NF is more desirable as it eliminates redundancies
For R with attributes A, B & C (which may be composite !), the MVD
R.A ->-> R.B
holds iff the set of R.B values matching a given (A value, C value) pair in R depends only on the A value and is independent of the C value.
R must have at least 3 attributes, and for R(A, B, C), R.A ->-> R.B holds iff R.A ->-> R.C also holds.
MVDs always go in pairs, written R.A ->-> R.B | R.C
If R.A ->-> R.B | R.C then R can be non-loss decomposed into R1(A, B) and R2(A, C)
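The last statement also gives a practical test for an MVD on sample data: project, join back, and compare. The following is a small sketch on the CTX data, with relations as Python sets of tuples; the function names are assumptions, and the Geography tuples are the ones used in the example earlier in this section.

```python
CTX = {
    ("L.C. Math", "Smith", "H+M 4"),
    ("L.C. Math", "Smith", "H+M 5"),
    ("L.C. French", "Kelly", "Folens 1"),
    ("L.C. English", "Doyle", "Hamlet"),
    ("L.C. Math", "Doyle", "H+M 4"),
    ("L.C. Math", "Doyle", "H+M 5"),
}


def join_of_projections(rel):
    """Project onto CT(Course, Teacher) and CX(Course, Text), then join on Course."""
    ct = {(c, t) for (c, t, _x) in rel}
    cx = {(c, x) for (c, _t, x) in rel}
    return {(c, t, x) for (c, t) in ct for (c2, x) in cx if c == c2}


def mvd_holds(rel):
    """Course ->-> Teacher | Text holds for rel exactly when the join rebuilds rel."""
    return join_of_projections(rel) == rel


print(mvd_holds(CTX))

# Adding only two of the four Geography tuples breaks the MVD.
partial = CTX | {("Geography", "Ryan", "Holland"), ("Geography", "Scott", "Gaines")}
missing = join_of_projections(partial) - partial
print(mvd_holds(partial), sorted(missing))
```

The two tuples reported as missing are exactly the ones that must be added by hand in CTX, and that never need to be stored at all once the data is held as CT and CX.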
10-5. 5NF
Some relations cannot be non-loss decomposed by projection into 2 relations but can be decomposed by projection into 3+ relations ... called n-decomposable for n > 2.
In reading about 5NF I have never found a non-contrived example to illustrate 5NF because 5NF is more theoretical than real ... anyway here is an example:
SPJ is a relation about suppliers, parts and projects.
1. Smith supplies wrenches
2. wrenches are used in Block2
3. Smith supplies Block2
If 1,2 & 3 hold then Smith supplies wrenches to Block2 also holds as true.
Normally this implication does not hold, but if it does we call it a JOIN dependency: SPJ satisfies a JD over (SP, PJ, JS) and should be decomposed into these three relations, all of which are in 5NF.
SPJ is not in 5NF because it has a join dependency but discovering such JDs is not easy ... this is because FDs and even MVDs have a straightforward real-world interpretation whereas a JD does not.
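A join dependency can at least be tested on sample data. In the sketch below the four SPJ tuples are invented for illustration (only Smith, wrench and Block2 echo the example above) and are chosen so that the constraint holds; relations are Python sets of tuples.

```python
# SPJ(supplier, part, project): hypothetical rows that satisfy the join dependency.
SPJ = {
    ("Smith", "wrench", "Block2"),
    ("Smith", "bolt", "Block1"),
    ("Jones", "wrench", "Block1"),
    ("Smith", "wrench", "Block1"),
}

# The three binary projections.
SP = {(s, p) for (s, p, _j) in SPJ}
PJ = {(p, j) for (_s, p, j) in SPJ}
JS = {(j, s) for (s, _p, j) in SPJ}

# Joining only SP and PJ invents a tuple that was never in SPJ.
two_way = {(s, p, j) for (s, p) in SP for (p2, j) in PJ if p == p2}
spurious = two_way - SPJ

# Joining the result with JS removes it again and rebuilds SPJ.
three_way = {(s, p, j) for (s, p, j) in two_way if (j, s) in JS}
print(sorted(spurious))
print(three_way == SPJ)
```

For this data, the SP and PJ projections alone are not enough to rebuild SPJ, but all three together are, which is the behaviour the n-decomposable description refers to for n = 3.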
If R is in 5NF then it is also in 4NF.
10-6. Database Design
Database design is all about designing a schema of tables which captures all information needs from the portion of the real world being modelled, in such a way that no unnecessary redundancies are stored which could lead to update anomalies.
Reasoning about the normal forms of tables in a schema helps us determine if update anomalies can occur in theory.
Database design is a give-and-take task, fluid, revised continuously as users’ needs change and the information being modelled changes.
A database design is never complete, it is always evolving.
The task of database design is separate from, but related to, the task of the DBA.
If the design of a database has commenced with the construction of an E-R diagram, then this can be used to determine a first version of the relational schema, but only a first version.
Turning E-R entities into relational tables is easy
Turning E-R 1:1 relationships into tables is also easy: store the PK of one entity as an FK attribute of the other ... which one to embed in the other affects the performance of queries and is the choice of the database designer
Turning E-R 1:N relationships into tables is done by placing the PK of the relation representing the parent entity as an FK in the relation representing the child entity ... unlike 1:1, it does matter which key is embedded as an FK in the other.
Turning E-R M:N relationships into tables is done by creating an additional table, an intersection relation, to represent the relationship itself ... i.e. decompose the M:N into two 1:N relationships. The PK of the new relation is the combination of the PKs of its "parents".
Representing recursive relationships, which can be 1:1, 1:N or N:M, is done by embedding the key of the relation in itself as an FK (1:1 and 1:N) or by creating an additional table (N:M) ... so the fact that the relationship is recursive is actually not important.
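The M:N rule is the one the supplier-parts schema already follows: SP is the intersection relation between S and P. The following is a sketch in SQLite through Python's sqlite3 module, with sno, pno and the other column names assumed as legal stand-ins for S# and P#; foreign key checking has to be switched on explicitly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE s (sno TEXT PRIMARY KEY, sname TEXT);
CREATE TABLE p (pno TEXT PRIMARY KEY, pname TEXT);
-- Intersection relation: its PK is the combination of its parents' PKs.
CREATE TABLE sp (
    sno TEXT REFERENCES s(sno),
    pno TEXT REFERENCES p(pno),
    qty INTEGER,
    PRIMARY KEY (sno, pno)
);
INSERT INTO s VALUES ('S1', 'Smith'), ('S2', 'Jones'), ('S3', 'Blake');
INSERT INTO p VALUES ('P1', 'Nut'), ('P2', 'Bolt'), ('P3', 'Screw');
INSERT INTO sp VALUES ('S1', 'P1', 300), ('S1', 'P2', 200), ('S1', 'P3', 400),
                      ('S2', 'P1', 300), ('S2', 'P2', 400), ('S3', 'P2', 200);
""")

# The M:N relationship is followed by joining through the intersection table.
pairs = conn.execute(
    "SELECT sname, pname FROM s JOIN sp ON s.sno = sp.sno JOIN p ON p.pno = sp.pno "
    "ORDER BY s.sno, p.pno"
).fetchall()
print(pairs)

# A shipment for an unknown supplier is rejected by the foreign key.
try:
    conn.execute("INSERT INTO sp VALUES ('S9', 'P1', 100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

Each row of sp is the "many" side of two 1:N relationships at once, one to s and one to p, which is what decomposing the M:N into two 1:N relationships means in practice.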
Having gone through the effort of an E-R modelling exercise and then the generation of a first approximation at a database schema, this first version may then be refined or de-normalised.
Given R in 1NF and FDs, MVDs and JDs, we systematically reduce R to a collection of smaller relations which are "more desirable", by taking projections in order to eliminate redundancy and the possibility of update anomalies.
But these are guidelines only and don't always have to be followed … often we want to de-normalise a database design.
Chapter 11. Database and the Web
This chapter covers applications of databases on the Internet.
11-1. Background Databases in the Web
11-2. JDBC Introduction
11-3. JDBC Tutorial
11-4. Databases and the Web - the Future
Sources: Elmasri & Navathe (1999) Chapter 27; Campione & Walrath, The Java Tutorial
11-1. Background Databases in the Web
A simple architecture:
Client/server architecture
Information is stored in publicly accessible files on machines called Web servers
Files are encoded in HTML
Files are identified by URLs
Data (files) is communicated using HTTP
A three-tiered architecture:
Client (browser) - middleware (CGI) - backend (database)
A Common Gateway Interface (CGI) acts as the middleware between a client and a database at the back end.
CGI software executes programs/scripts to obtain dynamic information (instead of static file content)
Typical CGI languages
scripts: Perl, Tcl
The main disadvantage of this approach is that for each user request the Web server starts a new process, which, in case of a database backend, then connects to the database. At the end of the request, the connection is closed and the process terminates.
programs: Java (JDBC)
JDBC (and Java servlets) should provide a more efficient platform, without the need for time-consuming additional processes and database connections.
Database content can be displayed using a Web browser.
The presentation can be formulated in HTML.
Here is the table s from the Supplier/Parts example:
SNO SNAME STATUS CITY
S1 Smith 20 Paris
S2 Jones 10 Paris
S3 Blake 30 Rome
The HTML code is here:
<table align=center border=2 cellpadding=2 bgcolor=white>
<tr bgcolor=grey>
<td>SNO</td>
<td>SNAME</td>
<td>STATUS</td>
<td>CITY</td>
</tr>
<tr>
<td>S1</td>
<td>Smith</td>
<td>20</td>
<td>Paris</td>
</tr>
<tr>
<td>S2</td>
<td>Jones</td>
<td>10</td>
<td>Paris</td>
</tr>
<tr>
<td>S3</td>
<td>Blake</td>
<td>30</td>
<td>Rome</td>
</tr>
</table>
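The table above is written by hand; in a Web application the same markup would typically be generated from query results. The following is a minimal sketch in Java, assuming the rows are already available as string arrays. The class HtmlTable and its methods are hypothetical and not part of the course code, and the table attributes are simplified.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of the course code): renders rows as an HTML table.
class HtmlTable {

    // Replace the characters that have a special meaning in HTML.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Build one <tr> element with a <td> cell per value.
    static String row(String[] cells) {
        StringBuilder sb = new StringBuilder("<tr>");
        for (String cell : cells) {
            sb.append("<td>").append(escape(cell)).append("</td>");
        }
        return sb.append("</tr>\n").toString();
    }

    // Build the whole table: a header row followed by the data rows.
    static String render(String[] header, List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<table border=\"2\">\n");
        sb.append(row(header));
        for (String[] r : rows) {
            sb.append(row(r));
        }
        return sb.append("</table>\n").toString();
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"S1", "Smith", "20", "Paris"},
            new String[] {"S2", "Jones", "10", "Paris"},
            new String[] {"S3", "Blake", "30", "Rome"});
        System.out.print(render(new String[] {"SNO", "SNAME", "STATUS", "CITY"}, rows));
    }
}
```

Escaping the cell values matters as soon as the data comes from a database rather than from the page author, since a value may contain characters such as < or &.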
11-2. JDBC Introduction
JDBC is Sun's solution to the inefficiency of CGI scripts connecting to databases.
JDBC provides facilities (the Java JDBC API) to
connect to the database (Java interface Connection)
send an SQL statement to the database (Java interface Statement)
process a result (Java interface ResultSet)
See the Java JDBC API definition for more details.
The Java code is DBMS-transparent, which means that the DBMS-specific code needed to establish and maintain the connection to the database is hidden from the application.
JDBC drivers, called through the methods of Connection and Statement, handle the connection management.
A JDBC driver for the particular database management system needs to be installed, or a JDBC-ODBC bridge needs to be loaded if the connection to the database is to be made via Microsoft's ODBC mechanism.
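The three facilities listed above (connect, send a statement, process the result) can be strung together in a few lines. The sketch below is one possible arrangement and not the course's own code: the class JdbcSteps and the method countRows are hypothetical, the URL is whatever the installed driver expects, and the try-with-resources form assumes Java 7 or later, which postdates these notes.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical sketch of the three JDBC steps; not part of the course code.
class JdbcSteps {

    // Runs a query and returns the number of rows, or -1 if anything fails
    // (for example, when no installed driver accepts the URL).
    static int countRows(String url, String user, String password, String query) {
        try (Connection con = DriverManager.getConnection(url, user, password); // 1. connect
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {                          // 2. send SQL
            int rows = 0;
            while (rs.next()) {                                                  // 3. process result
                rows++;
            }
            return rows;
        } catch (SQLException ex) {
            System.out.println("SQL Exception: " + ex.getMessage());
            return -1;
        }
    }
}
```

The resources declared in the try header are closed automatically in reverse order, so the result set, statement and connection are released even if the query fails.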
11-3. JDBC Tutorial
This tutorial shows how to connect from a server to a database.
The tutorial is based on Java.
Content:
Database Access
Using the dbWrapper class
Execute the program
Connecting to an Oracle DB
The first three sections describe how to connect to a database running under Windows NT.
The fourth section describes how to connect to an Oracle DB running under Unix.
Database Access
The interface to the database is realised in a class called dbWrapper. The dbWrapper class provides three methods:
Open: opens the connection to the database
Select: executes an SQL query (via executeQuery) and prints the result
Close: closes the connection to the database
There is also a private method called printResultSet which prints the result of e.g. select statements. This method is called by the Select method.
import java.sql.*;
The java.sql package provides means to execute queries or updates.
The fields (instance variables) of the class are
a statement object (which will allow us to pass SQL statements to the database),
an object representing the connection to the database,
a URL string (denoting the URL of the database server),
and strings for the username and password for the database.
class dbWrapper
{
    Statement stmt;
    Connection con;
    String strUrl;
    String strUserName;
    String strPassword;
The constructor assigns the user name and password. It also constructs the URL string. The URL starts with a protocol part, here jdbc:odbc:, which means that JDBC (Java DataBase Connectivity) interfaces with Microsoft's ODBC (Open DataBase Connectivity), which in turn connects to the database server. The server is identified by the DSN (Data Source Name) passed in as strDSN.
    public dbWrapper(String strDSN)
    {
        strUrl = "jdbc:odbc:" + strDSN;
        strUserName = "guest";
        strPassword = "guest";
    }
A driver is needed which bridges between JDBC and Microsoft's ODBC. This driver, sun.jdbc.odbc.JdbcOdbcDriver, is supplied by Sun and is loaded by name with Class.forName. Then the connection to the database, specified by URL, user name and password, is established. Finally, a statement object is created, which will allow us to pass SQL statements to the database.
    public void Open()
    {
        ...
        // Load the jdbc-odbc bridge driver
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
        // Attempt to connect to a driver.
        con = DriverManager.getConnection( strUrl, strUserName,
                                           strPassword );
        // Create a Statement object so we can submit
        // SQL statements to the driver
        stmt = con.createStatement();
        ...
    }
The Select method allows us to execute an SQL SELECT statement. strQuery is a string containing the query. strQuery is executed and a ResultSet is returned. This result set, a set of tuples (records), is then processed using the method printResultSet(). Then the result set is closed (i.e. discarded). Because the method calls executeQuery, it is suited to statements that return a result set; for INSERT, UPDATE and DELETE statements the JDBC API provides executeUpdate.
    public void Select( String strQuery )
    {
        ...
        // Submit a query, creating a ResultSet object
        ResultSet rs = stmt.executeQuery( strQuery );
        // Display all columns and rows from the result set
        printResultSet (rs);
        rs.close();
        ...
    }
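The Select method above passes every statement to executeQuery. In the JDBC API, Statement.execute accepts both queries and updates and reports which kind it ran. The following sketch shows how a single method could handle both cases; it is a possible extension rather than part of dbWrapper, and the class SqlRunner and the method run are hypothetical names.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical extension of the Select idea; not part of the dbWrapper class.
class SqlRunner {

    // Executes any SQL statement and returns a short textual summary.
    static String run(Statement stmt, String sql) throws SQLException {
        // execute() returns true if the statement produced a ResultSet
        if (stmt.execute(sql)) {
            try (ResultSet rs = stmt.getResultSet()) {
                int rows = 0;
                while (rs.next()) {
                    rows++;
                }
                return rows + " row(s) returned";
            }
        }
        // otherwise the statement was an update (or DDL) with an update count
        return stmt.getUpdateCount() + " row(s) affected";
    }
}
```

With an open statement object, a call such as SqlRunner.run(stmt, "SELECT * FROM Authors") would take the first branch, while an UPDATE statement would take the second.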
Close closes the connection to the database.
    public void Close()
    {
        ...
        stmt.close();
        con.close();
        ...
    }
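If Open fails, stmt and con may still be null when Close is called, and stmt.close() would then raise a NullPointerException. One possible guard is sketched below. The class CloseHelper and the method closeQuietly are hypothetical, and the sketch relies on Statement and Connection implementing AutoCloseable, which holds from Java 7 onwards.

```java
// Hypothetical helper; not part of the dbWrapper class.
class CloseHelper {

    // Closes the resource if there is one; reports, but does not propagate, failures.
    static void closeQuietly(AutoCloseable resource) {
        if (resource == null) {
            return; // nothing was opened, so there is nothing to close
        }
        try {
            resource.close();
        } catch (Exception ex) {
            System.out.println("Close failed: " + ex.getMessage());
        }
    }
}
```

Inside Close, the two calls would then become CloseHelper.closeQuietly(stmt) followed by CloseHelper.closeQuietly(con), so a failure while closing the statement no longer prevents the connection from being closed.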
The method printResultSet prints the result set which has been returned by the statement execution. For each record in the result set (obtained by rs.next()), all attribute values are printed in a for-loop.
    private static void printResultSet(ResultSet rs) throws SQLException
    {
        int numCols = rs.getMetaData().getColumnCount();
        while ( rs.next() )
        {
            for (int i=1; i<=numCols; i++)
            {
                System.out.print(rs.getString(i) + " | " );
            }
            System.out.println();
        }
    }
}
Using the dbWrapper class
Set up ODBC under Windows NT:
To open the database connection, you have to create a dbWrapper object.
The parameter is a DSN (data source name). This name has to be defined on your machine. For this, start the ODBC program (Settings -> Control Panel -> ODBC) under Windows NT.
Define a user DSN connecting to the database management system of your choice. Click on 'Add'. You need to select the appropriate driver (e.g. for MS SQL Server or MS Access); for this application, select SQL Server.
Choose a name (it should be 'TestDB' if you want to use this program here). Enter the name (or IP address) of the server on which your database is running (it should be 'gobi' if you want to connect to an MS SQL Server running in the School of Computer Applications). Choose the database you want to access (i.e. enter the name) using the Options menu. Sometimes, leaving the field empty (i.e. using the default) will work.
If you want to connect to MS Access on your local machine, you have to choose the corresponding driver, give the DSN a name, and specify your machine as the server.
Suppose the DSN you have defined is "TestDB", then the following establishes the connection.
dbWrapper myDB = new dbWrapper("TestDB");
myDB.Open();
There is a book example in the DB. A sample query which could be executed is "SELECT * FROM Authors":
String strSQLQuery = "SELECT * FROM Authors";
// Select prints the result itself; it does not return a value
myDB.Select(strSQLQuery);
Execute the program
Here is the full source code of the dbWrapper class:
// dbWrapper Class
import java.sql.*;
class dbWrapper
{
    Statement stmt;
    Connection con;
    String strUrl;
    String strUserName;
    String strPassword;
    public dbWrapper(String strDSN)
    {
        // the DSN for the Db connection
        strUrl = "jdbc:odbc:" + strDSN;
        strUserName = "guest";
        strPassword = "guest";
    }
    public void Open()
    {
        try
        {
            // Load the jdbc-odbc bridge driver
            Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
            // Attempt to connect to a driver.
            con = DriverManager.getConnection( strUrl, strUserName,
                                               strPassword );
            // Create a Statement object so we can submit
            // SQL statements to the driver
            stmt = con.createStatement();
        }
        catch (SQLException ex)
        {
            while (ex != null)
            {
                System.out.println("SQL Exception: " + ex.getMessage() );
                ex = ex.getNextException();
            }
        }
        catch (java.lang.Exception ex)
        {
            ex.printStackTrace();
        }
    }
    public void Select( String strQuery )
    {
        try
        {
            // Submit a query, creating a ResultSet object
            ResultSet rs = stmt.executeQuery( strQuery );
            // Display all columns and rows from the result set
            printResultSet(rs);
            rs.close();
        }
        catch (SQLException ex)
        {
            while (ex != null)
            {
                System.out.println("SQL Exception: " + ex.getMessage() );
                ex = ex.getNextException();
            }
        }
    }
    public void Close()
    {
        try
        {
            stmt.close();
            con.close();
        }
        catch (SQLException ex)
        {
            while (ex != null)
            {
                System.out.println("SQL Exception: " + ex.getMessage() );
                ex = ex.getNextException();
            }
        }
    }
    private static void printResultSet(ResultSet rs) throws SQLException
    {
        int numCols = rs.getMetaData().getColumnCount();
        while ( rs.next() )
        {
            for (int i=1; i<=numCols; i++)
            {
                System.out.print(rs.getString(i) + " | " );
            }
            System.out.println();
        }
    }
}
Connecting to an Oracle DB
This section explains how to connect to an Oracle database running under Unix.
The difference from the previous program is marginal.
Two changes are needed:
First, use another URL, user name and password
strUrl = "jdbc:oracle:thin:@pisang:1521:car";
strUserName = "testStudent";
strPassword = "test";
Second, load another driver
// Register Driver
DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver());
instead of the Class.forName("sun.jdbc.odbc.JdbcOdbcDriver") call. Here, we need a genuine Oracle driver instead of Sun's JDBC-ODBC bridge driver.
You can make one more change. The parameter of the dbWrapper constructor is not needed. This information is only important for setting up a connection under Windows. So, you can remove the parameter (or ignore it).
If the connection is successfully established, you can query the tables of the Supplier/Parts database (S, SP, P) and of the Elmasri/Navathe Company database (Employee, Department, etc.).
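Since the ODBC and Oracle variants differ only in their URL, user name and password (plus the driver that is loaded), one way to avoid editing the class for each DBMS is to pass these settings in from outside. The sketch below shows such a settings object; DbSettings and its factory methods are hypothetical and not part of the course code, and the Oracle URL simply follows the host:port:SID pattern of the example above.

```java
// Hypothetical value class; not part of the course code.
class DbSettings {
    final String url;
    final String userName;
    final String password;

    DbSettings(String url, String userName, String password) {
        this.url = url;
        this.userName = userName;
        this.password = password;
    }

    // Settings for the JDBC-ODBC bridge: the URL is built from an ODBC DSN.
    static DbSettings forOdbc(String dsn, String userName, String password) {
        return new DbSettings("jdbc:odbc:" + dsn, userName, password);
    }

    // Settings for the Oracle thin driver: host, port and system identifier (SID).
    static DbSettings forOracleThin(String host, int port, String sid,
                                    String userName, String password) {
        return new DbSettings("jdbc:oracle:thin:@" + host + ":" + port + ":" + sid,
                              userName, password);
    }

    public static void main(String[] args) {
        System.out.println(forOdbc("TestDB", "guest", "guest").url);
        System.out.println(forOracleThin("pisang", 1521, "car", "testStudent", "test").url);
    }
}
```

A wrapper constructor could then take a DbSettings object instead of a DSN, which also removes the Windows-only parameter discussed above.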
11-4. Databases and the Web - the Future
Electronic Commerce:
Database support is increasingly important for the emerging Electronic Commerce technologies.
Both business-to-consumer (B2C) and business-to-business (B2B) eCommerce rely on data managed using database management systems, which can be accessed via the Internet.
In the future, we expect to see:
the convergence of Web and object technologies, e.g. the Document Object Model (DOM), which allows us to see documents as objects.
new languages more powerful than HTML, e.g. XML (the eXtensible Markup Language), which allows us to define documents in a presentation-independent way and to exchange data independently of the system used to store the data.
Both developments will have an effect on what kind of data is stored in the databases and how it is stored.
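To make the two developments concrete, the sketch below encodes the supplier table from Section 11-1 as XML and then reads it back through the DOM, treating the document as a tree of objects. It is an illustration only: the class SupplierXml, the element names (suppliers, supplier, sname, status, city) and the attribute sno are invented for this example, and the parser used is the one that ships with the Java platform (javax.xml.parsers).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch; the element and attribute names are invented for illustration.
class SupplierXml {

    // The supplier table, described by content rather than by presentation.
    static final String XML =
        "<suppliers>"
      + "<supplier sno=\"S1\"><sname>Smith</sname><status>20</status><city>Paris</city></supplier>"
      + "<supplier sno=\"S2\"><sname>Jones</sname><status>10</status><city>Paris</city></supplier>"
      + "<supplier sno=\"S3\"><sname>Blake</sname><status>30</status><city>Rome</city></supplier>"
      + "</suppliers>";

    // Parse the XML text into a DOM tree.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Return the city of the supplier with the given number, or null if there is none.
    static String cityOf(Document doc, String sno) {
        NodeList suppliers = doc.getElementsByTagName("supplier");
        for (int i = 0; i < suppliers.getLength(); i++) {
            Element s = (Element) suppliers.item(i);
            if (sno.equals(s.getAttribute("sno"))) {
                return s.getElementsByTagName("city").item(0).getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(cityOf(doc, "S3"));
    }
}
```

Unlike the HTML table, this encoding says nothing about borders or colours; the same document could be rendered as HTML, loaded into a database, or exchanged between systems.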