© Ron Rogerson 1998-2010 Slide 1

Relational Databases

Ron Rogerson

email Ron@Howard-Rogerson.co.uk




© Ron Rogerson 1998-2010 Slide 2

Topics on this course

What is a database, and what is it used for?

What problems arise in managing one?

file-based systems; data dependence

What kind of system might resolve those problems?

the three-level architecture; functions of a dbms; firm development principles - development cycle and conceptual modelling; and . . . one which uses . . .

The relational approach - a theoretical architecture

the relational model; manipulation with relational algebra

SQL - a practical implementation of relational theory

Bringing it together - developing a relational d/b with SQL

Other topics - introduced in Block 1 & returned to throughout course

background to data management; other kinds of information system; developments in databases


© Ron Rogerson 1998-2010 Slide 3

What is a database?

Database: a collection of data stored in a computer system

Data: a representation of information (or, information as interpreted data): information can have more than one representation; data has no meaning in isolation (domain of discourse); computers process data, not information

Semantic properties of data

User: a person whose information requirements are being supported

Sharing data - differing requirements

User process: application process - single purpose; database tool - general purpose


© Ron Rogerson 1998-2010 Slide 4

What problems arise in managing a database?

File-based approach: file organisation determines access operations; close association of program with file; programs have to do it all (consistency, relationships, access control etc.); data likely to be duplicated; resistant to change; data dependent

Database approach: the DBMS, not programs, does access control and manages storage/retrieval; explicit single database definition - the schema

NOTE that, although we are still talking in general about a structured database approach, some modern systems such as OO and XML databases do exhibit characteristics of the file-based approach.


© Ron Rogerson 1998-2010 Slide 5

Data Retrieval in a File-Based System

[Figure: a record template with fixed-format fields including No of Coupons, Coupon 1, Coupon 2, Coupon 3, No of Guarantors, Start, Sedol, Issuer, Issue Date, Guarantor 1, Guarantor 2 and End]

Different template for each file, 'hard coded' in programs

All 'links' between files done by programs
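The following Python sketch shows what a template "hard coded in programs" looks like in practice. It is a minimal, hypothetical example. The field names come from the template above, but the field widths, the sample record and the function name parse_record are invented for illustration. The layout is held in the program, not in any shared definition.

```python
# Hypothetical fixed-width layout for one file: (field name, start, end).
# In a file-based system, every program that reads this file repeats this layout.
TEMPLATE = [
    ("Sedol", 0, 7),
    ("Issuer", 7, 17),
    ("IssueDate", 17, 25),
    ("NoOfCoupons", 25, 26),
]


def parse_record(line: str) -> dict:
    """Slice one fixed-width record using the hard-coded template."""
    return {name: line[start:end].strip() for name, start, end in TEMPLATE}


record = parse_record("1234567ACME PLC  199801011")
print(record)
# {'Sedol': '1234567', 'Issuer': 'ACME PLC', 'IssueDate': '19980101', 'NoOfCoupons': '1'}
```

Suppose the file layout changes, for example by widening Issuer. Every program holding its own copy of TEMPLATE then has to change as well. This is the data dependence described above.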


© Ron Rogerson 1998-2010 Slide 6

Actual systems architectures

Client-Server: two-tier - interface & application on client, d/b on server; three-tier - interface, application and d/b separate

Client-Multiserver: multiple d/bs on separate servers, each server providing specific data; connection management required

Distributed dbms: multiple d/bs on separate servers; the ddbms provides location independence, security and integrity; horizontal and/or vertical fragmentation; two-phase commit controls updates; replication reduces network traffic and improves availability, but always reduces consistency somewhat; updates can be pushed or pulled

Mobile systems: d/b fragments copied to intermittently disconnected devices and updated when possible
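Two-phase commit can be sketched in a few lines. This in-memory Python version is a simplification under stated assumptions:

- there are no network failures, timeouts or recovery logs;
- the class and method names (Participant, prepare, commit, abort) are hypothetical.

In phase 1 the coordinator asks every site to prepare. Only if every site votes yes does phase 2 tell them all to commit; otherwise all of them abort.

```python
class Participant:
    """One site in a distributed database (hypothetical, in-memory)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote yes only if this site can make its part of the update.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1: collect a vote from every participant (the list forces all calls).
    votes = [p.prepare() for p in participants]
    # Phase 2: commit everywhere only if all voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok = two_phase_commit([Participant("London"), Participant("Leeds")])
failed_sites = [Participant("London"), Participant("Leeds", can_commit=False)]
failed = two_phase_commit(failed_sites)
print(ok, failed, [p.state for p in failed_sites])  # committed aborted ['aborted', 'aborted']
```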


© Ron Rogerson 1998-2010 Slide 7

What kind of system might resolve these problems?

- a system which provides the functions of a dbms:

Data Definition; Constraint Definition and Enforcement; Access Control; Data Manipulation; Restructuring and Reorganisation

restructuring: a change to the design, e.g. adding a column or a table (logical schema)

reorganisation: a change within the design, e.g. to assimilate recently added records and optimise indices (storage schema)

Transaction Support; Concurrency Support; Recovery
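Transaction support means that a group of updates succeeds or fails as a unit. Here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for "a dbms". The account table and the transfer function are hypothetical. Using the connection as a context manager commits the transaction if the block succeeds, and rolls it back if an exception escapes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()


def transfer(amount):
    """Move money from A to B as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = 'B'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = 'A'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


first = transfer(60)   # succeeds
second = transfer(60)  # A would go negative: the CHECK fails and B's credit is undone too
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(first, second, balances)  # True False {'A': 40, 'B': 60}
```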


© Ron Rogerson 1998-2010 Slide 8

What kind of system might resolve these problems? (2)

- a system which gives data independence:

Logical data independence: a change to the logical schema has no impact on user processes

Physical data independence: a change to the storage schema has no impact on user processes

ANSI/SPARC three-schema architecture: external, logical, storage

- a system which provides interaction facilities:

Data manipulation language / query language (e.g. SQL); host language with embedded statements; data definition language (e.g. SQL)
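The next sqlite3 sketch illustrates both kinds of data independence, with SQLite standing in for the dbms and a hypothetical driver table. A view plays the part of an external schema. Two changes are then made:

- adding a column to the base table (a logical-schema change);
- adding an index (a storage-level change).

Neither change affects the user process's query or its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver (driver_no INTEGER PRIMARY KEY, surname TEXT, date_of_birth TEXT);
INSERT INTO driver VALUES (1, 'Brown', '1960-03-19'), (2, 'Smith', '1975-07-01');
-- external schema: this user process only sees driver numbers and surnames
CREATE VIEW driver_names AS SELECT driver_no, surname FROM driver;
""")


def user_process():
    # Written against the view, not against the base table.
    return conn.execute("SELECT surname FROM driver_names ORDER BY driver_no").fetchall()


before = user_process()
conn.execute("ALTER TABLE driver ADD COLUMN licence_class TEXT")      # logical schema change
conn.execute("CREATE INDEX driver_surname_idx ON driver (surname)")   # storage-level change
after = user_process()
print(before == after, after)  # True [('Brown',), ('Smith',)]
```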


© Ron Rogerson 1998-2010 Slide 9

What kind of system might resolve these problems? (3) The 3-schema architecture

[Figure: layers from top to bottom - User Processes; External Schemas; Logical Schema; Storage Schema; Stored d/b]


© Ron Rogerson 1998-2010 Slide 10

What kind of system might resolve these problems? (4)

One developed using the database development lifecycle:

Establishing data requirements

Data analysis - produces conceptual data model

Database design - produces logical schema (NB specific to d/b type)

Implementation - produces storage schema et al. (NB specific to platform)

Testing - may lead to iteration back to any of the earlier stages

One developed using a conceptual data model:

a formal representation of a data requirement, independent of how it may be realised


© Ron Rogerson 1998-2010 Slide 11

What kind of system might resolve these problems? Conceptual data models

Entity-Relationship model: entity types and entity occurrences; attributes (identifier - a special unique attribute); relationships

Degrees of relationships

[Diagram: Bus 1:n Driver] A Bus can have many Drivers; a Driver drives not more than one Bus

Participation conditions

[Diagram: Bus (optional) - Driver (mandatory)] A Bus may have no Driver; a Driver must be allocated to a Bus


© Ron Rogerson 1998-2010 Slide 12

What kind of system might resolve these problems? Conceptual data models (2)

Entity types: Bus(RegNo, NoOfSeats, Date1stReg); Driver(DriverNo, Surname, DateOfBirth)

Weak and strong entity types

Constraint: a statement of a necessary restriction that cannot be expressed elsewhere in the model (e.g. only drivers over 21 may drive buses with more than 8 seats)

Assumption: a statement of something that had to be assumed in order to complete the model and needs pointing out (e.g. only a driver’s current bus allocation is shown)


© Ron Rogerson 1998-2010 Slide 13

What kind of system might resolve these problems? Conceptual data models (3)

m:n relationships

[Diagram: Bus m:n Driver]

Is this relationship possible?

What additional information needs to be recorded about occurrences of the relationship?

How might we record it? How does this change the Assumptions?

Recursive relationships

[Diagram: Driver related to itself] a driver supervises another driver


© Ron Rogerson 1998-2010 Slide 14

The relational approach - a theoretical architecture

Relation: can be pictured as a form of table, each row representing an occurrence.

Terminology: column = attribute; row = tuple; no. of attributes = degree; no. of tuples = cardinality; identifier = primary key

Properties of relations: all attributes have a value in every tuple; all values are atomic; all values of any attribute are of the same kind; each attribute has a name, unique within the relation; each tuple is unique; the ordering of attributes and tuples is not significant.

Domain: a named set of values, with a common meaning, from which one or more attributes draw their actual values

NB values on different domains are not comparable.
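Several of these properties can be mimicked with Python's built-in sets. This minimal sketch uses its own representation, not one the course defines: a relation is a heading plus a frozenset of tuples, so duplicate tuples collapse and tuple order is not significant. Degree and cardinality are then just lengths. As a simplification, the sketch keeps attribute order fixed. The Bus rows are taken from the example relations used later in these slides.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Relation(NamedTuple):
    heading: Tuple[str, ...]             # attribute names, unique within the relation
    body: FrozenSet[Tuple[object, ...]]  # a set of tuples: no duplicates, no order

    @property
    def degree(self):
        return len(self.heading)

    @property
    def cardinality(self):
        return len(self.body)


def make_relation(heading, rows):
    if len(set(heading)) != len(heading):
        raise ValueError("attribute names must be unique within a relation")
    for row in rows:
        if len(row) != len(heading):
            raise ValueError("every attribute must have a value in every tuple")
    return Relation(tuple(heading), frozenset(map(tuple, rows)))


bus = make_relation(
    ("RegNo", "Make"),
    [("ABC123", "Scania"), ("DEF456", "Volvo"), ("GHJ789", "DAF"), ("ABC123", "Scania")],
)
print(bus.degree, bus.cardinality)  # 2 3  (the repeated tuple counts once)
```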


© Ron Rogerson 1998-2010 Slide 15

The relational approach - a theoretical architecture (2)

Candidate key: an attribute (or combination of attributes) with the properties of uniqueness and minimality; implies a semantic constraint; is either a primary or an alternate key

Primary key: a candidate key chosen as the identifier; subject to the entity integrity rule; declared in the schema definition

Alternate key: any other candidate key; declared in the schema definition

Qualified attribute names - dot notation (e.g. Bus.RegNo); may be needed where the same name occurs in more than one relation
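Uniqueness and minimality can be tested against a particular set of tuples. Because a candidate key implies a semantic constraint, sample data can only show that a set of attributes is not a candidate key; it can never prove that one is. This Python sketch is minimal: rows are dicts, and the Allocation sample and its dates are hypothetical.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows in this sample agree on all of attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)


def could_be_candidate_key(rows, attrs):
    """Unique, and no proper subset is unique (minimality), in this sample."""
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


allocation = [  # hypothetical sample data
    {"RegNo": "N456CDE", "DriverNo": 100, "Date": "1998-01-05"},
    {"RegNo": "N456CDE", "DriverNo": 101, "Date": "1998-01-06"},
    {"RegNo": "R123ABC", "DriverNo": 101, "Date": "1998-01-05"},
]
print(could_be_candidate_key(allocation, ("RegNo", "DriverNo")))          # True
print(could_be_candidate_key(allocation, ("RegNo", "DriverNo", "Date")))  # False: not minimal
```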


© Ron Rogerson 1998-2010 Slide 16

The relational approach - a theoretical architecture (3)

Foreign key: an attribute (or combination of attributes) in a relation, whose values are the same as values of a candidate key (normally the primary key) of some (not necessarily distinct) relation. It is the only method of representing relationships:

by default, it represents an n:1 relationship, always at the n “end” of a 1:n (remember the “crow’s foot” pointer)

a relationship can be represented by 'posting' the primary key of the '1' end into the other relation - the 'posted attribute' method

it must always have a value the same as some value of the key it references (referential integrity rule)

semantic constraint


© Ron Rogerson 1998-2010 Slide 17

The relational approach - a theoretical architecture (4)

[Diagram: Bus 1:n Driver]

Why can't the foreign key go at the “1” end?

Bus (RegNo, Make, Driver)
ABC123  Scania  1,2
DEF456  Volvo   3
GHJ789  DAF

Driver (Id, Name)
1  Brown
2  Smith
3  Bloggs

Foreign key: Bus.Driver

Representing a 1:n using posted key

Bus (RegNo, Make)
ABC123  Scania
DEF456  Volvo
GHJ789  DAF

Driver (Id, Name, BusNo)
1  Brown   ABC123
2  Smith   ABC123
3  Bloggs  DEF456

Foreign key: Driver.BusNo
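The posted-key design above can be tried in SQLite through Python's sqlite3 module. SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on. The driver "Jones" and bus "XYZ999" are hypothetical, added to show the referential integrity rule rejecting a value with no matching key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Bus (RegNo TEXT PRIMARY KEY, Make TEXT);
CREATE TABLE Driver (
    Id    INTEGER PRIMARY KEY,
    Name  TEXT,
    BusNo TEXT NOT NULL REFERENCES Bus (RegNo)  -- posted key at the 'n' end
);
INSERT INTO Bus VALUES ('ABC123', 'Scania'), ('DEF456', 'Volvo'), ('GHJ789', 'DAF');
INSERT INTO Driver VALUES (1, 'Brown', 'ABC123'), (2, 'Smith', 'ABC123'), (3, 'Bloggs', 'DEF456');
""")

try:
    conn.execute("INSERT INTO Driver VALUES (4, 'Jones', 'XYZ999')")  # no such bus
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

drivers_per_bus = conn.execute(
    "SELECT RegNo, COUNT(Id) FROM Bus LEFT JOIN Driver ON Driver.BusNo = Bus.RegNo "
    "GROUP BY RegNo ORDER BY RegNo"
).fetchall()
print(rejected, drivers_per_bus)  # True [('ABC123', 2), ('DEF456', 1), ('GHJ789', 0)]
```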


© Ron Rogerson 1998-2010 Slide 18

The relational approach - a theoretical architecture (5)

[Diagram: Bus 1:1 Driver]

so . . .

Representing a 1:1 using posted key

Bus (RegNo, Make)
ABC123  Scania
DEF456  Volvo
GHJ789  DAF

Driver (Id, Name, BusNo)
1  Brown   ABC123
3  Bloggs  DEF456

Driver.BusNo becomes an alternate key as well as a foreign key.

Driver (Id, Name, BusNo)
1  Brown   ABC123
2  Smith   ABC123
3  Bloggs  DEF456

We can't have this duplication (ABC123 appears twice in BusNo).
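In SQL terms, the 1:1 case amounts to declaring the posted key UNIQUE as well as REFERENCES. Here is a short sqlite3 sketch using the slide's rows; the schema declarations are an assumption about how one might express it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Bus (RegNo TEXT PRIMARY KEY, Make TEXT);
CREATE TABLE Driver (
    Id    INTEGER PRIMARY KEY,
    Name  TEXT,
    BusNo TEXT NOT NULL UNIQUE REFERENCES Bus (RegNo)  -- foreign key AND alternate key
);
INSERT INTO Bus VALUES ('ABC123', 'Scania'), ('DEF456', 'Volvo'), ('GHJ789', 'DAF');
INSERT INTO Driver VALUES (1, 'Brown', 'ABC123'), (3, 'Bloggs', 'DEF456');
""")

try:
    conn.execute("INSERT INTO Driver VALUES (2, 'Smith', 'ABC123')")  # second driver for ABC123
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False
```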


© Ron Rogerson 1998-2010 Slide 19

The relational approach - a theoretical architecture (6)

Recursive relationships

m:n relationships cannot be represented by 'posted attribute' - why not?

Relationships which are optional at the 'n' end cannot be represented that way either - why not?

Deletions from a referenced relation: restricted effect; cascade delete effect; default effect
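SQL's ON DELETE clause offers the three deletion effects: RESTRICT, CASCADE and SET DEFAULT. This sqlite3 sketch runs the same deletion under each one. The 'SPARE' bus is a hypothetical placeholder used as the default value, and the helper delete_bus is illustrative only. The effect string is formatted into the DDL, which is acceptable here only because it comes from our own fixed list.

```python
import sqlite3


def delete_bus(effect):
    """Delete bus ABC123 under the given ON DELETE effect; return what happens."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(f"""
    CREATE TABLE Bus (RegNo TEXT PRIMARY KEY);
    CREATE TABLE Driver (
        Id    INTEGER PRIMARY KEY,
        BusNo TEXT DEFAULT 'SPARE' REFERENCES Bus (RegNo) ON DELETE {effect}
    );
    INSERT INTO Bus VALUES ('SPARE'), ('ABC123');
    INSERT INTO Driver VALUES (1, 'ABC123'), (2, 'ABC123');
    """)
    try:
        with conn:
            conn.execute("DELETE FROM Bus WHERE RegNo = 'ABC123'")
    except sqlite3.IntegrityError:
        return "deletion rejected"
    return conn.execute("SELECT Id, BusNo FROM Driver ORDER BY Id").fetchall()


for effect in ("RESTRICT", "CASCADE", "SET DEFAULT"):
    print(effect, delete_bus(effect))
# RESTRICT deletion rejected
# CASCADE []
# SET DEFAULT [(1, 'SPARE'), (2, 'SPARE')]
```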


© Ron Rogerson 1998-2010 Slide 20

The relational approach - a theoretical architecture (7)

Representing m:n relationships

[Diagram: Bus m:n Driver, relationship Allocated]

Bus(RegNo, NoOfSeats, Date1stReg)

Driver(DriverNo, Surname, DateOfBirth)

[Diagram: Bus 1:n Allocation n:1 Driver]

Allocation(RegNo, DriverNo, Date)

The m:n is decomposed into a new intersection relation plus two 1:n relationships.

What new semantic constraint is imposed by the above choice of primary key? How could we relax it?

NB: conceptual - decompose to add info; relational - decompose to represent at all

Rules for new intersection relations (when used to decompose an m:n):

New entity - participation conditions: both mandatory; degrees: both "n"

Old entities - participation conditions: same as before; degrees: both "1"
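This sqlite3 sketch builds the intersection relation and then recovers the m:n pairs by joining through it. Two things are assumptions:

- The primary key of Allocation is taken to be (RegNo, DriverNo, Date). The slide's own choice of key is not recoverable from this transcript, so treat the key as a placeholder.
- The seat counts, dates and surnames are invented. The allocation rows follow the DIVIDE example later in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Bus (RegNo TEXT PRIMARY KEY, NoOfSeats INTEGER, Date1stReg TEXT);
CREATE TABLE Driver (DriverNo INTEGER PRIMARY KEY, Surname TEXT, DateOfBirth TEXT);
-- intersection relation: one 1:n from Bus, one 1:n from Driver
CREATE TABLE Allocation (
    RegNo    TEXT    NOT NULL REFERENCES Bus (RegNo),
    DriverNo INTEGER NOT NULL REFERENCES Driver (DriverNo),
    Date     TEXT    NOT NULL,
    PRIMARY KEY (RegNo, DriverNo, Date)  -- assumed key
);
INSERT INTO Bus VALUES ('N456CDE', 52, '1996-08-01'), ('R123ABC', 8, '1997-09-01');
INSERT INTO Driver VALUES (100, 'Brown', '1960-03-19'), (101, 'Smith', '1975-07-01');
INSERT INTO Allocation VALUES
    ('N456CDE', 100, '1998-01-05'),
    ('N456CDE', 101, '1998-01-06'),
    ('R123ABC', 101, '1998-01-05');
""")

# A bus has many drivers and a driver has many buses, via Allocation.
pairs = conn.execute(
    "SELECT DISTINCT b.RegNo, d.Surname FROM Bus b "
    "JOIN Allocation a ON a.RegNo = b.RegNo "
    "JOIN Driver d ON d.DriverNo = a.DriverNo ORDER BY b.RegNo, d.Surname"
).fetchall()
print(pairs)  # [('N456CDE', 'Brown'), ('N456CDE', 'Smith'), ('R123ABC', 'Smith')]
```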


© Ron Rogerson 1998-2010 Slide 21

The relational approach - a theoretical architecture (8)

Representing optional relationships (relation for relationship method)

Mandatory relationships can use this method too - it is much more complex than posted attribute, but treats all relationships the same way.

1:n between A and B:
A(a) B(b, a) becomes . . . A(a) B(b) AB(b, a)

1:1 between A and B:
A(a) B(b, a) (where B.a is an alternate key) becomes . . . A(a) B(b) AB(b, a) or AB(a, b)
(note that the non-p.k. of AB must be declared an alternate key)

NOTE that the degrees of the original relations “cross over” as they move to the new relation, and, as with an m:n, the original relations become “1”.


© Ron Rogerson 1998-2010 Slide 22

The relational approach - a theoretical architecture

Relational Algebra: operators act on whole relations (set operators); have the closure property (produce a new relation); are relationally complete - a theoretical basis for manipulation; operators reflect the structure of relations (and vice versa)

SELECT

select Allocation where RegNo = ‘R123ABC’

produces a horizontal slicing of a relation, i.e. picks tuples according to some value(s) of attribute(s); in this case, gives every allocation (and so the numbers of all drivers) ever made to the bus with RegNo R123ABC


© Ron Rogerson 1998-2010 Slide 23

The relational approach - a theoretical architecture

Relational Algebra (2)

PROJECT

project Bus over NoOfSeats

produces a “vertical slicing”, i.e. selects all the unique (combinations of) value(s) of the chosen attribute(s); in this case, lists (once only in each case) the seating capacities of buses in the fleet

Combining expressions

alias:

DriversUnder21 alias (select Driver where DateOfBirth > 19710425)

project DriversUnder21 over Surname

nested:

project (select Driver where DateOfBirth > 19710425) over Surname
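Select and project, and both ways of combining them, can be sketched in Python. The assumptions are:

- A relation is a frozenset of rows, and each row is a frozenset of (attribute, value) pairs. This makes rows hashable and makes duplicates vanish.
- Dates are YYYYMMDD integers, as on the slides.
- The helper names (rel, select, project, values) and the three drivers are hypothetical.

```python
def rel(*rows):
    """Build a relation (a set of rows) from dicts."""
    return frozenset(frozenset(row.items()) for row in rows)


def select(r, predicate):
    """Horizontal slice: keep whole tuples for which predicate(row) is true."""
    return frozenset(t for t in r if predicate(dict(t)))


def project(r, *attrs):
    """Vertical slice: keep only attrs; the set removes duplicate results."""
    return frozenset(frozenset((a, v) for a, v in t if a in attrs) for t in r)


def values(r, attr):
    """Sorted values of one attribute, for display."""
    return sorted(dict(t)[attr] for t in r)


driver = rel(
    {"DriverNo": 100, "Surname": "Brown", "DateOfBirth": 19600319},
    {"DriverNo": 101, "Surname": "Smith", "DateOfBirth": 19780701},
    {"DriverNo": 102, "Surname": "Smith", "DateOfBirth": 19790212},
)

# alias style: name the intermediate relation, then use it
drivers_under_21 = select(driver, lambda t: t["DateOfBirth"] > 19710425)
by_alias = project(drivers_under_21, "Surname")

# nested style: the same expression written in one piece
nested = project(select(driver, lambda t: t["DateOfBirth"] > 19710425), "Surname")

print(by_alias == nested, values(nested, "Surname"))  # True ['Smith']
```

Both Smith tuples survive the select, but the project yields Smith once only. This mirrors "once only in each case" above.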


© Ron Rogerson 1998-2010 Slide 24

The relational approach - a theoretical architecture

Relational Algebra (3)

JOIN (natural)

join Driver and Allocation

“pastes together” the tuples of the given relations which match on some attribute(s) with the same name & domain - in this case, DriverNo. Here it produces a complete record of every driver allocation including, for each tuple, all the attributes from each table, but with the joining column appearing once only.

NOTE this means the new table will contain all the details of every driver once for every time s/he has been allocated to a bus.

A join is over a shared attribute, and that normally means a foreign key.

A relation can be joined to itself (e.g. where there is a recursive relationship), but aliases must be used.
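Here is a natural join in the same set-of-rows style: a minimal Python sketch with hypothetical rows and helper names. Tuples are merged wherever they agree on every shared attribute. If the two relations shared no attribute at all, this would degenerate into a Cartesian product.

```python
def rel(*rows):
    """Build a relation (a set of rows) from dicts; each row is a frozenset of pairs."""
    return frozenset(frozenset(row.items()) for row in rows)


def natural_join(r, s):
    """Paste together tuples that agree on every attribute the two relations share."""
    joined = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            shared = dt.keys() & du.keys()
            if all(dt[a] == du[a] for a in shared):
                # merging the dicts means each shared attribute appears once only
                joined.add(frozenset({**dt, **du}.items()))
    return frozenset(joined)


driver = rel({"DriverNo": 100, "Surname": "Brown"}, {"DriverNo": 101, "Surname": "Smith"})
allocation = rel(
    {"RegNo": "N456CDE", "DriverNo": 100},
    {"RegNo": "N456CDE", "DriverNo": 101},
    {"RegNo": "R123ABC", "DriverNo": 101},
)
result = natural_join(driver, allocation)
rows = sorted((d["DriverNo"], d["Surname"], d["RegNo"]) for d in map(dict, result))
print(rows)
# [(100, 'Brown', 'N456CDE'), (101, 'Smith', 'N456CDE'), (101, 'Smith', 'R123ABC')]
```

Smith's details appear twice, once for each allocation, as the NOTE above says.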


© Ron Rogerson 1998-2010 Slide 25

The relational approach - a theoretical architecture

Relational Algebra (4) DIVIDE

Allocations alias (project Allocation over DriverNo, RegNo)

DriverNo  RegNo
100       N456CDE
101       N456CDE
101       R123ABC

Buses alias (project Bus over RegNo)

RegNo
N456CDE
R123ABC

divide Allocations by Buses over RegNo

DriverNo
101

(produces a list of the numbers of drivers who have been allocated to all buses)
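Division is easiest to see with the slide's own data. This sketch is simplified to the binary case: the dividend is a set of (DriverNo, RegNo) pairs, the divisor is a set of RegNo values, and the function name divide is ours. It keeps each driver who is paired with every bus in the divisor.

```python
# Allocations: (DriverNo, RegNo) pairs, as in the slide's projected relation
allocations = {(100, "N456CDE"), (101, "N456CDE"), (101, "R123ABC")}
# Buses: the RegNo values
buses = {"N456CDE", "R123ABC"}


def divide(pairs, divisor):
    """Keep each x that is paired with every value in divisor."""
    xs = {x for x, _ in pairs}
    return {x for x in xs if all((x, y) in pairs for y in divisor)}


print(divide(allocations, buses))  # {101}
```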


© Ron Rogerson 1998-2010 Slide 26

The relational approach - a theoretical architecture

Relational Algebra (5)

UNION, INTERSECTION and DIFFERENCE require union-compatibility: each relation involved is such (or can be changed such) that the ith attribute of each is on the same domain and has the same name.

UNION “adds” relations together

YoungDrivers alias (select Driver where DateOfBirth > 19770425)

OldDrivers alias (select Driver where DateOfBirth < 19380426)

YoungDrivers union OldDrivers


© Ron Rogerson 1998-2010 Slide 27

The relational approach - a theoretical architecture

Relational Algebra (6) INTERSECTION

picks the tuples which occur in both of two relations

NotUnder21 alias (select Driver where DateOfBirth < 19770426)

NotOver60 alias (select Driver where DateOfBirth > 19380425)

NotUnder21 intersection NotOver60

NOTE that, in this case, the whole operation is logically equivalent to an “and”.

OTHER OPERATORS: theta-join; Cartesian product; outer join

CONSTRAINTS USING R.A.

constraint (project Bus over DriverNo) difference (project Driver over DriverNo) is empty
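Python's set operators map directly onto union, intersection and difference. Tuples built to the same shape are union-compatible by construction. In this sketch:

- The three drivers are hypothetical.
- The date thresholds follow the slides.
- The DriverNo values assumed to be held in Bus are invented, to exercise the "is empty" constraint.

```python
drivers = {  # (DriverNo, Surname, DateOfBirth); hypothetical rows
    (1, "Adams", 19790101),
    (2, "Baker", 19600101),
    (3, "Clark", 19300101),
}


def select(r, pred):
    return {t for t in r if pred(t)}


young = select(drivers, lambda t: t[2] > 19770425)
old = select(drivers, lambda t: t[2] < 19380426)
not_under_21 = select(drivers, lambda t: t[2] < 19770426)
not_over_60 = select(drivers, lambda t: t[2] > 19380425)

young_or_old = young | old                  # union
working_age = not_under_21 & not_over_60    # intersection
same_as_and = select(drivers, lambda t: t[2] < 19770426 and t[2] > 19380425)

# constraint (project Bus over DriverNo) difference (project Driver over DriverNo) is empty
driver_nos = {t[0] for t in drivers}
holds = not ({1, 2} - driver_nos)   # hypothetical DriverNo values held in Bus
broken = not ({1, 4} - driver_nos)  # 4 refers to no driver
print(sorted(young_or_old), working_age == same_as_and, holds, broken)
# [(1, 'Adams', 19790101), (3, 'Clark', 19300101)] True True False
```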


© Ron Rogerson 1998-2010 Slide 28

Relational Algebra (7)

[Diagram: Bus 1:n Driver, mandatory at the Bus end]

Using relational algebra to represent mandatory participation at the referenced end of a relationship:

constraint ((project Bus over RegNo) difference (project Driver over BusNo)) is empty

Bus (RegNo, Make)
ABC123  Scania
DEF456  Volvo
GHJ789  DAF

Driver (Id, Name, BusNo)
1  Brown   ABC123
2  Smith   ABC123
3  Bloggs  DEF456

Foreign key: Driver.BusNo. Bus GHJ789 has no driver, so the constraint is violated.

Bus (RegNo, Make)
ABC123  Scania
DEF456  Volvo

Driver (Id, Name, BusNo)
1  Brown   ABC123
2  Smith   ABC123
3  Bloggs  DEF456

With GHJ789 removed, every bus has a driver and the constraint holds.
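The mandatory-participation constraint is a single set difference. This sketch uses the slide's two Bus instances; the function name buses_without_drivers is ours.

```python
bus = {("ABC123", "Scania"), ("DEF456", "Volvo"), ("GHJ789", "DAF")}
driver = {(1, "Brown", "ABC123"), (2, "Smith", "ABC123"), (3, "Bloggs", "DEF456")}


def buses_without_drivers(bus, driver):
    # (project Bus over RegNo) difference (project Driver over BusNo)
    return {reg for reg, _ in bus} - {bus_no for _, _, bus_no in driver}


print(buses_without_drivers(bus, driver))        # {'GHJ789'}: constraint violated
bus_after = bus - {("GHJ789", "DAF")}
print(buses_without_drivers(bus_after, driver))  # set(): constraint holds
```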


© Ron Rogerson 1998-2010 Slide 29

The relational approach - a theoretical architecture

Relational Algebra (8)

Updating

Insertion: Driver := Driver union <023, Smith, 19460319>

Deletion: Driver := Driver difference <023, Smith, 19460319>

Amendment: Driver := Driver difference <023, Smith, 19660319>, then Driver := Driver union <023, Smith, 19660419>
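Because relations are sets, all three updates are assignments using union and difference. In this sketch the DriverNo is a string, so that "023" keeps its leading zero, and the starting Brown tuple is hypothetical.

```python
driver = frozenset({("001", "Brown", 19600319)})  # hypothetical starting relation

# Insertion: Driver := Driver union <023, Smith, 19460319>
driver = driver | {("023", "Smith", 19460319)}
after_insertion = driver

# Deletion: Driver := Driver difference <023, Smith, 19460319>
driver = driver - {("023", "Smith", 19460319)}
after_deletion = driver

# Amendment: first give the relation a tuple to amend, then replace it
driver = driver | {("023", "Smith", 19660319)}
driver = driver - {("023", "Smith", 19660319)}  # Driver := Driver difference <...19660319>
driver = driver | {("023", "Smith", 19660419)}  # Driver := Driver union <...19660419>
print(sorted(driver))  # [('001', 'Brown', 19600319), ('023', 'Smith', 19660419)]
```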


© Ron Rogerson 1998-2010 Slide 30

The relational approach - a theoretical architecture

Normalisation (1)

Normalisation aims to remove redundancy: it avoids possible inconsistency, avoids deletion/insertion anomalies and reduces storage.

In a normalised relation (i.e. one which is in BCNF) every non-p.k. attribute is a fact about the p.k., the whole p.k. and nothing but the p.k.

Single-valued facts: for every Girl there is (exactly) one Boy

Functional dependencies: Girl -> Boy (Girl “determines” Boy); note the E-R equivalent: [Diagram: Girl n:1 Boy]

Note that Girl -> Boy does not mean Boy -> Girl (we say an FD is "not reversible").

But what would the E-R diagram look like if we did additionally know that Boy -> Girl?
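An FD can be tested against sample rows. As with candidate keys, a sample can refute an FD but never prove one, because an FD is a semantic statement. This is a minimal Python sketch; the names in the sample are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if, in this sample, rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


pairs = [  # hypothetical sample: each Girl has exactly one Boy
    {"Girl": "Ann", "Boy": "Tom"},
    {"Girl": "Beth", "Boy": "Tom"},
    {"Girl": "Cath", "Boy": "Dan"},
]
print(fd_holds(pairs, ["Girl"], ["Boy"]))  # True
print(fd_holds(pairs, ["Boy"], ["Girl"]))  # False: the FD is not reversible
```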


© Ron Rogerson 1998-2010 Slide 31

The relational approach - a theoretical architecture

Normalisation (2)

Derived FDs by transitivity:

Girl -> Boy and Boy -> Boy’s_Mother; hence Girl -> Boy’s_Mother

Quick check: are there any FDs given whose "left hand" is the same as the "right hand" of any other?

By augmentation and transitivity:

Programme, StartTimeDate -> Announcer and TVChannel, StartTimeDate -> Programme; we can augment the second FD to TVChannel, StartTimeDate -> Programme, StartTimeDate; therefore TVChannel, StartTimeDate -> Announcer

Quick check: is there more than one FD with combined attributes on the LH? If not, no augmentation can be done; but if so, do any of those have a RH which is part of the LH of another? If not, no augmentation can be done; but if so, can the RH of that first FD be augmented to make it the same as the LH of the other?
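Instead of hunting for augmentation and transitivity steps by hand, one can use the standard attribute-closure algorithm. This is a technique swapped in here, not one these slides present. It repeatedly adds the right-hand side of any FD whose left-hand side is already contained in the growing set. This Python sketch derives both of the slide's results.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, given as a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


family_fds = [({"Girl"}, {"Boy"}), ({"Boy"}, {"Boy's_Mother"})]
tv_fds = [
    ({"Programme", "StartTimeDate"}, {"Announcer"}),
    ({"TVChannel", "StartTimeDate"}, {"Programme"}),
]
print("Boy's_Mother" in closure({"Girl"}, family_fds))                  # True
print("Announcer" in closure({"TVChannel", "StartTimeDate"}, tv_fds))  # True
```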


© Ron Rogerson 1998-2010 Slide 32

The relational approach - a theoretical architecture

Normalisation (3) First Normal Form (1NF)

A relation is in 1NF iff every non-p.k. attribute is functionally dependent on (i.e. is a fact about) the p.k.

Note that anything which is a relation must be at least in 1NF.

Second Normal Form (2NF)

A relation is in 2NF iff it is in 1NF and every non-p.k. attribute is fully functionally dependent on the p.k. (i.e. not on any subset of the p.k.)

Note that we are only interested in the dependency (or lack of it) between a non-p.k. attribute and the p.k., not in any other dependencies among the non-p.k. attributes.

Moving from 1NF to 2NF: “project out” any “offending” FDs into new relation(s); the determinant (l.h. side) of these FDs becomes the p.k. of the new relation; the “r.h. side(s)” of these FDs become(s) the non-p.k. attribute(s) of the new relation(s) and is/are removed from the “old” one.

But the determinant remains in the “old” relation, so that we have non-loss decomposition.

“Projected-out” FDs which share the same determinant go into a shared new relation.

The process must be “non-loss”, i.e. the original relation could be recreated by “joining” the new ones.
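The 1NF-to-2NF step and its non-loss check can be sketched in Python. The relation is hypothetical:

- its key is (RegNo, DriverNo, Date);
- it also carries Surname, which depends only on DriverNo, a subset of the key.

Projecting out DriverNo -> Surname while keeping DriverNo behind lets a join rebuild the original exactly.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs, removing duplicates."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def join(r, s):
    """Natural join of two projected relations (sets of attribute/value tuples)."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out


allocation_1nf = [  # hypothetical; key (RegNo, DriverNo, Date), but DriverNo -> Surname
    {"RegNo": "N456CDE", "DriverNo": 100, "Date": "1998-01-05", "Surname": "Brown"},
    {"RegNo": "N456CDE", "DriverNo": 101, "Date": "1998-01-06", "Surname": "Smith"},
    {"RegNo": "R123ABC", "DriverNo": 101, "Date": "1998-01-05", "Surname": "Smith"},
]

# Project out the offending FD DriverNo -> Surname; the determinant stays behind.
driver = project(allocation_1nf, ["DriverNo", "Surname"])
allocation = project(allocation_1nf, ["RegNo", "DriverNo", "Date"])

original = {tuple(sorted(row.items())) for row in allocation_1nf}
print(len(driver), join(allocation, driver) == original)  # 2 True
```

Smith's surname is now stored once rather than twice, and nothing is lost.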


© Ron Rogerson 1998-2010 Slide 33

The relational approach - a theoretical architecture

Normalisation (4)

Third Normal Form (3NF)

A relation is in 3NF iff it is in 2NF and every non-p.k. attribute is non-transitively dependent on the p.k. (take great care with definitions in the course material)

Note that a transitively-derived F.D. does not necessarily make the attribute transitively dependent: where A -> B and B -> C, then A -> C is a transitively derived dependency; but C is not transitively dependent on A if either B -> A or C -> B.

Moving from 2NF to 3NF: the process is just the same as 1NF to 2NF, except that we “project out” the FD which is the “right hand” part of the complete transitive FD, i.e. the “B -> C” part.

Boyce-Codd Normal Form (BCNF)

A relation is in BCNF iff it is in 3NF and every determinant is a candidate key.

Note that, unlike 2NF and 3NF, we are interested in all FDs in the relation, not just those involving the p.k.


© Ron Rogerson 1998-2010 Slide 34

The relational approach - a theoretical architecture

Normalisation (5) Moving from 3NF to BCNF

The process is just the same as for the previous stages.

If the determinant of an F.D. in the relation is the p.k., then that’s fine.

If it’s not the p.k., then unless it’s an alternate key, the F.D. is an “offending” one.

It can only be an alternate key if it has a 1:1 relationship with the p.k.; we will only know this if it has a “reversible” F.D. with the p.k., i.e. A -> B and B -> A.
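The BCNF test can be automated with attribute closure, a technique not taken from these slides. A determinant qualifies only if its closure covers every attribute of the relation. This checks the usual "superkey" form of the rule, which is slightly broader than "candidate key", since a superkey need not be minimal. The schedule relation is a hypothetical assembly of the TV example's FDs.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """FDs whose determinant does not determine every attribute of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(attributes)]


schedule = {"TVChannel", "StartTimeDate", "Programme", "Announcer"}  # hypothetical relation
fds = [
    (frozenset({"TVChannel", "StartTimeDate"}), frozenset({"Programme"})),
    (frozenset({"Programme", "StartTimeDate"}), frozenset({"Announcer"})),
]
offending = [sorted(lhs) for lhs, _ in bcnf_violations(schedule, fds)]
print(offending)  # [['Programme', 'StartTimeDate']]
```

Programme, StartTimeDate does not determine TVChannel, so it is not a candidate key, and that FD is the "offending" one to project out.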


© Ron Rogerson 1998-2010 Slide 35

Relational model - SQL - Relational implementation

Relational model: a theoretical specification of what is to be done; 3 level architecture; manipulation languages: the relational algebra and relational calculus

SQL: specifies how the relational model is to be implemented (there could be many such specifications, but in practice there is only SQL); SQL schema; SQL (does not include a storage DDL)

Relational implementation: many implementations of SQL exist; often they cover only a subset of the standard, and may cover a superset; d/b schema; implementation of some version of SQL plus a further command set


© Ron Rogerson 1998-2010 Slide 36

Architectures - Theoretical and Actual

[Figure, theoretical: User Processes; External Schemas; Logical Schema; Storage Schema; Stored d/b]

[Figure, actual: User Processes; Views; Base tables; Database Schema; Stored d/b]


© Ron Rogerson 1998-2010 Slide 37

SQL - a practical implementation of relational theory

1970s: IBM developments - Ted Codd's relational model; SEQUEL (Structured English Query Language). Many variants developed.

1987: First ANSI standard - SQL:1987 ("SQL1", also known as ISO 9075, BS 6964:1988); includes a DDL and DML; lacked many features of the model

1989: SQL:1989 - added p.k. and f.k. constraints

1992: SQL:1992 ("SQL2") defines many features beyond SQL:1989; most implementations support it; many also offer "superset" functions which may add features but reduce portability

1999: SQL:1999 ("SQL3") includes aspects of OO technology not covered in this course

NOTES: SQL does not define a storage DDL, nor some other management functions; an implementation may do these any way it chooses.

SQL databases consist of tables and columns.


© Ron Rogerson 1998-2010 Slide 38

SQL - a practical implementation of relational theory (2)

SELECT - the query statement: operates on tables; all query statements produce a table; a logical processing model helps to explain how it works

SELECT * FROM country

The FROM clause produces an intermediate table, which is a full copy of the country table; "SELECT *" produces a final table giving the result - which in this case is also the full table.

SELECT births, population FROM country

The FROM clause produces the intermediate table; SELECT 'slices' this vertically into just the 2 columns required, which form the final table.

SELECT DISTINCT gdp FROM EUROBOND

As above, but 'slices' vertically to produce only one occurrence of each value; compare with PROJECT in the relational algebra.
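The effect of DISTINCT can be seen in a few lines of sqlite3. The slide's query uses a table called EUROBOND; this sketch substitutes a hypothetical country table with invented figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (name TEXT PRIMARY KEY, births REAL, population REAL, gdp REAL);
INSERT INTO country VALUES ('Ruritania', 120, 10000, 5.5), ('Freedonia', 90, 4500, 5.5),
                           ('Sylvania', 60, 3000, 7.0);
""")
all_gdp = conn.execute("SELECT gdp FROM country ORDER BY gdp").fetchall()
distinct_gdp = conn.execute("SELECT DISTINCT gdp FROM country ORDER BY gdp").fetchall()
print(all_gdp)       # [(5.5,), (5.5,), (7.0,)]
print(distinct_gdp)  # [(5.5,), (7.0,)]
```

Without DISTINCT, SQL keeps duplicate rows, whereas the relational algebra's project never produces them.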


© Ron Rogerson 1998-2010 Slide 39

SQL - a practical implementation of relational theory (3)

VALUE EXPRESSIONS (manipulating stored values)

COLUMN: SELECT name, (births/population)/1000 AS birth_rate_per_thousand FROM country
(NB can use + - * / (numbers), || (strings); columns must be of a suitable format)

SET: SELECT AVG(cars) FROM country
(NB can use AVG, DISTINCT, COUNT(*), SUM, MAX, MIN; columns must be of a suitable format)

STRING FUNCTIONS: SELECT name, SUBSTR(name,1,3) FROM country
(NB also LENGTH, CAST, SUBSTRING, etc.)
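The next sqlite3 sketch shows a column expression with an AS alias, || concatenation with SUBSTR, and a set function. The country rows are invented. The numeric columns are declared REAL because SQLite performs integer division when both operands are integers, which illustrates "columns must be of a suitable format".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (name TEXT PRIMARY KEY, births REAL, population REAL, cars REAL);
INSERT INTO country VALUES ('Ruritania', 120.0, 10000.0, 300.0),
                           ('Freedonia', 90.0, 4500.0, 100.0);
""")

rows = conn.execute(
    "SELECT name, births / population AS births_per_head, "
    "SUBSTR(name, 1, 3) || '.' AS short_name "
    "FROM country ORDER BY name"
).fetchall()
avg_cars = conn.execute("SELECT AVG(cars) FROM country").fetchone()[0]
print(rows, avg_cars)
# [('Freedonia', 0.02, 'Fre.'), ('Ruritania', 0.012, 'Rur.')] 200.0
```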


© Ron Rogerson 1998-2010 Slide 40

SQL - a practical implementation of relational theory (4)

The WHERE clause (or, specifying a row search condition)

SELECT staff_no
FROM staff
WHERE name = 'Jennings'

Logical processing model for this query: the FROM clause copies the whole of STAFF into an intermediate table; the WHERE clause slices just the rows which meet its condition into a 2nd intermediate table; SELECT takes STAFF_NO into the final table.

SELECT name, ((births-deaths)/population)*100 AS growth_rate
FROM country
WHERE ((births-deaths)/population)*100 > 0.5

(NB can use =, <, >, <=, >=, <>)

Can use AND, OR, NOT (care needed with brackets)
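Why are brackets needed? AND binds more tightly than OR, so the same words with and without brackets can select different rows. In this sqlite3 sketch the staff rows, and the region and grade columns, are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, region INTEGER, grade TEXT);
INSERT INTO staff VALUES
    ('S1', 'Jennings', 1, 'A'),
    ('S2', 'Patel',    1, 'B'),
    ('S3', 'Jennings', 3, 'B');
""")


def staff_nos(where):
    # where is one of our own fixed strings, so formatting it in is safe here
    sql = f"SELECT staff_no FROM staff WHERE {where} ORDER BY staff_no"
    return [r[0] for r in conn.execute(sql)]


print(staff_nos("name = 'Jennings'"))                             # ['S1', 'S3']
print(staff_nos("region = 1 OR region = 2 AND grade = 'A'"))      # ['S1', 'S2']
print(staff_nos("(region = 1 OR region = 2) AND grade = 'A'"))    # ['S1']
```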


© Ron Rogerson 1998-2010 Slide 41

SQL - a practical implementation of relational theory (5)

Other operators:

SELECT . . . WHERE quantity BETWEEN 5000 AND 6000 (inclusive)

WHERE name IN ('Berlin', 'Bonn', . . .)

WHERE classification LIKE '_h%s'

WHERE cars IS NULL

Joins using FROM:

SELECT s_country.name, capital, population
FROM s_country, s_city
WHERE capital = s_city.name
(cf. the relational algebra JOIN)

(NB what kind of table does this FROM produce, in the logical processing model?)

Aliases:

SELECT p.staff_no, p.name, q.staff_no
FROM staff p, staff q
WHERE p.name = q.name AND p.staff_no < q.staff_no

(NB why < and not <>?)
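The next sqlite3 sketch tries each of the operators on a single hypothetical table that mixes the columns from the slide's examples. It also runs "cars = NULL", to show why IS NULL is needed: a comparison with NULL is never true, so that condition matches nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (name TEXT, classification TEXT, quantity INTEGER, cars INTEGER);
INSERT INTO city VALUES ('Berlin', 'shops', 5000, 10), ('Bonn', 'shows', 6000, NULL),
                        ('Hamburg', 'ships', 6001, 7), ('Munich', 'hotels', 4999, NULL);
""")


def names(where):
    # where is one of our own fixed strings, so formatting it in is safe here
    return [r[0] for r in conn.execute(f"SELECT name FROM city WHERE {where} ORDER BY name")]


print(names("quantity BETWEEN 5000 AND 6000"))  # ['Berlin', 'Bonn']  (inclusive)
print(names("name IN ('Berlin', 'Bonn')"))      # ['Berlin', 'Bonn']
print(names("classification LIKE '_h%s'"))      # ['Berlin', 'Bonn', 'Hamburg']
print(names("cars IS NULL"))                    # ['Bonn', 'Munich']
print(names("cars = NULL"))                     # []
```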


© Ron Rogerson 1998-2010 Slide 42

SQL - a practical implementation of relational theory (6)

Outer joins

SELECT student.student_id, name, phone_no
FROM student LEFT OUTER JOIN telephone ON student.student_id = telephone.student_id

(NB can use RIGHT OUTER, FULL OUTER)

Natural joins

SELECT student_id, name, phone_no
FROM student NATURAL JOIN telephone

GROUP BY

SELECT product, COUNT(country), SUM(quantity)
FROM production
GROUP BY product

HAVING (is to groups what WHERE is to rows)

SELECT product, COUNT(country), SUM(quantity)
FROM production
GROUP BY product
HAVING SUM(quantity) > 15000
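Here is a sqlite3 sketch of a left outer join and of GROUP BY with HAVING. The students, phone number and production figures are invented. The outer join keeps Bob even though he has no telephone row, filling phone_no with NULL. HAVING then discards whole groups rather than individual rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE telephone (student_id TEXT, phone_no TEXT);
INSERT INTO student VALUES ('s1', 'Ann'), ('s2', 'Bob');
INSERT INTO telephone VALUES ('s1', '01234 567890');
CREATE TABLE production (country TEXT, product TEXT, quantity INTEGER);
INSERT INTO production VALUES
    ('Spain', 'Oats', 9000), ('Ireland', 'Oats', 7000), ('Spain', 'Rye', 4000);
""")

outer = conn.execute(
    "SELECT student.student_id, name, phone_no FROM student "
    "LEFT OUTER JOIN telephone ON student.student_id = telephone.student_id "
    "ORDER BY student.student_id"
).fetchall()
print(outer)   # [('s1', 'Ann', '01234 567890'), ('s2', 'Bob', None)]

groups = conn.execute(
    "SELECT product, COUNT(country), SUM(quantity) FROM production "
    "GROUP BY product HAVING SUM(quantity) > 15000"
).fetchall()
print(groups)  # [('Oats', 2, 16000)]  the Rye group (4000) is removed by HAVING
```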


© Ron Rogerson 1998-2010 Slide 43

SQL - a practical implementation of relational theory (7)

ORDER BY

SELECT PRICEDATE, PRICE
FROM LUXPRICE
WHERE LUXCODE = 123456 AND CURRENCY = 'US'
ORDER BY PRICEDATE DESC

(NB ASC is the default)

QUERY SEQUENCE

Statement order: SELECT . . . FROM . . . (WHERE . . .) (GROUP BY . . .) (HAVING . . .) (ORDER BY . . .)

Logical processing model: FROM . . . (WHERE . . .) (GROUP BY . . .) (HAVING . . .) SELECT . . . (ORDER BY . . .)


© Ron Rogerson 1998-2010 Slide 44

SQL - a practical implementation of relational theory (8)

COMPOSITE QUERIES

Note that SQL has no equivalent of joining queries using 'alias' in the relational algebra. Most complex queries can be handled by the following methods.

UNION

SELECT country, yr, population
FROM population
WHERE country IN ('Spain', 'Ireland')
UNION
SELECT name, 1990, population
FROM country
WHERE name IN ('Spain', 'Ireland')

SUBQUERIES ('nested' queries)

Generally, a subquery is a query whose result will be a single column. It is enclosed in brackets so that it can become part of the predicate of another query.
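The slide's UNION can be run in sqlite3. The population figures are invented. The literal 1990 in the second SELECT lines its columns up with the first, making the two results union-compatible. UNION also removes duplicates, so the Spain 1990 row, present in both tables, appears once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE population (country TEXT, yr INTEGER, population REAL);
CREATE TABLE country (name TEXT PRIMARY KEY, population REAL);
INSERT INTO population VALUES ('Spain', 1980, 37.5), ('Spain', 1990, 38.9), ('Ireland', 1980, 3.4);
INSERT INTO country VALUES ('Spain', 38.9), ('Ireland', 3.5), ('France', 56.7);
""")

rows = conn.execute("""
SELECT country, yr, population FROM population WHERE country IN ('Spain', 'Ireland')
UNION
SELECT name, 1990, population FROM country WHERE name IN ('Spain', 'Ireland')
ORDER BY 1, 2
""").fetchall()
print(rows)
# [('Ireland', 1980, 3.4), ('Ireland', 1990, 3.5), ('Spain', 1980, 37.5), ('Spain', 1990, 38.9)]
```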


© Ron Rogerson 1998-2010 Slide 45

SQL - a practical implementation of relational theory (9)

Subqueries (cont'd)

Where its output will be more than one row, it must be used with the quantifiers ALL or ANY (or the comparison operator IN):

SELECT name
FROM student
WHERE registered <= ALL (SELECT DISTINCT registered FROM student)

Where its output will be exactly one row, it can be used thus:

SELECT country
FROM production
WHERE product = 'Oats'
AND quantity < (SELECT AVG(quantity) FROM production WHERE product = 'Oats')

Joins v. subqueries: JOIN if the output needs data from both tables; SUBQUERY if comparing with an aggregate function on a 2nd table; else can use either.
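The next sqlite3 sketch runs the single-row subquery with invented production figures. SQLite does not implement the ALL and ANY quantifiers, so the slide's ALL query is rewritten using MIN. That rewrite is equivalent only because the subquery here returns at least one row and no NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production (country TEXT, product TEXT, quantity INTEGER);
INSERT INTO production VALUES
    ('Spain', 'Oats', 9000), ('Ireland', 'Oats', 7000), ('France', 'Oats', 2000),
    ('Spain', 'Rye', 1000);
CREATE TABLE student (name TEXT, registered INTEGER);
INSERT INTO student VALUES ('Ann', 1995), ('Bob', 1993), ('Cal', 1993);
""")

# exactly one row from the subquery: the Oats average, 6000
below_average = conn.execute("""
SELECT country FROM production
WHERE product = 'Oats'
  AND quantity < (SELECT AVG(quantity) FROM production WHERE product = 'Oats')
""").fetchall()

# stand-in for: WHERE registered <= ALL (SELECT DISTINCT registered FROM student)
earliest = conn.execute("""
SELECT name FROM student
WHERE registered <= (SELECT MIN(registered) FROM student)
ORDER BY name
""").fetchall()
print(below_average, earliest)  # [('France',)] [('Bob',), ('Cal',)]
```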


© Ron Rogerson 1998-2010 Slide 46

SQL - a practical implementation of relational

theory (10)

SUBQUERIES (cont'd)

In the logical processing model, a normal subquery is processed first.

Correlated Subqueriesa subquery that refers to the value of a column in the “current row” of the outer query (an “outer reference”).

SELECT country, yr

FROM population p

WHERE population >

(SELECT 0.2*SUM(q.population)

FROM population q

WHERE q.yr = p.yr)

A correlated subquery is (logically) processed completely, once for every row of the “outer” query.
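A sketch of the correlated subquery above, run with `sqlite3` on an invented `population` table. The country names and figures are hypothetical. For each outer row `p`, the subquery is re-evaluated with that row's `p.yr`, giving 20% of that year's total. The Python comprehension performs the same per-row evaluation explicitly.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical population table (country, yr, population in millions).
con.execute("CREATE TABLE population (country TEXT, yr INTEGER, population REAL)")
con.executemany("INSERT INTO population VALUES (?, ?, ?)", [
    ("A", 1990, 50.0), ("B", 1990, 35.0), ("C", 1990, 15.0),   # 1990 total: 100
    ("A", 2000, 10.0), ("B", 2000, 80.0), ("C", 2000, 10.0),   # 2000 total: 100
])

# The outer reference p.yr ties each evaluation of the subquery to one outer row.
sql_rows = con.execute("""
    SELECT country, yr FROM population p
    WHERE population > (SELECT 0.2 * SUM(q.population)
                        FROM population q WHERE q.yr = p.yr)
    ORDER BY yr, country
""").fetchall()

# Same logic row by row: recompute the year's total for every outer row.
data = con.execute("SELECT country, yr, population FROM population").fetchall()
py_rows = sorted(((c, y) for c, y, pop in data
                  if pop > 0.2 * sum(p2 for _, y2, p2 in data if y2 == y)),
                 key=lambda r: (r[1], r[0]))
print(sql_rows, py_rows)
```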

© Ron Rogerson 1998-2010 Slide 47

SQL - a practical implementation of relational theory (11)

DATA DEFINITION

CREATE TABLE small_country

(name CHAR(16),

gdp DECIMAL(4,1),

cars INTEGER,

population DECIMAL(6,1),

PRIMARY KEY (name))

A column definition may also include a constraint and a default value, e.g.:

cars INTEGER NOT NULL DEFAULT 0

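ALTER TABLE small_country ADD area INTEGER

ALTER TABLE small_country DELETE population

ALTER TABLE small_country MODIFY cars DEFAULT 0

ALTER TABLE small_country ALTER cars DROP DEFAULT

(DELETE and MODIFY here are dialect-specific forms; standard SQL uses DROP [COLUMN] and ALTER [COLUMN] . . . SET DEFAULT for these changes)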

DROP TABLE small_country
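A minimal `sqlite3` sketch of the data definition statements above. The Malta row is invented. SQLite accepts the CHAR/DECIMAL type names but applies them loosely (type affinity), and the sketch uses the standard `ADD COLUMN` form rather than the dialect-specific DELETE/MODIFY forms. An omitted column picks up its default, and a newly added column is NULL in existing rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CREATE TABLE as on the slide, using the NOT NULL DEFAULT 0 variant for cars.
con.execute("""
    CREATE TABLE small_country (
        name       CHAR(16),
        gdp        DECIMAL(4,1),
        cars       INTEGER NOT NULL DEFAULT 0,
        population DECIMAL(6,1),
        PRIMARY KEY (name))
""")
# Hypothetical row; cars is omitted, so it takes its default value 0.
con.execute("INSERT INTO small_country (name, gdp, population) VALUES ('Malta', 3.9, 370.4)")

# Existing rows get NULL in the newly added column.
con.execute("ALTER TABLE small_country ADD COLUMN area INTEGER")
row = con.execute("SELECT name, cars, area FROM small_country").fetchone()
print(row)

con.execute("DROP TABLE small_country")
tables = con.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)
```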

© Ron Rogerson 1998-2010 Slide 48

SQL - a practical implementation of relational theory (12)

Constraints

PRIMARY KEY, NOT NULL

UNIQUE:

population DECIMAL(6,1) UNIQUE, or

ALTER TABLE small_country ADD UNIQUE (population)

REFERENTIAL:

counsellor_no CHAR(4) NOT NULL REFERENCES staff {(staff_no)} {ON DELETE (RESTRICT or SET DEFAULT or CASCADE)}, or

FOREIGN KEY (counsellor_no) REFERENCES staff {(staff_no)} {ON DELETE . . .}

CHECK:

registered SMALLINT CHECK (registered BETWEEN 1988 AND 2010)

CHECK (region = (SELECT region FROM staff WHERE counsellor_no = staff_no))

DOMAIN:

CREATE DOMAIN credit_points AS SMALLINT NOT NULL DEFAULT 60 CHECK (VALUE IN (30, 60))
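A sketch of the CHECK and REFERENTIAL constraints above in action, using `sqlite3`. The table contents are hypothetical. SQLite supports neither CREATE DOMAIN nor subqueries inside CHECK, so those two forms are left out. SQLite also enforces foreign keys only after `PRAGMA foreign_keys = ON`. Violations surface in Python as `sqlite3.IntegrityError`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
con.executescript("""
    CREATE TABLE staff (staff_no CHAR(4) PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        student_id    CHAR(3) PRIMARY KEY,
        registered    SMALLINT CHECK (registered BETWEEN 1988 AND 2010),
        counsellor_no CHAR(4) NOT NULL
                      REFERENCES staff (staff_no) ON DELETE CASCADE);
    INSERT INTO staff VALUES ('3158', 'Jones');
    INSERT INTO student VALUES ('s01', 1995, '3158');
""")

def rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the statement because of a constraint."""
    try:
        con.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

bad_year = rejected("INSERT INTO student VALUES ('s02', 1975, '3158')")   # CHECK fails
bad_ref = rejected("INSERT INTO student VALUES ('s03', 1999, '9999')")    # no such staff_no
con.execute("DELETE FROM staff WHERE staff_no = '3158'")                   # cascades to student
remaining = con.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(bad_year, bad_ref, remaining)
```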

© Ron Rogerson 1998-2010 Slide 49

SQL - a practical implementation of relational theory (13)

VIEWS

CREATE VIEW counselling2 (s_name, s_no, region, c_name, c_no) AS

SELECT s.name, student_id, s.region, c.name, counsellor_no

FROM student s, staff c

WHERE counsellor_no=staff_no

DROP VIEW counselling2
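The view definition above can be tried in `sqlite3`, with one hypothetical student and counsellor. The view is queried like a table, its rows being derived from the join whenever it is used. DROP VIEW removes only the definition, leaving the base tables and their data untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical rows; column names follow the slide.
con.executescript("""
    CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, region INTEGER);
    CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT,
                          region INTEGER, counsellor_no TEXT);
    INSERT INTO staff VALUES ('3158', 'Jones', 1);
    INSERT INTO student VALUES ('s01', 'Patel', 1, '3158');

    CREATE VIEW counselling2 (s_name, s_no, region, c_name, c_no) AS
        SELECT s.name, student_id, s.region, c.name, counsellor_no
        FROM student s, staff c
        WHERE counsellor_no = staff_no;
""")
via_view = con.execute("SELECT s_name, c_name FROM counselling2").fetchall()
print(via_view)

con.execute("DROP VIEW counselling2")   # removes the definition only
views_left = con.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'view'").fetchone()[0]
students_left = con.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(views_left, students_left)
```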

UPDATING

DELETE

DELETE FROM small_country WHERE name = 'Yugoslavia'

INSERT

INSERT INTO small_country {column_names}

VALUES ('Slovenia', NULL, 157, 4325.1)

(listing the columns to be filled is an alternative to putting NULLs in the VALUES clause; columns left out take their default value, or NULL if they have none)
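A short `sqlite3` sketch of DELETE and the two styles of INSERT. The Croatia row is invented. String literals are written in single quotes, and the explicit NULL and the column-list form both leave `gdp` empty.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE small_country (name CHAR(16) PRIMARY KEY, gdp DECIMAL(4,1),
               cars INTEGER, population DECIMAL(6,1))""")
con.execute("INSERT INTO small_country VALUES ('Yugoslavia', NULL, NULL, NULL)")

# String literals take single quotes in SQL.
con.execute("DELETE FROM small_country WHERE name = 'Yugoslavia'")

# Explicit NULL in VALUES, or name only the columns being filled (hypothetical figures).
con.execute("INSERT INTO small_country VALUES ('Slovenia', NULL, 157, 4325.1)")
con.execute("INSERT INTO small_country (name, cars, population) VALUES ('Croatia', 201, 4784.3)")
rows = con.execute("SELECT name, gdp FROM small_country ORDER BY name").fetchall()
print(rows)
```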

© Ron Rogerson 1998-2010 Slide 50

SQL - a practical implementation of relational theory (14)

UPDATING (cont'd)

UPDATE

UPDATE small_country

SET gdp = 17.3

WHERE name = 'Slovenia'

INSERT INTO dba.staff

VALUES ('8086', 'Pratchett', 1)
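A sketch of the UPDATE statement using `sqlite3` with two hypothetical rows. Only rows satisfying the WHERE clause change (the cursor's `rowcount` reports how many); without a WHERE clause every row would be updated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE small_country (name TEXT PRIMARY KEY, gdp REAL)")
con.executemany("INSERT INTO small_country VALUES (?, ?)",
                [("Slovenia", None), ("Croatia", None)])

cur = con.execute("UPDATE small_country SET gdp = 17.3 WHERE name = 'Slovenia'")
changed = cur.rowcount   # number of rows the UPDATE touched
rows = con.execute("SELECT name, gdp FROM small_country ORDER BY name").fetchall()
print(changed, rows)
```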

UPDATING VIEWS

A view can be updated only if, in its definition:

SELECT includes only column names (no value expressions) and no DISTINCT operator;

FROM references only one table;

WHERE does not include a subquery;

there is no GROUP BY and no HAVING.

© Ron Rogerson 1998-2010 Slide 51

SQL - a practical implementation of relational theory (15)

Access Control

GRANT SELECT {DELETE, INSERT, UPDATE, REFERENCES} ON staff TO admin, faculty

GRANT ALL PRIVILEGES ON mod_staff TO admin {with grant option}

GRANT UPDATE (name) ON mod_staff TO faculty

Restructuring: planning

Main principle: ensure data is not lost

e.g., either:

CREATE a temporary table with the old structure

INSERT . . . SELECT the old data into it

DROP the old table

CREATE the table with the new structure and the old name

INSERT . . . SELECT the old data into it from the temporary table

DROP the temporary table

or:

ALTER the table to add a replacement column

UPDATE the table to copy data from the old column to the new

DROP the old column
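A sketch of the first restructuring method (via a temporary table), using `sqlite3`. The target change (making `cars` NOT NULL DEFAULT 0) and the use of COALESCE to convert existing NULLs are hypothetical choices. Wrapping the steps in one transaction is an added safeguard in the spirit of "ensure data is not lost": SQLite's DDL is transactional, so a failure part-way rolls back to the old table.

```python
import sqlite3

# isolation_level=None: autocommit mode, so BEGIN/COMMIT below are issued explicitly.
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE small_country (name TEXT PRIMARY KEY, cars INTEGER)")
con.executemany("INSERT INTO small_country VALUES (?, ?)",
                [("Slovenia", 157), ("Malta", None)])

con.execute("BEGIN")
try:
    con.execute("CREATE TABLE temp_country AS SELECT * FROM small_country")  # copy old data
    con.execute("DROP TABLE small_country")
    con.execute("""CREATE TABLE small_country (
                       name TEXT PRIMARY KEY,
                       cars INTEGER NOT NULL DEFAULT 0)""")   # new structure, old name
    con.execute("""INSERT INTO small_country (name, cars)
                   SELECT name, COALESCE(cars, 0) FROM temp_country""")  # copy data back
    con.execute("DROP TABLE temp_country")
    con.execute("COMMIT")
except sqlite3.Error:
    con.execute("ROLLBACK")   # old table and data are restored
    raise

result = con.execute("SELECT name, cars FROM small_country ORDER BY name").fetchall()
print(result)
```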

© Ron Rogerson 1998-2010 Slide 52

Bringing it together - developing a relational d/b with SQL

Steps:

establishing requirements; data analysis; database design; implementation

Desirable properties of a model: completeness, integrity, flexibility, efficiency, usability

Modelling constructs: entity types; relationships; attributes; identifiers; complex values (separate entities/multiple attributes); entities or attributes?; derived data; entity subtypes

© Ron Rogerson 1998-2010 Slide 53

Bringing it together - developing a relational d/b with SQL (2)

Constraints: inclusive, exclusive

Developing a conceptual data model

establishing requirements: possible ambiguities

the model:

“formal representation of what a d/b should contain, independent of how it should be realised”

should: represent all users’ requirements; have no duplication; include all constraints; be general; be understandable.

Data analysis to produce it

establish the scope of the model

text analysis:

list nouns as potential entity types; discard those which: are outside the scope; occur only once; are synonyms; are attributes; relate to implementation details

list verbs as potential relationships

re-scan to find constraints

© Ron Rogerson 1998-2010 Slide 54

Bringing it together - developing a relational d/b with SQL (3)

Data analysis (cont'd)

Document analysis

assuming a document represents an entity: what does each occurrence represent? what are the properties / facts about the entity type? for each property, is it single- or multi-valued? optional or mandatory? derivable? temporal?

Produce initial E-R model

add participation, constraints & assumptions; eliminate redundancy, resolve m:n relationships, examine complex data, remove derived data, consider subtypes

check: 'read' the model to try to reconstitute the requirement; check the requirement to see if data & relationships are correctly represented

© Ron Rogerson 1998-2010 Slide 55

Bringing it together - developing a relational d/b with SQL (4)

Database design

The final system may not be a direct implementation of the model!

Choices in directly representing the model: posted key or relation-for-relationship; alternative constraint methods; representing complex data; representing entity subtypes

In implementation:

defining columns: numeric data types - range/precision; character data types - length; operations required - restricted by the number and type of columns chosen; date/time data types; not null constraints; default values - essential with a “not null” constraint, and should be meaningful and distinguishable

defining keys: surrogate primary keys; foreign keys - “on delete” action

omit or relax constraints? de-normalise?

© Ron Rogerson 1998-2010 Slide 56

Distributed Data

Client / multi-server

designed to store data where it is mostly used, but permit remote access

multiple independent databases: the user process must explicitly navigate them (connection management)

data may be divided by function, or users, etc.

transaction management done by “two-phase” commit
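A toy, in-memory sketch of two-phase commit between several independent databases. The `Participant` class and its `can_commit` flag are hypothetical stand-ins for real resource managers. Real implementations also log each decision durably and deal with timeouts and site failures, all of which this sketch ignores. Phase 1 collects a vote from every participant; phase 2 commits everywhere only if every vote was yes, and otherwise aborts everywhere.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical participant holding one local database."""
    name: str
    can_commit: bool = True      # simulated outcome of the local prepare step
    state: str = "active"

    def prepare(self) -> bool:
        # Phase 1: make local changes recoverable, then vote yes or no.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare() for p in participants]   # phase 1: voting
    if all(votes):
        for p in participants:                    # phase 2: global commit
            p.commit()
        return True
    for p in participants:                        # phase 2: global abort
        p.abort()
    return False


ok_sites = [Participant("london"), Participant("leeds")]
print(two_phase_commit(ok_sites), [s.state for s in ok_sites])
bad_sites = [Participant("london"), Participant("leeds", can_commit=False)]
print(two_phase_commit(bad_sites), [s.state for s in bad_sites])
```

A single "no" vote is enough to abort the whole transaction, so all sites end in the same state.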

Distributed databases

meant to store data where it is mostly used and permit remote access, or to provide resilience

appears to user processes as a single d/b (location independent)

distribution schema identifies physical locations of data in the logical schema

data may be fragmented (horizontally or vertically) according to usage, or replicated for resilience

optimisation of queries requires knowledge of where the data items (or nearest copy of them) are available

transaction support (and consistency of multiple copies) is an added problem, but is handled by the dbms
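A sketch of horizontal and vertical fragmentation, using `sqlite3`. The `student` relation and its rows are hypothetical. In a real distributed database the fragments would sit at different sites, with the distribution schema recording where; here they share one in-memory database purely for illustration. Horizontal fragments (selections of rows) are recombined with UNION, and vertical fragments (projections that each keep the key) with a join on that key.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical global relation student(student_id, name, region).
con.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, region INTEGER)")
con.executemany("INSERT INTO student VALUES (?, ?, ?)",
                [("s01", "Adams", 1), ("s02", "Brown", 2), ("s03", "Clark", 1)])

# Horizontal fragments: each holds a subset of the rows (e.g. one per regional site).
con.execute("CREATE TABLE student_r1 AS SELECT * FROM student WHERE region = 1")
con.execute("CREATE TABLE student_r2 AS SELECT * FROM student WHERE region <> 1")
# Vertical fragments: each holds a subset of the columns, plus the key.
con.execute("CREATE TABLE student_names AS SELECT student_id, name FROM student")
con.execute("CREATE TABLE student_regions AS SELECT student_id, region FROM student")

original = con.execute("SELECT * FROM student ORDER BY student_id").fetchall()
from_h = con.execute("""SELECT * FROM student_r1 UNION SELECT * FROM student_r2
                        ORDER BY student_id""").fetchall()
from_v = con.execute("""SELECT n.student_id, n.name, r.region
                        FROM student_names n JOIN student_regions r
                          ON n.student_id = r.student_id
                        ORDER BY n.student_id""").fetchall()
print(original == from_h == from_v)
```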

© Ron Rogerson 1998-2010 Slide 57

Distributed Data (2)

Replication systems

designed to avoid remote accesses by storing multiple copies locally

improves response and availability, but produces consistency issues

may not aim for real-time consistency

Consolidation approach - primary sources for different items are in different places; these fragments are collected to produce a global view

Dissemination approach - start with a single primary copy and distribute copies, but may allow real-time update of central and local copies together

© Ron Rogerson 1998-2010 Slide 58

Data Warehouses (NB: do Data Warehouses involve any new kind of technology, in the same way as distributed d/bs, or row clustering?)

Decision support systems, cf. “management information” systems

typically ask non-“right now” questions

may require data from diverse systems, or data discarded in normal operations (or not otherwise captured?)

Characteristics of a data warehouse: subject-oriented, non-volatile, integrated, time-variant

Dimensional analysis

aims to determine the subject area of interest, and the important dimensions of analysis

“star schema”

element of guesswork in fixing dimensions (is it a data-centred or application-centred system?)

[Figure: star schema - fact table sales linked to dimension tables member, time, area and wine]

© Ron Rogerson 1998-2010 Slide 59

Data warehouses (2)

Building a data warehouse

extraction component

produces warehouse data from existing systems

first define how to identify the “fact” in question

may convert from stored data, or extract as it is added (e.g. by a trigger)

integration component: format integration; semantic integration

the database: the fact table is the centre of the star and has n:1 relationships with the dimension tables

Aggregates

the fact table may become enormous, so queries need huge processing power

in practice, queries tend to want summary or aggregate data

create aggregate tables at various levels: can query one level and drill down or drill up as necessary to follow up trends discovered

an aggregate navigator may help to take advantage of the various levels
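A small star-schema sketch in `sqlite3`. All table names beyond `sales` and `wine`, and all data, are hypothetical. The time dimension is called `time_dim` because TIME is a reserved word in standard SQL, and the member and area dimensions are omitted for brevity. A precomputed aggregate table at the (colour, year) level answers summary queries; drilling down to quarters for one cell goes back to the fact table.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE wine (wine_id INTEGER PRIMARY KEY, colour TEXT);
    CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, yr INTEGER, quarter INTEGER);
    -- Fact table: n:1 to each dimension table.
    CREATE TABLE sales (wine_id INTEGER REFERENCES wine,
                        time_id INTEGER REFERENCES time_dim,
                        bottles INTEGER);
    INSERT INTO wine VALUES (1, 'red'), (2, 'white');
    INSERT INTO time_dim VALUES (1, 2009, 1), (2, 2009, 2), (3, 2010, 1);
    INSERT INTO sales VALUES (1, 1, 10), (2, 1, 5), (1, 2, 7), (1, 3, 4), (2, 3, 6);

    -- Aggregate table at the (colour, year) level, precomputed from the facts.
    CREATE TABLE sales_by_colour_year AS
        SELECT w.colour, t.yr, SUM(s.bottles) AS bottles
        FROM sales s JOIN wine w ON s.wine_id = w.wine_id
                     JOIN time_dim t ON s.time_id = t.time_id
        GROUP BY w.colour, t.yr;
""")

# Summary level, read from the aggregate table.
summary = con.execute("SELECT colour, yr, bottles FROM sales_by_colour_year "
                      "ORDER BY colour, yr").fetchall()
# Drill down: red wine in 2009, by quarter, from the fact table.
detail = con.execute("""
    SELECT t.quarter, SUM(s.bottles)
    FROM sales s JOIN wine w ON s.wine_id = w.wine_id
                 JOIN time_dim t ON s.time_id = t.time_id
    WHERE w.colour = 'red' AND t.yr = 2009
    GROUP BY t.quarter ORDER BY t.quarter
""").fetchall()
print(summary)
print(detail)
```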

© Ron Rogerson 1998-2010 Slide 60

XML and databases

XML – a formal markup language for documents

describes the structure and content of documents, with presentation specified separately; specific XML definitions can be made for specific applications

XML and relational data

Relational: atomic values in table structure with unique names
XML: nested elements in tree structure with named root element

Relational: columns have unique names, ordering not significant, values all same type
XML: elements have unique names, can contain data or other elements, schema can determine type

Relational: rows are distinct, ordering not significant
XML: elements distinguished by location, specified as a path

Relational: access to data by table operations, no concept of location
XML: access to data by location in tree

Relational: relations are logical structures, no direct storage implications
XML: XML is a logical structure with specified storage representation

© Ron Rogerson 1998-2010 Slide 61

XML and databases (2)

Transforming relational data into XML

export data by representing table/row structure by XML tags, or

use SQL/XML query to create XML document for specific application

Storing XML data in relational d/b

“shred” the document by reducing elements to simple values for the table structure – XMLTABLE function, or

store the entire XML document as a CHAR data value

Querying XML values in r/d/b

use the XMLTABLE function in an SQL query to return table-like values, or

use the XMLQUERY function to return XML values
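SQLite has no SQL/XML functions, so this sketch uses Python's standard `xml.etree.ElementTree` to stand in for the two directions described above. It first exports rows by representing the table/row structure as XML tags, then "shreds" the document back into simple values by path. The element names and rows are hypothetical, and this is an analogue of what XMLTABLE does inside a DBMS, not that function itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows from a student table.
rows = [("s01", "Adams", 1), ("s02", "Brown", 2)]

# Export: one <row> element per row, one child element per column.
root = ET.Element("student_table")
for student_id, name, region in rows:
    row = ET.SubElement(root, "row")
    ET.SubElement(row, "student_id").text = student_id
    ET.SubElement(row, "name").text = name
    ET.SubElement(row, "region").text = str(region)
doc = ET.tostring(root, encoding="unicode")
print(doc)

# Shred: locate elements by path and reduce them back to simple typed values.
parsed = ET.fromstring(doc)
shredded = [(r.findtext("student_id"), r.findtext("name"), int(r.findtext("region")))
            for r in parsed.findall("row")]
print(shredded == rows)
```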

© Ron Rogerson 1998-2010 Slide 62

Application Development

Embedded SQL

Direct (non-cursor) statements: only where at most one row will be transferred

EXEC SQL

SELECT name, registered, region

INTO :StudentName, :YearRegistered, :Region

FROM student

WHERE student_id = :SelectId;

Also used with INSERT, DELETE, etc.

Cursor statements

EXEC SQL

DECLARE regional_student CURSOR FOR

{SQL query specification};

EXEC SQL

OPEN regional_student;

EXEC SQL

FETCH regional_student

INTO :StudentId, :StudentName, :YearRegistered, :Region;

(FETCH is repeated, one row at a time, until no more rows are found)

EXEC SQL

CLOSE regional_student;
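The same single-row versus cursor distinction can be sketched with Python's DB-API (`sqlite3`) as an analogy, not as embedded SQL itself. The `student` rows are hypothetical. Named placeholders such as `:SelectId` play the role of host variables. `fetchone()` on a primary-key lookup corresponds to SELECT . . . INTO, and a fetch loop over a cursor corresponds to DECLARE / OPEN / FETCH / CLOSE.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, "
            "registered INTEGER, region INTEGER)")
con.executemany("INSERT INTO student VALUES (?, ?, ?, ?)",
                [("s01", "Adams", 1994, 1), ("s02", "Brown", 1997, 2),
                 ("s03", "Clark", 1999, 1)])

# Single-row case: the key guarantees at most one row, like SELECT ... INTO.
student_name, year_registered, region = con.execute(
    "SELECT name, registered, region FROM student WHERE student_id = :SelectId",
    {"SelectId": "s02"},
).fetchone()

# Multi-row case: cursor analogue of DECLARE / OPEN / FETCH / CLOSE.
cur = con.cursor()
cur.execute("SELECT student_id, name, registered, region FROM student "
            "WHERE region = :Region ORDER BY student_id", {"Region": 1})   # open
fetched = []
while (row := cur.fetchone()) is not None:   # fetch until no more rows
    fetched.append(row)
cur.close()                                  # close
print(student_name, year_registered, region)
print(fetched)
```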

© Ron Rogerson 1998-2010 Slide 63

Application Development (2)

ODBC

provides an interface between applications and an rDBMS

applications can devise their own methods of handling returned data

applications can be DBMS-independent

can handle dynamic SQL, enabling separate front-end tools to access other vendors' DBMSs

connection to the DBMS is provided by a DBMS-specific ODBC driver

JDBC

provides an ODBC-like interface between Java applications and an rDBMS

provides its own automatic cursor-like method of handling multiple rows

connection to the DBMS is provided by a DBMS-specific JDBC driver

SQLJ

embedded SQL for Java programs

an iterator provides cursor-like functionality

© Ron Rogerson 1998-2010 Slide 64

Application Development (3)

D/b routines using Java

the dbms implementation includes a Java virtual machine

an internal 'SQL' routine can use Java directly

Object-relational mapping

mapping tools require a definition to map each d/b to an application

the d/b can then be accessed from a Java program without knowledge of SQL or d/b structure

Scripting languages

e.g. Python, Perl

interpretive, easily changed

often used to facilitate browser access to a d/b

requires a DBMS-specific DB-API driver; the language has to provide its own cursor-like functionality