CIS 671 Query Languages 1
CIS 671: Introduction to Database Systems II
Introduction
S. Parthasarathy
CIS 671 Course Outline
• Review of basic database material (CIS 670)
• Object-Oriented and Object-Relational Database Systems
• Data Warehousing and Data Mining
• Active Databases
• Introduction to Storage Structures
• Emerging Technologies
Relations
Relation schema R or R(A1, A2, …, An)
• fixed set of attributes A1, A2, …, An
• each attribute Aj corresponds to exactly one of the underlying domains Dj (j = 1, 2, …, n), not necessarily distinct.
Example: EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
Note: EMPNO and MGR take values from the same domain, the set of valid employee numbers.
n, the number of attributes in the relation schema, is called the degree of the relation.
• degree 1: unary
• degree 2: binary
• degree 3: ternary
• degree n: n-ary
Keys
• A relation is a set of tuples.
• All elements of a set are distinct.
• Hence all tuples must be distinct.
• There may also be a subset of the attributes whose combined values must be distinct across tuples. Such a set is called a superkey.
  – SK: a set of attributes
  – t1 and t2: any two distinct tuples
  – t1[SK] ≠ t2[SK]
Guaranteed by the “real world”, i.e. it must hold in every legal instance, not just the current one.
Candidate Key - a minimal superkey
• A set of attributes CK is a superkey,
• but no proper subset of CK is a superkey.
Example: EMPNO
NAME ???
Primary Key - one arbitrarily chosen candidate key.
Example: EMPNO
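The definitions of superkey and candidate key can be tried out on a concrete instance. The sketch below is only illustrative: the helper names `distinct_on` and `minimal_on` are hypothetical, the rows follow the EMP instance shown later in the slides (COMMISSION omitted, salaries written out in full), and a check on one instance can only refute a key, never prove it, because a key is a property of every legal instance.

```python
from itertools import combinations

# EMP instance from the database example (COMMISSION omitted for brevity).
ATTRS = ("EMPNO", "NAME", "DNO", "JOB", "MGR", "SAL")
EMP = [
    (1234, "Smith", "D2", "J1", 4567, 30000),
    (2222, "Jones", "D2", "J2", 4567, 25000),
    (4567, "Blue", "D1", "J4", 9999, 40000),
    (9999, "Green", "D3", "J5", None, 100000),
]

def distinct_on(rows, attrs):
    """True if no two rows agree on every attribute in attrs (t1[SK] != t2[SK])."""
    idx = [ATTRS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)

def minimal_on(rows, attrs):
    """True if attrs is distinct on rows and no proper subset of attrs is."""
    if not distinct_on(rows, attrs):
        return False
    return not any(
        distinct_on(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(distinct_on(EMP, ("EMPNO",)))         # EMPNO values are all different
print(distinct_on(EMP, ("DNO",)))           # D2 appears twice
print(minimal_on(EMP, ("EMPNO", "NAME")))   # distinct, but not minimal
print(distinct_on(EMP, ("NAME",)))          # distinct here, yet not guaranteed
```

NAME happens to be distinct in this small instance, which is exactly why the slide puts question marks next to it: only knowledge of the real world can say whether two employees may share a name.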
Theoretical Properties of Relations
• Tuples are unordered.
• There are no tuple duplicates.
• Attributes are unordered.
• All attribute values are atomic.
• Implementations vary in terms of how closely they follow the above properties.
Relational Database Schema S
• Set of relation schemas, S = {R1, R2, …, Rp}
• Set of integrity constraints, IC
  – Integrity constraints will be discussed next.
Relational Database (Instance) DB of Schema S
• Set of relation instances, DB
• DB = {r1, r2, …, rp}
• Each rk is an instance of relation schema Rk
• The rk’s satisfy the integrity constraints, IC
Database Example - Schema
EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
  EMPNO from domain EmployeeNumbers
  ...
  Integrity constraints to be added
DEPT(DNO, DNAME, LOC)
  DNO from domain DepartmentNumber
  Integrity constraints to be added
Database Example - Database (instance)
EMP
  EMPNO  NAME   DNO  JOB  MGR   SAL   COMMISSION
  1234   Smith  D2   J1   4567  30K   10
  2222   Jones  D2   J2   4567  25K
  4567   Blue   D1   J4   9999  40K
  9999   Green  D3   J5         100K
DEPT
  DNO  DNAME  LOC
  D1   xxx    yyy
  D2   aaa    yyy
  D3   bbb    zzz
Relational Integrity Constraints
• Not included in Codd’s original definition of relations.
• Not supported in initial commercial relational database systems.
• Now supported in all major relational database systems.
Integrity Rules: Applicable to All Databases
• Key constraint
• Entity integrity
• Referential integrity
The Three Integrity Rules
• Key constraint
  – Candidate keys are specified for each relation schema.
  – Candidate key values must be unique for every tuple in any relation instance of that schema.
The Three Integrity Rules, Continued
• Entity integrity:
  – No attribute participating in the primary key is allowed to accept null values.
Justification: we must be able to identify each tuple uniquely.
Foreign Keys
EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
DEPT(DNO, DNAME, LOC)
DNO in EMP - a value should be allowed only if that DNO appears as a primary key value in relation DEPT.
MGR in EMP - a value should be allowed only if that MGR appears as a primary key value (EMPNO) in relation EMP.
The attributes DNO and MGR are called foreign keys in relation EMP.
The Three Integrity Rules, Continued
• Referential integrity:
  – If relation schema R1 includes a foreign key, FK, matching the primary key, PK, of relation schema R2, then a value of FK in a tuple t1 of R1 must either
    • be equal to the value of PK in some tuple t2 of R2, i.e. t1[FK] = t2[PK], or
    • be wholly null (i.e. each attribute value participating in that FK value must be null).
Note: R1 and R2 need not be distinct.
Justification: if some tuple t1 references some tuple t2, then tuple t2 must exist.
Implications of the Three Integrity Rules
What should happen if an operation on the database is about to violate one of the integrity rules, i.e. about to put the database into an “illegal” state?
• System could reject the operation.
• System could accept the operation, but perform some additional operations so as to reach a new legal state.
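Both reactions can be observed in a real system. The following is a minimal sketch using Python's built-in sqlite3 module, not the course's own DBMS; the trimmed EMP and DEPT columns and the choice of `on delete set null` as the compensating action are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
create table DEPT (DNO text primary key, DNAME text, LOC text);
create table EMP (
    EMPNO integer primary key,
    NAME  text,
    DNO   text references DEPT(DNO) on delete set null,  -- compensating action
    MGR   integer references EMP(EMPNO)                   -- R1 and R2 are the same
);
insert into DEPT values ('D1', 'xxx', 'yyy'), ('D2', 'aaa', 'yyy');
insert into EMP values (4567, 'Blue', 'D1', null);
insert into EMP values (1234, 'Smith', 'D2', 4567);
""")

# Option 1: the system rejects the operation (D9 is not a DEPT primary key value).
rejected = False
try:
    conn.execute("insert into EMP values (9999, 'Green', 'D9', null)")
except sqlite3.IntegrityError:
    rejected = True

# Option 2: the system accepts the operation and repairs the state itself.
# Deleting D2 sets the DNO of its employees to null (a wholly null foreign key).
conn.execute("delete from DEPT where DNO = 'D2'")
smith_dno = conn.execute("select DNO from EMP where EMPNO = 1234").fetchone()[0]

print(rejected, smith_dno)
```

Other compensating actions exist in SQL (for example cascading the delete); which one is appropriate depends on what the foreign key means in the application.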
Query Languages Review
Srinivasan Parthasarathy
SQL - Parts of the Language
• Data Definition Language (DDL)
  – create table
  – create index
• Data Manipulation Language (DML)
  – select (retrieve)
  – update
  – insert
  – delete
Select - Basic Form (select from where)
Cartesian product followed by select and project.
  select project-list
  from Cartesian-product-list
  where select-condition(s)
Abstract example: Given tables R(A,B) and S(B,C)
  select R.A, R.B, S.C
  from R, S
  where R.A > 10
BUT - Duplicates are NOT eliminated. Bag vs. Set.
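The abstract example can be run as written. This sketch uses Python's sqlite3 module with made-up rows for R and S (the values are assumptions chosen so that the Cartesian product and the duplicate behaviour are both visible).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table R (A integer, B integer);
create table S (B integer, C text);
insert into R values (11, 1), (12, 1), (5, 2);
insert into S values (1, 'x'), (2, 'y');
""")

# No join condition: every qualifying R row is paired with every S row.
product_rows = conn.execute(
    "select R.A, R.B, S.C from R, S where R.A > 10"
).fetchall()

# Projection keeps duplicates (bag semantics) unless distinct is requested.
bag = conn.execute("select R.B from R where R.A > 10").fetchall()
as_set = conn.execute("select distinct R.B from R where R.A > 10").fetchall()

print(len(product_rows), bag, as_set)
```

Two R rows satisfy A > 10 and S has two rows, so the product has four rows; projecting on B alone returns the value 1 twice until distinct is added.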
CASE STUDY EXAMPLE
EMP(EMPNO, …, DNO, JOB, ...)
  100  D3  electrician
  200  D3  plumber
  300  D3  electrician
  400  D1  electrician
  500  D1  plumber
  600  D1  carpenter
  700  D2  electrician
  800  D2  carpenter
  900  D2  electrician
Select as a JOIN
Cartesian product followed by select (“join” & “select” conditions) and project.
  select project-list
  from Cartesian-product-list
  where join-condition
    and select-condition
Abstract example: Given tables R(A,B) and S(B,C)
  select R.A, R.B, S.C
  from R, S
  where R.B = S.B   /* join condition */
    and R.A > 10    /* “select” condition */
How is this related to these relational operators? Select (σ), Project (π), Join: Natural (*)
Using EMP and DEPT: From Relational Algebra to SQL
EMP(EMPNO, NAME, DNO, JOB, MGR, SAL, COMMISSION)
DEPT(DNO, DNAME, LOC)
• List the names, employee numbers, department numbers and locations for all clerks.
  select NAME, EMPNO, E.DNO, LOC
  from EMP E, DEPT D
  where E.DNO = D.DNO   /* join condition */
    and JOB = 'Clerk'   /* “select” condition */
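To see what the join condition contributes, the clerk query can be run with and without it. The sketch below uses Python's sqlite3 module; the employee and department rows are invented for illustration and the tables carry only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key, DNAME text, LOC text);
create table EMP (EMPNO integer primary key, NAME text, DNO text, JOB text);
insert into DEPT values ('D1', 'Sales', 'Columbus'), ('D2', 'Research', 'Ann Arbor');
insert into EMP values
  (100, 'Adams', 'D1', 'Clerk'),
  (200, 'Baker', 'D2', 'Engineer'),
  (300, 'Clark', 'D2', 'Clerk');
""")

# Join condition plus select condition, as on the slide.
clerks = conn.execute("""
    select NAME, EMPNO, E.DNO, LOC
    from EMP E, DEPT D
    where E.DNO = D.DNO      /* join condition */
      and JOB = 'Clerk'      /* select condition */
    order by EMPNO
""").fetchall()

# Dropping the join condition pairs each clerk with every department.
unjoined = conn.execute(
    "select count(*) from EMP E, DEPT D where JOB = 'Clerk'"
).fetchone()[0]

print(clerks, unjoined)
```

Without the join condition the two clerks are each combined with both departments, giving four rows, half of which carry the wrong location.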
• Duplicates in project - must use explicit distinct
  List the different department numbers in the EMP table (eliminate duplicates).
    select distinct DNO
    from EMP
• Specify sort order
  List employee number, name, and salary of employees in department 50.
    select EMPNO, NAME, SAL
    from EMP
    where DNO = 50
    order by EMPNO
• Union
  List the numbers of those departments which have an employee named 'Smith' or are located in 'Columbus'.
    select DNO
    from EMP
    where NAME = 'Smith'
    union
    select DNO
    from DEPT
    where LOC = 'Columbus'
Duplicates ARE eliminated by default.
union all - leaves duplicates
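The difference between union and union all shows up as soon as one department qualifies on both sides. A small sketch with Python's sqlite3 module follows; the rows are assumptions chosen so that D1 has a Smith and is also in Columbus.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key, LOC text);
create table EMP (EMPNO integer primary key, NAME text, DNO text);
insert into DEPT values ('D1', 'Columbus'), ('D2', 'Columbus'), ('D3', 'Ann Arbor');
insert into EMP values (100, 'Smith', 'D1'), (200, 'Jones', 'D2'), (300, 'Smith', 'D3');
""")

QUERY = """
    select DNO from EMP where NAME = 'Smith'
    {op}
    select DNO from DEPT where LOC = 'Columbus'
"""

# union removes duplicates; union all keeps every row from both operands.
as_set = [row[0] for row in conn.execute(QUERY.format(op="union"))]
as_bag = [row[0] for row in conn.execute(QUERY.format(op="union all"))]

print(sorted(as_set), sorted(as_bag))
```

D1 is produced by both operands, so it appears once under union and twice under union all.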
Functions and Groups
• List the departments (DNO) and the average salary of each.
    select E.DNO, avg(SAL)
    from EMP E, DEPT D
    where E.DNO = D.DNO
    group by E.DNO
• List the departments (DNO, DNAME) in which the average employee salary < $25,000.
    select E.DNO, DNAME
    from EMP E, DEPT D
    where E.DNO = D.DNO
    group by E.DNO, DNAME
    having avg(SAL) < 25000
DNO must be qualified (E.DNO) because the column appears in both EMP and DEPT.
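The two grouping queries can be checked on a few rows. This sketch uses Python's sqlite3 module; the salaries and department names are invented, and DNO is written as E.DNO because the column exists in both tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key, DNAME text);
create table EMP (EMPNO integer primary key, DNO text, SAL integer);
insert into DEPT values ('D1', 'Sales'), ('D2', 'Research');
insert into EMP values
  (100, 'D1', 20000), (200, 'D1', 24000),
  (300, 'D2', 30000), (400, 'D2', 40000);
""")

# One output row per group; avg is computed over the rows of each group.
averages = conn.execute("""
    select E.DNO, avg(SAL)
    from EMP E, DEPT D
    where E.DNO = D.DNO
    group by E.DNO
    order by E.DNO
""").fetchall()

# where filters rows before grouping; having filters whole groups afterwards.
low_paid = conn.execute("""
    select E.DNO, DNAME
    from EMP E, DEPT D
    where E.DNO = D.DNO
    group by E.DNO, DNAME
    having avg(SAL) < 25000
""").fetchall()

print(averages, low_paid)
```

Only D1, whose average is 22000, passes the having condition.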
Nested Select: No analog in Relational Algebra
• List names of employees in departments 25, 47 and 53.
    select NAME
    from EMP
    where DNO in (25, 47, 53)
• List names of employees who work in departments in Ann Arbor.
    select NAME
    from EMP
    where DNO in (select DNO
                  from DEPT
                  where LOC = 'Ann Arbor')
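The Ann Arbor query can be tried on a small instance. This is a sketch with Python's sqlite3 module and invented rows; it also runs a join formulation next to the nested one, which returns the same names on this data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key, LOC text);
create table EMP (EMPNO integer primary key, NAME text, DNO text);
insert into DEPT values ('D1', 'Ann Arbor'), ('D2', 'Columbus'), ('D3', 'Ann Arbor');
insert into EMP values (100, 'Adams', 'D1'), (200, 'Baker', 'D2'), (300, 'Clark', 'D3');
""")

# The inner select produces a set of DNO values; in tests membership in it.
nested = [row[0] for row in conn.execute("""
    select NAME
    from EMP
    where DNO in (select DNO from DEPT where LOC = 'Ann Arbor')
    order by NAME
""")]

# Join formulation of the same question, for comparison.
joined = [row[0] for row in conn.execute("""
    select NAME
    from EMP E, DEPT D
    where E.DNO = D.DNO and LOC = 'Ann Arbor'
    order by NAME
""")]

print(nested, joined)
```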
Null Values
None of the following conditions is ever true (each evaluates to unknown, which a where clause treats like false).
  null > 25    null < 25    null = 25    null <> 25
  null >= 25   null <= 25   null = null  null <> null
However we can use the following:
  select NAME
  from EMP
  where SAL < 35000 or SAL is null
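The behaviour of null in comparisons is easy to observe. The sketch below uses Python's sqlite3 module with three invented employees, one of whom has no salary recorded; SQLite reports an unknown comparison result as null, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table EMP (EMPNO integer primary key, NAME text, SAL integer);
insert into EMP values (100, 'Adams', 30000), (200, 'Baker', 40000), (300, 'Clark', null);
""")

def names(condition):
    """Names of the employees for which the where condition is true."""
    sql = f"select NAME from EMP where {condition} order by NAME"
    return [row[0] for row in conn.execute(sql)]

# A comparison with null is neither true nor false.
null_equals_null = conn.execute("select null = null").fetchone()[0]

below = names("SAL < 35000")                 # Clark is not selected ...
at_or_above = names("SAL >= 35000")          # ... by the opposite test either
below_or_null = names("SAL < 35000 or SAL is null")

print(null_equals_null, below, at_or_above, below_or_null)
```

Clark falls through both the condition and its negation, which is why the explicit is null test is needed.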
exists
Q. Find the numbers of those departments that have employees who can do some job that is done by an employee in department D3.
  select DNO
  from DEPT
  where exists
    (select *
     from EMP ED3
     where ED3.DNO = 'D3'
       and exists
         (select *
          from EMP EY
          where EY.JOB = ED3.JOB
            and EY.DNO = DEPT.DNO))
Answer: D1, D2, D3
The order of the two “selects” does not matter.
OK, let's see why this works.
Since the second nested select contains an external reference to DEPT, the nested selects are evaluated once for each DEPT tuple. Assume the DEPT table contains only three tuples, corresponding to D1, D2 and D3, in that order.
The first tuple we evaluate is the one with DEPT.DNO = D1.
The first nested select simply picks out the EMP tuples in department D3 (alias ED3). The second nested select ranges over the EMP tuples (alias EY) that belong to the department tuple currently being examined.
Both aliases, ED3 and EY, range over the same EMP table:
  (EMPNO, DNO, JOB, ...)
  100  D3  electrician
  200  D3  plumber
  300  D3  electrician
  400  D1  electrician
  500  D1  plumber
  600  D1  carpenter
  700  D2  electrician
  800  D2  carpenter
  900  D2  electrician
For D1: the ED3 tuple 100 (electrician) is matched by the EY tuple 400 (D1, electrician), so both exists conditions are true and D1 is output. The same holds for D2 (tuple 700) and for D3 itself.
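The exists query can be run against the case study data. This sketch uses Python's sqlite3 module; department D4 and employee 950 are hypothetical additions, not part of the slides, included so that one department fails the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key);
create table EMP (EMPNO integer primary key, DNO text, JOB text);
insert into DEPT values ('D1'), ('D2'), ('D3'), ('D4');   -- D4 is hypothetical
insert into EMP values
  (100, 'D3', 'electrician'), (200, 'D3', 'plumber'),   (300, 'D3', 'electrician'),
  (400, 'D1', 'electrician'), (500, 'D1', 'plumber'),   (600, 'D1', 'carpenter'),
  (700, 'D2', 'electrician'), (800, 'D2', 'carpenter'), (900, 'D2', 'electrician'),
  (950, 'D4', 'carpenter');                                -- hypothetical row
""")

# For each DEPT tuple: is there a D3 employee (ED3) whose job is also held
# by some employee (EY) of the department under consideration?
some_job = [row[0] for row in conn.execute("""
    select DNO
    from DEPT
    where exists
      (select *
       from EMP ED3
       where ED3.DNO = 'D3'
         and exists
           (select *
            from EMP EY
            where EY.JOB = ED3.JOB
              and EY.DNO = DEPT.DNO))
    order by DNO
""")]

print(some_job)
```

D4 employs only a carpenter, a job nobody in D3 does, so it is the one department left out.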
for all
Q. Find the numbers of those departments that have employees who can do all the jobs that are done by an employee in department D3 (EMP as in the case study example). Answer: D1, but not D2
Q. Find the numbers of those departments that have employees who can do all the jobs that are done by an employee in department D3.
  select DNO
  from DEPT
  where for all
    (select *
     from EMP ED3
     where ED3.DNO = 'D3'
       and exists
         (select *
          from EMP EY
          where EY.JOB = ED3.JOB
            and EY.DNO = DEPT.DNO))
Answer: D1, D3, but not D2
However, SQL has no for all.
Q. Find the numbers of those departments that have employees who can do all the jobs that are done by an employee in department D3.
SQL has no for all, so use two not exists: “for all x, P(x)” is equivalent to “there is no x for which P(x) is false”. (See page 207 for a good list of mathematical logic operations/tricks.)
  select DNO
  from DEPT
  where not exists
    (select *
     from EMP ED3
     where ED3.DNO = 'D3'
       and not exists
         (select *
          from EMP EY
          where EY.JOB = ED3.JOB
            and EY.DNO = DEPT.DNO))
Answer: D1, D3, but not D2
OK, let's see why this works.
As before, the second nested select refers to DEPT, so the nested selects are evaluated once for each DEPT tuple (assume D1, D2 and D3, in that order). The first nested select ranges over the EMP tuples in department D3 (alias ED3: 100 electrician, 200 plumber, 300 electrician); the second ranges over the EMP tuples (alias EY) of the department currently being examined.
• D1: every D3 job is matched in D1 (400 electrician, 500 plumber), so the inner not exists is false for every ED3 tuple. The first nested select returns nothing, the outer not exists is true, and D1 is output.
• D2: no D2 employee is a plumber, so the inner not exists is true for ED3 tuple 200. The first nested select returns that tuple, the outer not exists is false, and D2 is rejected.
• D3: every D3 job is trivially matched within D3, so D3 is output.
Q12. Find the numbers of those departments that have employees who can do all the jobs that are done by an employee in department D3. Answer: D1, but not D2
Eliminate department D3.
  select DNO
  from DEPT
  where not exists
    (select *
     from EMP ED3
     where ED3.DNO = 'D3'
       and not exists
         (select *
          from EMP EY
          where EY.JOB = ED3.JOB
            and EY.DNO = DEPT.DNO))
    and DNO <> 'D3'
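The double not exists formulation can be checked against the case study data. The following sketch uses Python's sqlite3 module; the DEPT table is reduced to its key, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table DEPT (DNO text primary key);
create table EMP (EMPNO integer primary key, DNO text, JOB text);
insert into DEPT values ('D1'), ('D2'), ('D3');
insert into EMP values
  (100, 'D3', 'electrician'), (200, 'D3', 'plumber'),   (300, 'D3', 'electrician'),
  (400, 'D1', 'electrician'), (500, 'D1', 'plumber'),   (600, 'D1', 'carpenter'),
  (700, 'D2', 'electrician'), (800, 'D2', 'carpenter'), (900, 'D2', 'electrician');
""")

# "All D3 jobs are covered" is phrased as "no D3 job is left uncovered".
ALL_JOBS = """
    select DNO
    from DEPT
    where not exists
      (select *
       from EMP ED3
       where ED3.DNO = 'D3'
         and not exists
           (select *
            from EMP EY
            where EY.JOB = ED3.JOB
              and EY.DNO = DEPT.DNO))
"""

with_d3 = [r[0] for r in conn.execute(ALL_JOBS + " order by DNO")]
without_d3 = [r[0] for r in conn.execute(ALL_JOBS + " and DNO <> 'D3' order by DNO")]

print(with_d3, without_d3)
```

D2 drops out because it has no plumber; D3 trivially covers its own jobs and disappears only once the extra condition DNO <> 'D3' is added.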
SQL:1999 (SQL 3) Recursive Closure
Management hierarchy (each employee is indented under his or her manager):
  980
    900
      700
        100
        200
      800
        300
        400
    950
      600
        500
      850
Q 16. List all the superiors of EMPNO 500. Answer: 600, 950, 980
Q 17. List all those supervised by EMPNO 900. Answer: 700, 800, 100, 200, 300, 400
How to express these queries?
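Q 16. Given EMP(EMPNO, MGR, ...), list all the superiors of EMPNO 500.
Generate SUPERIORS(EMPNO, MGR):
  with recursive SUPERIORS(EMPNO, MGR) as
    (select EMPNO, MGR                    /* initial table */
     from EMP
     where EMPNO = 500
     union all
     select SUPERIORS.EMPNO, EMP.MGR      /* the recursion */
     from SUPERIORS, EMP
     where EMP.EMPNO = SUPERIORS.MGR)
  select MGR                              /* just the superiors of 500 */
  from SUPERIORS
Result: 600, 950, 980
Note: with union all the recursion stops only when the top of the hierarchy has a null MGR. If the top employee is recorded as his or her own manager, use union instead (or exclude rows where MGR = EMPNO).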
Q 16. List all the superiors of EMPNO 500.
EMP(EMPNO, MGR, ...)
  100  700
  200  700
  300  800
  400  800
  500  600
  600  950
  700  900
  800  900
  850  950
  900  980
  950  980
  980  980
SUPERIORS(EMPNO, MGR)
  500  600   initial table
  500  950   first addition
  500  980   second addition
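Both closure queries can be run on the EMP data above. This is a sketch using Python's sqlite3 module, which supports with recursive; it departs from the slide in one labelled respect: because 980 is stored as its own manager, the sketch uses union rather than union all so that the repeated row (500, 980) is discarded and the recursion ends. The SUBORDINATES query for Q 17 is an assumed formulation, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table EMP (EMPNO integer primary key, MGR integer);
insert into EMP values
  (100, 700), (200, 700), (300, 800), (400, 800), (500, 600), (600, 950),
  (700, 900), (800, 900), (850, 950), (900, 980), (950, 980), (980, 980);
""")

# Q 16: climb from 500 to its manager, then to that manager's manager, ...
# union (not union all) drops the duplicate produced by the 980 -> 980 loop.
superiors = [row[0] for row in conn.execute("""
    with recursive SUPERIORS(EMPNO, MGR) as
      (select EMPNO, MGR from EMP where EMPNO = 500
       union
       select SUPERIORS.EMPNO, EMP.MGR
       from SUPERIORS, EMP
       where EMP.EMPNO = SUPERIORS.MGR)
    select MGR from SUPERIORS
""")]

# Q 17 (assumed formulation): descend from 900 to everyone reporting to it.
subordinates = [row[0] for row in conn.execute("""
    with recursive SUBORDINATES(EMPNO) as
      (select EMPNO from EMP where MGR = 900 and EMPNO <> 900
       union
       select EMP.EMPNO
       from SUBORDINATES, EMP
       where EMP.MGR = SUBORDINATES.EMPNO)
    select EMPNO from SUBORDINATES
""")]

print(sorted(superiors), sorted(subordinates))
```

If the root's MGR were null instead of 980, the slide's union all version would terminate as well, since a null MGR never matches an EMPNO.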
Entity-Relationship (ER) Model (Peter P.-S. Chen)
Review
CIS 671, S. Parthasarathy
Helpful for conceptualizing the Real World
• Entity: a thing that exists
  – e.g. person, automobile, department, employee
• Entity Set: a group of similar entities
  – e.g. all persons, all automobiles, all employees
• Relationship: association between entities
  – e.g. a person is assigned to a department
• Relationship Set: set of similar relationships
• Attribute: property of an entity or relationship
  – e.g. person - name, address
• Domain: set of values allowed for an attribute
Example
  Employees    E#, ENAME, ADDRESS
  Departments  D#, DNAME
  Projects     P#, PNAME
Constraints (cardinality)
1. Employees may be assigned to only 1 department at a time.
2. Employees may be assigned to several projects at once, each with an associated %time.
Constraints (participation)
3. Employees must be assigned to a department.
4. Employees need not be assigned to any projects.
Complete Picture
[ER diagram: Employee (E#, ENAME, ADDRESS) IsIn Department (D#, DNAME), cardinality N:1; Employee AssignedTo Project (P#, PNAME), cardinality M:N, with relationship attribute %TIME.]
Example: Relationships and Attributes
[ER diagram: Employee (N) IsIn (1) Department; Employee (M) AssignedTo (N) Project, with relationship attribute %TIME.]
1. Employees may be assigned to only 1 department at a time.
2. Employees may be assigned to several projects at once, each with an associated %time.
Example: Relationships and Attributes
[ER diagram: as before, with the participation of Employee marked Total in IsIn and Partial in AssignedTo.]
3. Employees must be assigned to a department.
  • Total: Each entity must be included at least once in the relationship.
4. Employees need not be assigned to any projects.
  • Partial: An entity need not be included in the relationship at all.
Entity-Relationship Enhancements: Attributes
1. Simple (atomic) vs. Composite Attributes
  – Simple
  – Composite
[Diagrams: Employee with simple attributes such as E#, FName and LName; Employee with the composite attribute Name, made up of FName and LName.]
Entity-Relationship Enhancements: Attributes
2. Single-valued vs. Multi-valued Attributes
  – Multi-valued
  – Multi-valued as Entity
[Diagrams: Student with the multi-valued attribute Major; alternatively Student (N) Has (M) Major_Program, an entity carrying the attribute Major.]
Entity-Relationship Enhancements: Attributes
3. Derived Attributes - Include in Department the average salary of the employees in the department.
[Diagram: Employee (attribute Salary) MemberOf Department (derived attribute AvgSal).]
Entity-Relationship Enhancements: Entities
• Weak Entity Type, Identifying Relationship Type, Partial Key
• E.g. Represent all the dependents of each employee given his or her name. No ID number. Problem: Names are not unique across employees.
[Diagram: Employee (SSN, Name) DependentsOf Dependent (Name), cardinality 1:N.]
As Relations
• Entities
  – Department(D#, DNAME)
  – Employee(E#, ENAME, ADDRESS)
  – Project(P#, PNAME)
• Relationships
  – Is_In(E#, D#)   1:N
  – Assigned_To(E#, P#, %TIME)   N:M
As Relations: Replacing Employee and Is_In with Employee’
• Entities
  – Department(D#, DNAME)
  – Employee’(E#, ENAME, ADDRESS, D#)
  – Project(P#, PNAME)
• Relationships
  – Assigned_To(E#, P#, %TIME)   N:M
  – [Is_In(E#, D#)] is no longer a separate relation; it is carried by the D# column of Employee’.
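The relational form above maps fairly directly onto table definitions. The sketch below uses Python's sqlite3 module and is one possible encoding under stated assumptions: ENO, DNO, PNO and PCT_TIME are hypothetical column names standing in for E#, D#, P# and %TIME, the helper `rejected` is hypothetical, and `not null` on Employee.DNO is used to express the total participation of Employee in IsIn.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
create table Department (DNO text primary key, DNAME text);
create table Employee (
    ENO     integer primary key,
    ENAME   text,
    ADDRESS text,
    DNO     text not null references Department(DNO)  -- Is_In folded in; 1 department, total
);
create table Project (PNO text primary key, PNAME text);
create table Assigned_To (
    ENO      integer references Employee(ENO),
    PNO      text references Project(PNO),
    PCT_TIME integer,
    primary key (ENO, PNO)                            -- N:M, one row per pairing
);
insert into Department values ('D1', 'Research');
insert into Project values ('P1', 'Alpha'), ('P2', 'Beta');
insert into Employee values (1, 'Smith', '1 Main St', 'D1');
insert into Assigned_To values (1, 'P1', 60), (1, 'P2', 40);
""")

def rejected(sql):
    """Run one statement and report whether a constraint refused it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

no_department = rejected("insert into Employee values (2, 'Jones', '2 Oak Ave', null)")
repeated_pair = rejected("insert into Assigned_To values (1, 'P1', 10)")
projects_of_1 = conn.execute(
    "select count(*) from Assigned_To where ENO = 1"
).fetchone()[0]

print(no_department, repeated_pair, projects_of_1)
```

Because D# is a single column of Employee’, an employee cannot be in two departments at once, while the composite key of Assigned_To allows several projects per employee and several employees per project.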
Enhanced-ER (EER) Model
• Subclasses & Superclasses
  – Specialization & Generalization
  – Type Inheritance
• Categories
Example: University Database
• Superclasses & Subclasses
• Generalization & Specialization
• Inheritance
[EER diagram: entity types Person, Employee, Student, Grad Student and Undergrad Student; attributes shown: SSN, Name, Salary, EmpID, Class, DegreeProgram, MajorDept. Legend: d = disjoint, o = overlap; total vs. partial participation; the subset symbol (printed as U) on a line connects a subclass to its superclass.]
Superclass instance must always exist.
Example: Meeting Locations
• Categories
[EER diagram: entity types MeetingLocation, Place, Park, Organization, Building and Room; relationships MeetsAt and IsIn; attributes shown: Date, Day*, Time*, Capacity, RoomNumber*, Name*, Street Address, OrganizationName*, ParkName*, NorthSouthCoordinates, EastWestCoordinates. Legend: U = union.]