An introduction to relational database management system€¦ · Types of DBMS Data model...
Transcript of An introduction to relational database management system€¦ · Types of DBMS Data model...
An introduction to relational databasemanagement system
Fabrice Rossi
CEREMADEUniversité Paris Dauphine
2019
Database Management Systems
Database (DB)An organized collection of data
Database Management System (DBMS)A software use to manage databases, namelyI creating a databaseI maintaining a databaseI interacting with a database
2
DBMS versus Files
File-Based data managementI data in CSV files (or other format)I management with e.g. Python + Pandas (or R + tidyverse)
Advantages of DBMSI meta data managementI integrity constraintsI security (access control)I concurrency controlI etc.
3
DBMS versus Files
File-Based data managementI data in CSV files (or other format)I management with e.g. Python + Pandas (or R + tidyverse)
Advantages of DBMSI meta data managementI integrity constraintsI security (access control)I concurrency controlI etc.
3
DBMS components
DBMS architectureI storage manager
I handles saving data and meta data on the storage systemI manages part of the concurrency aspects
I query processorI core part of the DBMSI handles computation on the database (e.g. search, insertion,
deletion, modification, aggregation)I optimize queries for efficiency
I high level toolsI meta data managementI connection and security manager
4
DBMS components
DBMS langaguesI data definition language (DDL)
I used to specify a data modelI data typesI links between data
I meta data managementI data manipulation language (DML)
I used to query a databaseI specifies computation on a database
5
Types of DBMS
Data model categorizationI old approaches: hierarchical DBMS and network DBMSI current standard: relational DBMS
I based on a relational data modelI typical based on SQL (Structured Query Language, sequel)
I niche approach: object-oriented DBMSI large scale or unstructured data: NoSQL
I not-only SQL (or more specifically not relational)I document storesI graph oriented databasesI column oriented databases
6
In this course
An introduction to Relational DBMS
What’s missing?I A lot!I DBMS are a very rich and complex subject
I numerous theoretical aspectsI getting under the hood is yet another subjectI practical aspects are also very complex
7
In this course
An introduction to Relational DBMS
What’s missing?I A lot!I DBMS are a very rich and complex subject
I numerous theoretical aspectsI getting under the hood is yet another subjectI practical aspects are also very complex
7
Outline
Introduction
Conceptual Data Modeling
The Relational Model
SQL
8
Outline
Introduction
Conceptual Data Modeling
The Relational Model
SQL
9
Data Management
Specifying the dataI ideally the first step of a data oriented projectI important questions
I what are the objects under study?I how are they described?I how are they related?
Two steps process
1. conceptual/formal model2. implementation in a meta data oriented language
10
Entity Relationship Model
Proposed by Peter Chen in 1976
ConceptsI Entity: uniquely identified object under study (e.g. a person)I Relationship: a way to relate entities (e.g. a has access to b)I Attribute: a property of an entity or of a relationship. An attribute
has a domain (the set of values it can take)I an ER model describes types, e.g. entity type, not values
11
Example
Loan application data setI https://relational.fit.cvut.cz/dataset/Financial
I 8 tables includingI client tableI account tableI credit card tableI loan tableI etc.
12
Example
(part of the) Client tableclient_id gender birth_date
1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03
ER modelI entity type: clientI attributes
I gender with domain F andM
I birth_date with a datadomain
I client_id
key attribute
13
Example
(part of the) Client tableclient_id gender birth_date
1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03
ER modelI entity type: clientI attributes
I gender with domain F andM
I birth_date with a datadomain
I client_id
key attribute
Client
gender birth_date client_id
13
Example
(part of the) Client tableclient_id gender birth_date
1 F 1970-12-132 M 1945-02-043 F 1940-10-094 M 1956-12-015 F 1960-07-03
ER modelI entity type: clientI attributes
I gender with domain F andM
I birth_date with a datadomain
I client_id key attribute
Client
gender birth_date client_id
13
Graphical representation
Entity typeEntity types are represented by rectangles link to attributes
Attribute type
I an ellipse per typeI key attribute type
I a unique identifier of thecorresponding entity
I underlined in therepresentation
I composite attribute typeI can be decomposed into
sub-attributesI linked ellipses
I derived attribute typeI can be computed from
another one (e.g. agefrom birth date)
I dashed borderI multi-valued attribute type
I several values areauthorized
I double border
14
Example
Person
gender birth_date age client_id
name
First name
Last name
15
Relationship
PrincipleI a relationship represents an association between at least two
entitiesI it can have attributesI it is characterized by cardinalities
I minimum and maximum number of relationships to which a givenentity can participate
I asymmetricI a relationship type is graphical represented by a rhombus
16
Example
Loan application data setI client entities and account entitiesI relationships
I a client can be the owner of an accountI a client can be allowed to use an account
I cardinalitiesI owner:
I each account has exactly one ownerI each client can own at most one account (in this database)
I user:I each account may have some usersI each client can be the user of some accounts
17
Example
Clientclient_id
Owns
Account account_id
0-1 1-1
Uses
0-n 0-m
18
Outline
Introduction
Conceptual Data Modeling
The Relational Model
SQL
19
Relational Model
HistoryI invented by Edgar F. Codd in 1969-1970I based on a mathematical model, the relational algebraI associated relational calculus
Main conceptsI a domain: a setI a relation: a (finite) subset of
∏ni=1(Di ∪ {NULL}) where each Di
is a domain and NULL a special value (for missing values)I a tuple: an element of a relationI a database: a collection of relations
20
Examples
ClientsI Domains:
I N+: positive integersI G = {F ,M}: gendersI D: dates
I A client relation:I a subset of N+ ×G × DI a set of tuples such as
(1,F , 1970−12−13)
AccountsI Domains:
I N+: positive integersI D: dates
I An account relation:I a subset of N+ × N+ × DI a set of tuples such as
(1, 8, 1995−03−24)
21
Relation type/variable
Specifying relationsI somewhat imprecise vocabularyI a relation (value) r ∈
∏ni=1(Di ∪ {NULL})
I a relation type/variable (relvar)I a named specification of the relationI domains and names: column types
NotationI a relation type/variable is denoted R(A1, . . . ,An):
I R: name of the relation typeI Ak : column type names
I e.g. client(client_id, gender, birth_date)I and account(account_id, district_id, date)
22
ER and relational model
CorrespondenceI Entity type: relation type/variable, relvarI Entity: tupleI Attribute type: column type (frequently called attribute type)I Attribute: value in a tuple
More about ER→ relational later
23
Keys
Superkeys and keysI a relation is a set⇒ each tuple is uniqueI a superkey
I any subset of the attribute types such that no tuples of the relationshare exactly the same values for these attribute types
I default superkey: all the attribute types!I a key: a minimal superkey (removing attribute types removes it
superkey status)
France data setI departement(region_id,departement_id,name)I superkeys
I (region_id,departement_id,name)I (departement_id,name)I (departement_id): a key!I (name): another key!
24
Primary and foreign keys
Primary keyI several keys: candidate keysI one of the is the primary keyI other are alternative keys
Foreign keysI a set of attribute types FK in a relation type R1 is a foreign key of
R1 if1. a candidate key in a relation R2 has exactly the same domains as
the ones of the attribute types in FK2. values on FK in a tuple in r1 are either NULL or occur in a tuple in r2
I in simple terms: a foreign key links R1 to R2
25
Example
RelvarI client(client_id, gender, birth_date)I account(account_id, district_id, date)I disp(disp_id, client_id, account_id, type) with
type∈ {OWNER,DISPONENT}
Many keysI bold and underlined: primary keysI underlined: foreign keysI district_id is also a foreign key but for another relationI in this database, the relation between a client and an account is
unique and thus (client_id, account_id) is candidate key
26
Integrity Constraints
Relational constraintsWhen implemented the relational model must enforce someconstraintsI Domain integrity: the values in a tuple are unique values from the
associated domain or NULL (if allowed, i.e. if the attribute type isnullable)
I Key constraint: every relation has a primary keyI Entity integrity: values in the primary key cannot be NULLI Referential integrity: foreign keys can be either NULL or must refer
to an existing tuple
27
ER to relational
MappingI ER models are abstractI must be mapped to a concrete database modelI the relational model is close enough to ER to enable a simple
mapping strategy
PrinciplesI Entity type→ relation (type)I Simple attribute type→ column typeI Key attribute type→ primary or alternative keyI All the rest (relationship types and complex attribute types)→
relation (type) and keys
28
Mapping Entity Types
With simple attribute typesI direct mappingI an entity type is mapped to a relation typeI each attribute type corresponds to a column typeI a key is mapped to a primary or alternative keyI composite attribute types are mapped to a set of column types
ER model
Client
gender birth_date client_id
Relation typeClient(gender, birth_date,client_id)
29
Multi-valued attribute types
MethodI a multi-valued attribute type cannot be mapped to a column type
because of the domain integrity constraintI representation via a relation type
I a relation type per multi-valued attribute typeI a column type for the attributeI a foreign key to map back the attribute to the entity
ER model
Client
gender email client_id
Relation typesI Client(gender, client_id)I ClientEmail(email, client_id)
30
Mapping relationship types
PrinciplesI the mapping depends on the cardinalities of the relationship typeI 0-1 cardinalities on a least one side foreign keyI other cases: foreign keys or specific relational type
One to one (1:1) relationship typeI each side is 0-1 or 1-1I mapped to a foreign key:
I in the 1-1 relation type if it exists with a non nullable column typeI nullable if both side are 0-1
31
Example
Clientclient_id Owns Account account_id0-1 1-1
Relation typesI Client(client_id)I Account(account_id,client_id)I client_id is non nullable in
Account
Inferior alternativeI Client(client_id,account_id)I account_id is nullable in ClientI Account(account_id)
CardinalitiesI Owns↔ Account is enforcedI but an account can be shared by
several clientsI Client↔ Owns is 0−m
CardinalitiesI Client↔ Owns is enforcedI but an account can be assigned
to any number of clientsI Owns↔ Account is 0−m
32
Example
Clientclient_id Owns Account account_id0-1 1-1
Relation typesI Client(client_id)I Account(account_id,client_id)I client_id is non nullable in
Account
Inferior alternativeI Client(client_id,account_id)I account_id is nullable in ClientI Account(account_id)
CardinalitiesI Owns↔ Account is enforcedI but an account can be shared by
several clientsI Client↔ Owns is 0−m
CardinalitiesI Client↔ Owns is enforcedI but an account can be assigned
to any number of clientsI Owns↔ Account is 0−m
32
Higher cardinalities
One to N (1:N) relationship typeI mapped to a foreign key in the 1 side entity relation typeI nullable if this side is 0− 1, non nullable if it is 1− 1I the minimal cardinality on the N side cannot be enforced
Example
Region
region_id
Contains Department
department_id
1-N 1-1
I Department(departement_id,region_id)I Region(region_id)
33
Higher cardinalities
M to N (M:N) relationship typeI mapped to a relation typeI two foreign keys, one for each tableI the primary key of the relation type is the combination of the
foreign keysI minimal cardinalities cannot be enforced
34
Example
Clientclient_id Uses Account account_id0-n 1-m
I Client(client_id)I Account(account_id)I Uses(client_id, account_id) both non nullable
35
Outline
Introduction
Conceptual Data Modeling
The Relational Model
SQL
36
SQL
What is SQL?I SQL is relational data management languageI Structured Query Language pronounced sequelI developed initially by IBM as an "implementation" of the relational
modelI SQL is a standard since 1986 (numerous versions)
ImplementationsI SQL is “supported” by all relational database management
systemsI many open source solutions (MySQL/MariaDB, PostgreSQL,
SQLite, etc.)I but many variations in the support level (portability is not
guaranteed)
37
SQL Components
Multiple aspectsI Data Definition/Description Language
I relational model descriptionI domain definition
I Data Manipulation Language: insertion, suppression andmodification
I Data Query LanguageI read only manipulationI selection, filtering, grouping, etc.
I Data Control LanguageI access control to the databasesI users, roles, permissions, etc.
38
SQL Data Description Language
RelationI a relation (type) is created in SQL by
CREATE TABLE relation_name (column_name domain, ...);
I SQL supports numerous default domains (implementationdependent!):I INT: integerI DATETIME, DATE and TIME: date and timeI BOOLEAN: true or falseI CHAR(n) and VARCHAR(n): string with maximum size n
I constraints can be specified after the domain:I NOT NULL: non nullableI PRIMARY KEY: self explanatoryI UNIQUE: candidate keyI DEFAULT: used to specify a default value
39
Example
Relational model to SQLClient(client_id, first_name, last_name, birth_date)CREATE TABLE Client (
client_id INT PRIMARY KEY,first_name VARCHAR(30) NOT NULL,last_name VARCHAR(30) NOT NULL,birth_date DATETIME NOT NULL
);
40
Domains
SQL domainsI domains can be created in SQLI typical form
CREATE DOMAIN Gender AS VARCHAR(6)CHECK (VALUE IN ('Female','Male'));
I unsupported in many implementations (e.g. MySQL)
Constraints based versionI constraints can be added to the table creationI CHECK can be used to implemented domainsI less elegant (no centralized definition)
41
Example
Relational model to SQLClient(client_id, gender, first_name, last_name, birth_date)CREATE TABLE Client (
client_id INT PRIMARY KEY,gender AS VARCHAR(6),CONSTRAINT gc CHECK(gender IN ('Female','Male')),first_name VARCHAR(30) NOT NULL,last_name VARCHAR(30) NOT NULL,birth_date DATETIME NOT NULL
);
42
Keys
Primary keysI primary keys are not mandatory in SQLI but they should be specified!I UNIQUE is useful as a constraintI a primary key can be made with several columns using
PRIMARY KEY (COL1, COL2, ...)
in the table creation
Foreign keysI declared as FOREIGN KEY (column) during table creationI together with a REFERENCES table(column)I a foreign key can be a set of columns
43
Example
Relational model to SQLI Client(client_id)I Account(account_id,client_id)
CREATE TABLE Client(client_id int PRIMARY KEY);CREATE TABLE Account(account_id int PRIMARY KEY,
client_id int NOT NULL,FOREIGN KEY(client_id) REFERENCES Client(client_id));
44
Referential integrity
Foreign keysI must reference an existing primary keyI SQL allows one to handle consequences of tuple modifications
I what happens if the primary key of a tuple is modified?I what happens if a tuple is deleted?
I ON DELETE something and ON UPDATE somethingI with something being
I CASCADE: propagate the modification to referring tuplesI RESTRICT: forbid the modification if there are referring tuplesI SET NULL or SET DEFAULT: modify the foreign key in the referring
tuples as described
45
Example
Relational model to SQLI Client(client_id)I Account(account_id,client_id)
CREATE TABLE Client(client_id int PRIMARY KEY);CREATE TABLE Account(account_id int PRIMARY KEY,
client_id int NOT NULL,FOREIGN KEY(client_id) REFERENCES Client(client_id))ON DELETE RESTRICT ON UPDATE CASCADE;
46
Additional commands
Modifying the modelI DROP TABLE name: delete a tableI ALTER TABLE ...:
I data model modificationI long list of possibilities
I adding/removing a columnI changing the properties of a column (domain, constraints, etc.)I etc.
47
SQL Data Query Language
The SELECT commandI the main query command in SQLI general form
SELECT something FROM somewhere[WHERE conditions] [GROUPBY grouping][HAVING group conditions] [ORDER BY something]
I provides all the manipulations available in R+dyplr orPython+pandas:I subsetting, filtering, transforming, orderingI summarizingI joining
48
Subsetting
SubsettingI simple SELECT queries can be used to subset a relation on
interesting columnsI general form SELECT col1, ..., col2 FROM table;
ExempleSELECT client_id, gender FROM Client;
Client relationclient_id gender birth_date district_id
1 F 1970-12-13 182 M 1945-02-04 13 F 1940-10-09 14 M 1956-12-01 55 F 1960-07-03 5
Result relationclient_id gender
1 F2 M3 F4 M5 F
49
Transformation
Expression and renamingI columns may be renamed using orig_name AS new_name in
the SELECT commandI simple calculations may also be performed on columns including
the results as new columns
ExampleSELECT A2 AS Name, A4 AS Population,
A5+A6+A7+A8 AS Municipalities FROM District;
District relationdistrict_id A2 A3 A4 A5 A6 A7 A8
1.00 Hl.m. Praha Prague 1204953.00 0.00 0.00 0.00 1.002.00 Benesov central Bohemia 88884.00 80.00 26.00 6.00 2.003.00 Beroun central Bohemia 75232.00 55.00 26.00 4.00 1.004.00 Kladno central Bohemia 149893.00 63.00 29.00 6.00 2.005.00 Kolin central Bohemia 95616.00 65.00 30.00 4.00 1.00
50
Transformation
Expression and renamingI columns may be renamed using orig_name AS new_name in
the SELECT commandI simple calculations may also be performed on columns including
the results as new columns
ExampleSELECT A2 AS Name, A4 AS Population,
A5+A6+A7+A8 AS Municipalities FROM District;
Result relationName Population MunicipalitiesHl.m. Praha 1204953.00 1.00Benesov 88884.00 114.00Beroun 75232.00 86.00Kladno 149893.00 100.00Kolin 95616.00 100.00
50
Filtering
Selecting tuplesI the WHERE clause can be used to select tuples fulfilling some
conditionsI general form
SELECT columns FROM table WHERE conditions;
ExempleSELECT client_id, birth_date FROM Client
WHERE Gender='F';
Client relationclient_id gender birth_date district_id
1 F 1970-12-13 182 M 1945-02-04 13 F 1940-10-09 14 M 1956-12-01 55 F 1960-07-03 5
Result relationclient_id birth_date
1 1970-12-133 1940-10-095 1960-07-038 1938-02-21
11 1950-08-22
51
Aggregation
Global summariesI aggregation functions can be used in the result part of the
SELECT commandI they operate at the column levelI some examples:
I COUNT and COUNT(DISTINCT(.))I MAX, MIN, SUMI AVG, STD, VARIANCE
Financial databaseSELECT COUNT(birth_date) FROM Client WHERE Gender='F';
gives 2645, whileSELECT COUNT(DISTINCT(birth_date)) FROM Client
WHERE Gender='F';
gives 2461.
52
Conditional aggregation
Split apply combine in SQLI the GROUP BY clause of the SELECT command provides
conditional analysisI it splits the relation into groups of tuples on which it applies
chosen aggregation functionsI groups can be further selected based on global properties with the
HAVING clause
General formSELECT aggregates FROM relation
[WHERE conditions]GROUP BY columns[HAVING group conditions]
53
Split apply combine
X Y4 C2 B2 C3 C4 B1 C4 C4 A3 B3 A1 A1 B1 A3 B2 C
Split
Y XA 4A 3A 1A 1
Y XB 2B 4B 3B 1B 3
Y XC 4C 2C 3C 1C 4C 2
Apply
Y SumA 9
Y SumB 13
Y SumC 16
CombineY SumA 9B 13C 16
54
Examples
Count clients per genderSELECT gender, COUNT(gender) AS number FROM Client
GROUP BY gender;
gender numberF 2645M 2724
Count clients per district id and genderSELECT district_id, gender, COUNT(gender) AS number
FROM Client GROUP BY district_id, gender;
district_id gender number1 F 3241 M 3392 F 262 M 203 F 27
55
Example
Count clients per district id and genderand keep only groups with more than 50 persons;SELECT district_id, gender, COUNT(gender) AS number
FROM Client GROUP BY district_id, genderHAVING COUNT(*)>=50;
district_id gender number1 F 3241 M 339
54 F 7554 M 8064 F 5764 M 5270 F 8870 M 8172 F 5072 M 5474 F 8474 M 96
56
Joins
Joining relationsI a SELECT command can operate on multiple relations at once (in
the FROM clause)I default behavior: cartesian product (a.k.a. a cross join)I key based joins are explicitly constructed via WHERE or ON
conditionsI dotted notation: relation.columnI alias for relation in the FROM clauseI support for complex conditions (not only key equality)I inner joins by default if implicit
I implicit joins: WHEREI explicit joins: JOIN ... ON
57
Example
Financial dataI goal: gender distribution per regionI sources:
I genders in the Client tableI regions in the District tableI district_id is a foreign key in the Client table which references to the
District tableI first step (join)
SELECT gender, A3 AS region From Client, DistrictWHERE Client.district_id = District.district_id;
I full query (with the group by)SELECT C.gender, D.A3 AS region, count(*) as number
FROM Client C, District DWHERE C.district_id = D.district_idGROUP BY region, C.gender;
58
Example
Financial dataI goal: gender distribution per regionI sources:
I genders in the Client tableI regions in the District tableI district_id is a foreign key in the Client table which references to the
District tableI first step (join)
SELECT gender, A3 AS region From ClientINNER JOIN DistrictON Client.district_id = District.district_id;
I full query (with the group by)SELECT C.gender, D.A3 AS region, count(*) as number
FROM Client C INNER JOIN District DWHERE C.district_id = D.district_idGROUP BY region, C.gender;
59
Example
District relationdistrict_id A2 A3
1.00 Hl.m. Praha Prague2.00 Benesov central Bohemia3.00 Beroun central Bohemia4.00 Kladno central Bohemia5.00 Kolin central Bohemia
Joined relationsgender regionF south BohemiaM PragueF PragueM central BohemiaF central Bohemia
Client relationclient_id gender district_id
1 F 182 M 13 F 14 M 55 F 5
Resultsregion gender Numbercentral Bohemia F 336central Bohemia M 328east Bohemia F 318east Bohemia M 342north Bohemia F 281north Bohemia M 280north Moravia F 437north Moravia M 483Prague F 324Prague M 339south Bohemia F 231south Bohemia M 218south Moravia F 474south Moravia M 463west Bohemia F 244west Bohemia M 271
60
Example
Financial dataI goal: persons with an account in a bank another district than their
home districtI sources:
I home district in the Client tableI bank district in the Account tableI links between client and account in the Disposition table
I querySELECT C.client_id, C.district_id AS home,
A.district_id AS bankFROM Client C INNER JOIN Disposition D
INNER JOIN Account AWHERE C.client_id = D.client_id
AND D.account_id=A.account_idAND C.district_id != A.district_id;
61
Example
Resultsclient_id home bank
12 40 7020 46 7042 68 6744 38 147 76 3648 16 6477 1 4898 33 2699 33 26
102 74 54103 74 54114 74 68115 74 68138 54 68151 51 5153 13 2162 65 75193 1 23195 74 60196 74 60
62
Outer joins
Similar approachI replace INNER JOIN by
I LEFT OUTER JOIN (or LEFT JOIN)I RIGHT OUTER JOIN (or RIGHT JOIN)I FULL OUTER JOIN (or FULL JOIN)
63
Other possibilities
Not described hereI ORDER BY: results orderingI DISTINCT: unique results onlyI nested queries: SELECT queries in other SELECT queriesI set operations such as UNION and INTERSECT
64
SQL Data Manipulation Language
INSERTI inserting a tuple into a relation:
INSERT INTO table VALUES (...);
I variants includeINSERT INTO table (columns...) VALUES (...); tospecify the column names (NULL is assigned to missing columns)
DELETEI deleting is done conditionally, using a WHERE clauseI general syntax
DELETE FROM table WHERE condition;
65
Updating
UPDATEI used to alter tuplesI general syntax
UPDATE tableSET column = value [,column = value...][WHERE condition];
66
Changelog
I December 2019: added a short introduction to SQLI November 2019: initial version
67
Licence
This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.
http://creativecommons.org/licenses/by-sa/4.0/
68
Version
Last git commit: 2019-12-11By: Fabrice Rossi ([email protected])Git hash: af2cee4da140c15fb47c0ff45a9ff8b1028fcbd0
69