L1: Course Overview & Review

H.Lu/HKUST L1: Course Overview & Review

Page 1: L1: Course Overview & Review

H.Lu/HKUST

L1: Course Overview & Review

Page 2: L1: Course Overview & Review

H.Lu/HKUST L01: RDBMS REVIEW -- 2

The Teaching Staff

Instructor: Lu Hongjun
Office: 3543 (Lift 25-26), HKUST
E-Mail: [email protected]
URL: http://www.cs.ust.hk/~luhj
Research Interests:

• Data/Knowledge base management with emphasis on query processing and optimization

• Data warehousing and data mining

• Applied performance evaluation

• Database application development

• Parallel and distributed database systems

TAs: Jiang Haifeng, Liu Guimei
Office: 4212 (DB Lab), HKUST
E-Mail: [email protected], [email protected]
URL: http://ihome.ust.hk/~jianghf (Jiang Haifeng), http://ihome.ust.hk/~cslgm (Liu Guimei)

Page 3: L1: Course Overview & Review

References

R. Ramakrishnan & J. Gehrke. Database Management Systems, 3rd Ed. McGraw Hill, 2000

D. Shasha & P. Bonnet. Database Tuning: Principles, Experiments, and Troubleshooting Techniques, Revised edition, Morgan Kaufmann, 2002

Related papers

Page 4: L1: Course Overview & Review

Course Contents

Part I: Issues in database administration

• Database design

• Principles of database performance tuning

• Database security

Part II: Emerging DB-related technology

• OLAP and data warehouse

• XML data management

• Data stream processing

Course Web Page: http://course.cs.ust.hk/comp334/

Page 5: L1: Course Overview & Review

Grading

• Written assignment (20%)

• Exams (25%)

• Course project (50%)

• Class participation (5%)

Page 6: L1: Course Overview & Review

Course Project Requirements

Carried out in teams of two or four

Database-related projects

• You propose your own project and get approval from the instructor

• Topic: database related

• The amount of work: it accounts for 50% of your final grade

Required documents (double-spaced)

• Project proposal (1-2 pages), due date: 23-24/02

• Status report (4-6 pages), due date: 28-29/03

• Final report (8-10 pages), due date: 10-11/05

Page 7: L1: Course Overview & Review

Summary

It is a graduate-level course

• Not a DBA course

• Not an introductory database course

• Not a programming course, but you need to know how to write programs

Hopefully, you will leave with

• A good grade

• A good understanding of the studied topics

Page 8: L1: Course Overview & Review

Review -- RDBMS

Relational database systems

• The basic concepts in database systems

• Relational data model

• Relational languages

Database design

• Previous course: conceptual and logical design

• This course: physical database design

Database management systems

• The basic components of DBMS

• Storage management

• Transaction management

• Query processing & optimization

Page 9: L1: Course Overview & Review

What Is Database & DBMS?

Database: a very large, integrated, persistent collection of data that models a real-world enterprise.

• Entities (e.g., students, courses)

• Relationships (e.g., James is taking CSIT530)

A Database Management System (DBMS) is a software package designed to store and manage databases.

Page 10: L1: Course Overview & Review

Data Models

A data model is a collection of concepts for describing data and related operations, semantics of data, relationship among data, and constraints on data

Two types of data models

Conceptual models: emphasize the semantics of data

• Entity-Relationship model, Object-Oriented model

Logical models: describe how the data is organized at the logical level

• Hierarchical model, Network model, Relational model

Page 11: L1: Course Overview & Review

Instances and Schemas

A schema is a description of a particular collection of data, using a given data model: the logical structure of the database (e.g., a set of customers and accounts and the relationship between them)

An instance is the actual content of the database at a particular point in time

A schema is analogous to a type in a programming language, and an instance to the value of a variable at a given moment
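A minimal sketch of the schema/instance distinction, using Python's built-in sqlite3 module. The choice of SQLite and the helper names `schema` and `instance` are assumptions made for illustration; they are not part of the course material. Inserting a row changes the instance but leaves the schema untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)

def schema(conn, table):
    # PRAGMA table_info yields one row per column; fields 1 and 2 are name and type.
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]

def instance(conn, table):
    # The instance is simply the rows stored right now.
    return conn.execute(f"SELECT * FROM {table}").fetchall()

schema_before = schema(conn, "Students")
instance_before = instance(conn, "Students")

conn.execute("INSERT INTO Students VALUES ('53666', 'Jones', 'jones@cs', 18, 3.4)")

schema_after = schema(conn, "Students")
instance_after = instance(conn, "Students")
```

Comparing `schema_before` with `schema_after` and `instance_before` with `instance_after` shows that only the instance changed.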

Page 12: L1: Course Overview & Review

Levels of Abstraction

ANSI-SPARC three-level architecture

Many views, a single conceptual (logical) schema, and a single physical schema.

• Views describe how users see the data.

• Conceptual schema defines the logical structure.

• Physical schema describes the files and indexes used.

[Figure: several views on top of the Conceptual Schema, which sits on top of the Physical Schema]

Page 13: L1: Course Overview & Review

Data Independence

Applications are insulated from how data is structured and stored.

• Ability to modify a schema definition at one level without affecting a schema definition at the next higher level.

• The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

Logical data independence: Protection from changes in the logical structure of data.

Physical data independence: Protection from changes in the physical structure of data.
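Both kinds of independence can be illustrated with a small sketch in Python's sqlite3 module. The index name `idx_students_age`, the view `StudentNames`, and the added `major` column are hypothetical names chosen for this example. Adding an index is a physical change; adding a column is a logical change that a view can hide from applications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

# Physical data independence: a new index changes how data is accessed,
# not what the query returns.
query = "SELECT name FROM Students WHERE age = 18 ORDER BY sid"
before_index = conn.execute(query).fetchall()
conn.execute("CREATE INDEX idx_students_age ON Students (age)")
after_index = conn.execute(query).fetchall()

# Logical data independence: an application that reads through a view
# is unaffected when the underlying table gains a column.
conn.execute("CREATE VIEW StudentNames AS SELECT sid, name FROM Students")
view_query = "SELECT name FROM StudentNames ORDER BY sid"
before_alter = conn.execute(view_query).fetchall()
conn.execute("ALTER TABLE Students ADD COLUMN major TEXT")
after_alter = conn.execute(view_query).fetchall()
```

In both cases the application's query text stays the same and so does its answer.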

Page 14: L1: Course Overview & Review

Database Environment

[Figure: the database environment. Components: procedures and standards, data, DBMS, hardware, application programs. Roles: system administrator, database administrator, database designer, analysts & programmers, end users. Arrow labels: designs, manages, write, use, specifies & enforces.]

Page 15: L1: Course Overview & Review

DBMS Related Languages

Data Definition Language (DDL)

• Specification notation for defining the database schema

• Data storage and definition language: a special type of DDL in which the storage structure and access methods used by the database system are specified

Data Manipulation Language (DML)

• Language for accessing and manipulating the data organized by the appropriate data model

• Two classes of languages

– Procedural: the user specifies what data is required and how to get it.

– Nonprocedural: the user specifies what data is required without specifying how to get it.
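The procedural/nonprocedural contrast can be sketched as follows, assuming Python as the host language and SQLite as the DBMS (both are choices made for this example). The loop spells out how to find the rows; the SQL statement states only what is wanted.

```python
import sqlite3

rows = [
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
]

# Procedural: the program says how to get the data (scan, test, collect, sort).
procedural = []
for sid, name, login, age, gpa in rows:
    if gpa > 3.3:
        procedural.append(name)
procedural.sort()

# Nonprocedural: the query says only what data is required.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL)"
)
conn.executemany("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", rows)
declarative = [
    row[0]
    for row in conn.execute("SELECT name FROM Students WHERE gpa > 3.3 ORDER BY name")
]
```

Both lists hold the same names; the difference is who decides the access strategy, the programmer or the DBMS.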

Page 16: L1: Course Overview & Review

DBMS Related Languages

[Figure: a programming language for DBMS applications consists of a host language and a data sublanguage; the data sublanguage comprises the DDL and the DML (query language), which is either procedural or non-procedural]

Page 17: L1: Course Overview & Review

Evolution of Database Technology

1960s: Hierarchical (IMS) & network (CODASYL) DBMS.

1970s: Relational data model, relational DBMS implementation.

1980s: RDBMS rules the earth.

1985-: Advanced data models (extended-relational, OO, deductive, etc.) and application-oriented DBMS (spatial, scientific, engineering, etc.).

1990s: ORDB, OLAP, Data mining, data warehousing, multimedia databases, and network databases.

Page 18: L1: Course Overview & Review

What is an RDBMS

A piece of software that manages data based on the relational model

• Relational data, SQL queries

Commercial products

• Oracle, IBM DB2, IBM Informix, Sybase, Microsoft SQL Server

• Each has ~10 million lines of C/C++ code

Smaller packages

• MySQL, PostgreSQL

Page 19: L1: Course Overview & Review

Relational Data Model

Main concept: relation

• A table with rows and columns

Every relation has a schema

• Description of the columns, or fields

Relational data: rows in a table

• No order among the rows in a table

The most widely used data model!

Page 20: L1: Course Overview & Review

University Database

Conceptual schema:

Students (sid: string, name: string, login: string, age: integer, gpa: real)
Courses (cid: string, cname: string, credits: integer)
Enrolled (sid: string, cid: string, grade: string)

Instance of Students (cardinality = 3, degree = 5, all rows distinct):

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8
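As a small illustration of the terms above, the Students instance can be modeled as a Python set of tuples (a modeling choice made for this sketch, not something the slides prescribe). Degree is the number of fields in the schema; cardinality is the number of rows; and a set, like a relation, cannot hold duplicate rows.

```python
# Schema: the ordered field names of Students.
students_schema = ("sid", "name", "login", "age", "gpa")

# Instance: a set of tuples, so rows are unordered and distinct by construction.
students = {
    ("53666", "Jones", "jones@cs", 18, 3.4),
    ("53688", "Smith", "smith@eecs", 18, 3.2),
    ("53650", "Smith", "smith@math", 19, 3.8),
}

degree = len(students_schema)   # number of columns
cardinality = len(students)     # number of rows

# Adding a row that is already present leaves the relation unchanged.
students_with_duplicate = students | {("53666", "Jones", "jones@cs", 18, 3.4)}
```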

Page 21: L1: Course Overview & Review

Relational Languages

Formal languages

• Relational algebra

• Relational calculus

Commercial language: SQL

• DDL (Data Definition Language)

– Create Table, Create Index, Create View …

• DML (Data Manipulation Language)

– Queries: Select

– Updates: Insert, Delete, Update

Page 22: L1: Course Overview & Review

Creating Tables

CREATE TABLE Students
(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)

CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2))

Page 23: L1: Course Overview & Review

Primary Key Constraints

A set of fields is a key for a relation if:

1. Any two distinct tuples differ in at least one field of the set, and

2. This is not true for any proper subset of the set.

Superkey: condition 1 holds; if condition 2 fails, the set is a superkey but not a key.

E.g., sid is a key for Students. {sid, gpa} is a superkey.

Each relation can have only one primary key, chosen from its candidate keys.
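The two conditions can be checked mechanically against a given instance. The sketch below uses hypothetical helper names `is_superkey` and `is_key`. Note the limitation: a key is a constraint on every legal instance, so a check on one instance can refute a key but never prove it. In this small instance gpa happens to be unique, although it is clearly not a key of Students in general.

```python
from itertools import combinations

students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53688", "name": "Smith", "login": "smith@eecs", "age": 18, "gpa": 3.2},
    {"sid": "53650", "name": "Smith", "login": "smith@math", "age": 19, "gpa": 3.8},
]

def is_superkey(rows, fields):
    # Condition 1: no two rows agree on all of the given fields.
    projections = [tuple(row[f] for f in fields) for row in rows]
    return len(set(projections)) == len(projections)

def is_key(rows, fields):
    # Condition 1 holds, and condition 2: no proper non-empty subset is a superkey.
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(fields))
        for subset in combinations(fields, size)
    )

sid_is_key = is_key(students, ["sid"])
sid_gpa_is_superkey = is_superkey(students, ["sid", "gpa"])
sid_gpa_is_key = is_key(students, ["sid", "gpa"])
name_is_superkey = is_superkey(students, ["name"])
gpa_unique_here = is_superkey(students, ["gpa"])  # true only by accident of the data
```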

Page 24: L1: Course Overview & Review

Primary and Candidate Keys

CREATE TABLE Students
(sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL,
 PRIMARY KEY (sid),
 UNIQUE (login))

CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
 PRIMARY KEY (sid, cid))
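A hedged sketch of how these declarations behave in practice, run against SQLite through Python's sqlite3 module (other DBMSs report violations differently). The helper `try_insert` and the extra student rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Students
       (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL,
        PRIMARY KEY (sid),
        UNIQUE (login))"""
)

def try_insert(row):
    # Returns True if the DBMS accepted the row, False if a constraint rejected it.
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(("53666", "Jones", "jones@cs", 18, 3.4)),    # accepted
    try_insert(("53666", "Brown", "brown@cs", 20, 3.0)),    # duplicate primary key
    try_insert(("53700", "Jonas", "jones@cs", 19, 3.1)),    # duplicate login
    try_insert(("53688", "Smith", "smith@eecs", 18, 3.2)),  # accepted
]
stored = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```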

Page 25: L1: Course Overview & Review

Foreign Key Constraints

Foreign key: a set of fields in a relation

• Refers to the primary key of another relation

Referential integrity

• No dangling references

CREATE TABLE Enrolled
(sid CHAR(20), cid CHAR(20), grade CHAR(2),
 PRIMARY KEY (sid, cid),
 FOREIGN KEY (sid) REFERENCES Students)

Students

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Enrolled

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
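The sketch below shows referential integrity being enforced, assuming SQLite via Python's sqlite3 module. SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`; the Students table is trimmed to two columns and the helper `rejected` is a hypothetical name used for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE Students (sid CHAR(20) PRIMARY KEY, name CHAR(20))")
conn.execute(
    """CREATE TABLE Enrolled
       (sid CHAR(20), cid CHAR(20), grade CHAR(2),
        PRIMARY KEY (sid, cid),
        FOREIGN KEY (sid) REFERENCES Students)"""
)
conn.execute("INSERT INTO Students VALUES ('53666', 'Jones')")
conn.execute("INSERT INTO Enrolled VALUES ('53666', 'Carnatic101', 'C')")

def rejected(sql):
    # True if the statement violated an integrity constraint.
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# An enrollment for a student who does not exist would be a dangling reference.
dangling_insert = rejected("INSERT INTO Enrolled VALUES ('99999', 'Reggae203', 'B')")
# Deleting a student who still has enrollments would leave one behind.
orphaning_delete = rejected("DELETE FROM Students WHERE sid = '53666'")
```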

Page 26: L1: Course Overview & Review

Integrity Constraints (ICs)

IC: a condition that must be true for any db instance

• Domain constraints

• Primary key constraints

• Foreign key constraints

ICs are specified when a schema is defined.

ICs are checked when relations are modified.

A legal instance of a relation satisfies all specified ICs.

Page 27: L1: Course Overview & Review

Adding and Deleting Tuples

INSERT INTO Students (sid, name, login, age, gpa)
VALUES (53688, 'Smith', 'smith@ee', 18, 3.2)

DELETE FROM Students S
WHERE S.name = 'Smith'
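A runnable sketch of the two statements above, assuming SQLite through Python's sqlite3 module. The table alias is dropped in the DELETE because alias support there varies between systems, and sid values are quoted to match the CHAR column type. The point to notice is that DELETE removes every tuple matching the condition, here both students named Smith.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Students
       (sid CHAR(20) PRIMARY KEY, name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)"""
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)

conn.execute(
    "INSERT INTO Students (sid, name, login, age, gpa) "
    "VALUES ('53688', 'Smith', 'smith@ee', 18, 3.2)"
)
count_after_insert = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]

# rowcount reports how many tuples the DELETE removed.
deleted = conn.execute("DELETE FROM Students WHERE name = 'Smith'").rowcount
remaining = conn.execute("SELECT sid, name FROM Students").fetchall()
```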

Page 28: L1: Course Overview & Review

Queries

SELECT *
FROM Students S
WHERE S.sid = 53688

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Page 29: L1: Course Overview & Review

Querying Multiple Tables

SELECT S.name, E.cid
FROM Students S, Enrolled E
WHERE S.sid = E.sid AND E.grade = 'A'

Result

S.name  E.cid
Smith   Topology112

Students

sid    name   login       age  gpa
53666  Jones  jones@cs    18   3.4
53688  Smith  smith@eecs  18   3.2
53650  Smith  smith@math  19   3.8

Enrolled

sid    cid          grade
53666  Carnatic101  C
53666  Reggae203    B
53650  Topology112  A
53666  History105   B
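The join can be reproduced with the tables from the slide, here loaded into SQLite through Python's sqlite3 module (an assumption of this sketch). The second query uses explicit JOIN ... ON syntax, which expresses the same join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students (sid CHAR(20), name CHAR(20), login CHAR(10), age INTEGER, gpa REAL)"
)
conn.execute("CREATE TABLE Enrolled (sid CHAR(20), cid CHAR(20), grade CHAR(2))")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("53666", "Jones", "jones@cs", 18, 3.4),
        ("53688", "Smith", "smith@eecs", 18, 3.2),
        ("53650", "Smith", "smith@math", 19, 3.8),
    ],
)
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [
        ("53666", "Carnatic101", "C"),
        ("53666", "Reggae203", "B"),
        ("53650", "Topology112", "A"),
        ("53666", "History105", "B"),
    ],
)

# Join written as on the slide: list both tables, relate them in WHERE.
implicit = conn.execute(
    "SELECT S.name, E.cid FROM Students S, Enrolled E "
    "WHERE S.sid = E.sid AND E.grade = 'A'"
).fetchall()

# The same join with explicit JOIN ... ON syntax.
explicit = conn.execute(
    "SELECT S.name, E.cid FROM Students S JOIN Enrolled E ON S.sid = E.sid "
    "WHERE E.grade = 'A'"
).fetchall()
```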

Page 30: L1: Course Overview & Review

Functional Components of DBMS

[Figure: functional components of a DBMS. Inputs: DML statements from the user/application and DDL commands from the database administrator. Components: Security Control, DDL Compiler, Query Processing & Optimization (producing a Query Plan), Execution Engine, Transaction Management (Concurrency Control with Lock Table, Recovery with Log), Storage Management (Buffer Management with Buffer, Index/file/record Management). Stored data: Statistics, Metadata, Indexes, User data.]

Page 31: L1: Course Overview & Review

Query Optimization

A major strength of RDBMS

SQL queries are declarative

Optimizer figures out how to answer them

• Re-order operations

• Pick among alternatives of one operation

• Ensure that the answer is correct!
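To make the idea of re-ordering concrete, here is a toy sketch in plain Python of two evaluation plans for the query "names and courses of students with grade A". The function names `plan_naive` and `plan_pushdown` are hypothetical, and a real optimizer is far more elaborate. The first plan pairs every student with every enrollment and filters afterwards; the second applies the selection on grade first and then looks up matching students by sid. Both must return the same answer, but they do different amounts of work.

```python
students = [
    ("53666", "Jones"),
    ("53688", "Smith"),
    ("53650", "Smith"),
]
enrolled = [
    ("53666", "Carnatic101", "C"),
    ("53666", "Reggae203", "B"),
    ("53650", "Topology112", "A"),
    ("53666", "History105", "B"),
]

def plan_naive(students, enrolled):
    # Cross product first, then filter: examines every (student, enrollment) pair.
    pairs_examined = 0
    result = []
    for s in students:
        for e in enrolled:
            pairs_examined += 1
            if s[0] == e[0] and e[2] == "A":
                result.append((s[1], e[1]))
    return result, pairs_examined

def plan_pushdown(students, enrolled):
    # Selection first, then a hash lookup on sid: touches only qualifying enrollments.
    by_sid = {}
    for s in students:
        by_sid.setdefault(s[0], []).append(s)
    probes = 0
    result = []
    for e in (e for e in enrolled if e[2] == "A"):
        probes += 1
        for s in by_sid.get(e[0], []):
            result.append((s[1], e[1]))
    return result, probes

naive_result, naive_work = plan_naive(students, enrolled)
pushdown_result, pushdown_work = plan_pushdown(students, enrolled)
```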

Page 32: L1: Course Overview & Review

Transaction

A key concept in databases

An atomic sequence of actions (read/write)

Brings the DB from one consistent state to another

ACID

• Atomicity

• Consistency

• Isolation

• Durability
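Atomicity and consistency can be sketched with a money transfer, assuming SQLite through Python's sqlite3 module. The `Accounts` table, its CHECK constraint, and the `transfer` helper are hypothetical and exist only for this example. Using the connection as a context manager commits the transaction on success and rolls it back if an exception occurs, so a transfer that would overdraw an account leaves no partial effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    # Both updates form one transaction: either both take effect or neither does.
    try:
        with conn:  # commit on success, rollback on exception
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM Accounts"))

overdraw_ok = transfer(conn, "A", "B", 150)   # violates the CHECK, so it is rolled back
after_overdraw = balances(conn)
small_ok = transfer(conn, "A", "B", 30)       # succeeds and is committed
after_small = balances(conn)
```

In the failed transfer the credit to B has already been applied when the debit from A is rejected; the rollback undoes that credit as well.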

Page 33: L1: Course Overview & Review

Concurrency Control & Recovery

Concurrency Control

• Essential for good DBMS performance

• Run several user programs concurrently

• Interleave actions of different users

• Ensure the correctness

– Users may think it is a single-user system.

Recovery

• Essential for durability of transactions
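Why interleaving needs control can be seen in a deterministic toy simulation in plain Python. The schedule format and the `run` helper are inventions of this sketch, not a DBMS mechanism. Two transactions each read a value and write it back with a deposit added. Run serially, both deposits survive; interleaved without any control, one update is lost.

```python
deposits = {"T1": 10, "T2": 20}

def run(schedule, initial=100):
    # Executes (transaction, action) steps in order against a single data item x.
    db = {"x": initial}
    local = {}  # each transaction's private copy of the value it read
    for txn, action in schedule:
        if action == "read":
            local[txn] = db["x"]
        else:  # "write": store the value read earlier plus this transaction's deposit
            db["x"] = local[txn] + deposits[txn]
    return db["x"]

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run(serial)            # both deposits applied
interleaved_result = run(interleaved)  # T1's deposit is overwritten by T2
```

A concurrency control scheme such as locking would either block T2's read until T1 finishes or abort one of the two, so that the outcome matches some serial order.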

Page 34: L1: Course Overview & Review

RDBMS Features

• Effective and efficient access

• Easier application development

• Data independence

• Data integrity and security

• Concurrent access

• Recovery from crashes

• Uniform data administration

Page 35: L1: Course Overview & Review

Summary

A DBMS is used to maintain and query large datasets.

Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security.

Levels of abstraction give data independence.

A DBMS typically has a layered architecture.

DBAs hold responsible jobs and are well-paid!

DBMS R&D is one of the broadest, most exciting areas in CS.