CS 440: Database Management Systems 1: Introduction.


Transcript of CS 440: Database Management Systems 1: Introduction.

Page 1: CS 440: Database Management Systems 1: Introduction.

CS 440: Database Management Systems

1: Introduction

Page 2: CS 440: Database Management Systems 1: Introduction.

Welcome to CS440!

• Arash Termehchy
  – Assistant professor in the School of EECS
  – Just moved here from Illinois
  – Usable data exploration systems

• Your turn:
  – Name, field, DB background

Page 3: CS 440: Database Management Systems 1: Introduction.

Data management

• Modeling a large number of entities and relationships
  – Called structured data
  – Formal (logical) model

• Maintaining them on computational devices
  – Servers in the cloud, sensor networks, …
  – Keep them organized according to the model
  – Cope with failures
  – …

Page 4: CS 440: Database Management Systems 1: Introduction.

Data management

• Exploring entities and relationships efficiently, easily, and effectively
  – Where are the more affordable apartments in Portland?
  – Who is the most similar person to Alan?
  – How will a virus likely spread in a population?

• Make an informed and effective decision

Page 5: CS 440: Database Management Systems 1: Introduction.

Why study data management?

• Data is everywhere:
  – Business: financial analytics, …
  – Social: social networks, data sharing, …
  – Personal: map apps, …
  – Science: spread of diseases, …

Page 6: CS 440: Database Management Systems 1: Introduction.

Data management is valuable

• According to McKinsey & Company:
  – $300 billion potential annual value to US health care
  – €250 billion potential annual value to Europe’s public sector
  – 60% potential increase in retailers’ operating margins

• Data science is transforming the way we make decisions, make scientific discoveries, …
  – Analyzing genetic data to find cures for diseases.

Page 7: CS 440: Database Management Systems 1: Introduction.

Data management is challenging

• According to McKinsey & Company:
  – 30 billion data items shared on Facebook every month
  – 235 TB collected by the Library of Congress
  – 40% growth in global data each year

• 90% of the world’s data was generated in the last two years!

• Big data: huge, heterogeneous, evolving

Page 8: CS 440: Database Management Systems 1: Introduction.

We study these challenges

• How to get what we like from the data easily, effectively, and efficiently?

Page 9: CS 440: Database Management Systems 1: Introduction.

Why should we learn these subjects?

• Isn’t it sufficient to know SQL?
  – Let the companies that make database management systems worry about these issues.

• No! You will end up with:
  – A query that takes hundreds of hours to finish!
  – A database that contains negative salaries!
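The negative-salary problem is one that a declared constraint can rule out. A minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical Employee table (neither comes from the slides):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the CHECK clause states that salaries are never negative.
conn.execute(
    "CREATE TABLE Employee ("
    " name TEXT PRIMARY KEY,"
    " salary REAL NOT NULL CHECK (salary >= 0))"
)
conn.execute("INSERT INTO Employee VALUES ('Alan', 5000)")

# The DBMS refuses the bad row instead of silently storing it.
try:
    conn.execute("INSERT INTO Employee VALUES ('Bob', -100)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute("SELECT name, salary FROM Employee").fetchall()
print(rejected, rows)
```

Knowing SQL syntax is not enough here: someone has to decide which constraints the data must satisfy and declare them.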

Page 10: CS 440: Database Management Systems 1: Introduction.

Why should we learn these subjects?

• Managing conventional data requires more:
  – Tuning databases, developing efficient data exploration programs, …

• You may face unconventional data management scenarios
  – The data may be a big graph that is constantly evolving.

• You may use data management ideas in your own work.

Page 11: CS 440: Database Management Systems 1: Introduction.

Prerequisites

• Good programming skills

• CS 261 and CS 275 or equivalent

• Contact the instructor if you are not sure.

Page 12: CS 440: Database Management Systems 1: Introduction.

Readings

• Required:
  – Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom
  – Notes on the course website for subjects not covered by the textbook.

Page 13: CS 440: Database Management Systems 1: Introduction.

Readings

• Recommended:
  – Database Management Systems, Raghu Ramakrishnan and Johannes Gehrke
  – Foundations of Databases, Serge Abiteboul, Richard Hull, and Victor Vianu

• Other useful readings on the course website.

Page 14: CS 440: Database Management Systems 1: Introduction.

Grading Scheme

• Assignments 40%

• Project 60%

Page 15: CS 440: Database Management Systems 1: Introduction.

Assignments

• Written assignments
  – To understand the main concepts and methods.
  – Should be done individually.

• Start soon!

Page 16: CS 440: Database Management Systems 1: Introduction.

Project

• A database-centric application
  – Data engineering effort.

• Advanced feature
  – Different from CS 275
  – Easier search (keyword search)
  – Nice visualization
  – …

Page 17: CS 440: Database Management Systems 1: Introduction.

Project

• Groups of 2 to 4
  – Practice how to work in groups

• Project definition is due in the third week of the class!
  – The data, application, and scope
  – 5% of total grade

Page 18: CS 440: Database Management Systems 1: Introduction.

Basic Concepts

• Database management system (DBMS):
  – A piece of software that simplifies and facilitates data management and exploration.

• Database content
  – Data
  – Schema: information about data, meaning of the data

[Figure: the data value 10 under two schema labels, Salary: 10 and Age: 10]

Page 19: CS 440: Database Management Systems 1: Introduction.

Physical Data Independence

• Independence from physical details
  – File system, operating system, hardware, …

• Data models
  – The way that we see real-world data.
  – Relational data model: everything is a relation.

• Declarative query language: SQL
  – Say what, not how
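"Say what, not how" can be illustrated by answering the same question twice. The sketch below is only an illustration, assuming Python's built-in sqlite3 module and a small hypothetical Book table; the hand-written loop spells out how to find the answer, while the SQL query only states what the answer should be.

```python
import sqlite3

# Hypothetical sample rows: (title, price, category).
books = [
    ("MySQL", 102.10, "computer"),
    ("Cell biology", 201.69, "biology"),
    ("French cinema", 53.99, "art"),
    ("NBA History", 63.65, "sport"),
]

# "How": we write the scan, the test, and the sort ourselves.
cheap_by_hand = []
for title, price, _category in books:
    if price < 100:
        cheap_by_hand.append(title)
cheap_by_hand.sort()

# "What": we describe the result and let the DBMS choose how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL, category TEXT)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", books)
cheap_by_sql = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM Book WHERE price < 100 ORDER BY title"
    )
]

print(cheap_by_hand == cheap_by_sql)
```

The query text would stay the same if the table were later stored differently or given an index, which is the point of physical data independence.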

Page 20: CS 440: Database Management Systems 1: Introduction.

Relational Database Management

[Diagram: Conceptual Design (Entity Relationship (ER) Model), Schema (Relational Model), Physical Storage (Files and Indexes)]

Page 21: CS 440: Database Management Systems 1: Introduction.

Topics

[Diagram: Conceptual Design (Entity Relationship (ER) Model), Schema (Relational Model), Physical Storage (Files and Indexes)]

Page 23: CS 440: Database Management Systems 1: Introduction.

Topics

Modeling data and asking questions: Relational Model & Languages

Page 24: CS 440: Database Management Systems 1: Introduction.

Topics

Organizing the data: Database design

Page 25: CS 440: Database Management Systems 1: Introduction.

Topics

Keeping the data clean and meaningful: Integrity constraints

Page 26: CS 440: Database Management Systems 1: Introduction.

Topics

Doing more than asking queries: Stored procedures, ORM

Page 27: CS 440: Database Management Systems 1: Introduction.

Topics

Storing data in files: Storage Management

Page 28: CS 440: Database Management Systems 1: Introduction.

Topics

Finding data in a big file really fast: Data access methods

Page 29: CS 440: Database Management Systems 1: Introduction.

Topics

Translating complex queries to read & write: Query execution & optimization

Page 30: CS 440: Database Management Systems 1: Introduction.

Topics

Coping with failure: Transaction Management

Page 31: CS 440: Database Management Systems 1: Introduction.

Topics

Tuning

Page 33: CS 440: Database Management Systems 1: Introduction.

Conceptual Design

• High-level data model
  – Describe information in the database without worrying about implementation issues

• The ER model is the most popular tool for conceptual design
  – Invented by Peter Chen in 1976
  – Provides an easy-to-use language: pictures

• We review the basics

Page 34: CS 440: Database Management Systems 1: Introduction.

ER Model/Diagram

[ER diagram: entity sets Person (address, name, ssn), Book (title, category, price), and Publisher (address, name); relationships buys, sells, and employs]

Page 35: CS 440: Database Management Systems 1: Introduction.

ER Model

• Entity Set
  – An entity is a distinct real-world object, e.g., the cs540 textbook
  – An entity set is a collection of entities

• Attribute
  – Belongs to an entity
  – Does not contain any other attribute: atomic
  – Atomic data types: string, integer, real, …

[Diagram: entity sets Publisher and Book; Book has attributes title, category, and price]

Page 36: CS 440: Database Management Systems 1: Introduction.

Relationship

• Describe relationships between entity sets

• Do not exist without entities

• May have attributes

[Diagram: Person employs Publisher, shown without attributes and then with the attribute startdate on the employs relationship]

Page 37: CS 440: Database Management Systems 1: Introduction.

Relationship Multiplicity

• One to one
  – publisher – manager

• Many to one
  – book – publisher

• Many to many
  – publisher – person
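One common way these three multiplicities show up in relational tables is sketched below. This is an illustration under assumptions, not the slides' own design: it uses Python's built-in sqlite3 module, and every table name, column name, and sample row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
PRAGMA foreign_keys = ON;

CREATE TABLE Person (ssn TEXT PRIMARY KEY, name TEXT);

-- One to one (publisher - manager): UNIQUE keeps a person from
-- managing more than one publisher.
CREATE TABLE Publisher (
    name        TEXT PRIMARY KEY,
    manager_ssn TEXT UNIQUE REFERENCES Person(ssn)
);

-- Many to one (book - publisher): each book names a single publisher.
CREATE TABLE Book (
    title     TEXT PRIMARY KEY,
    publisher TEXT NOT NULL REFERENCES Publisher(name)
);

-- Many to many (publisher - person): one row per (publisher, person) pair.
CREATE TABLE Employs (
    publisher TEXT REFERENCES Publisher(name),
    ssn       TEXT REFERENCES Person(ssn),
    PRIMARY KEY (publisher, ssn)
);

INSERT INTO Person VALUES ('111', 'Alan'), ('222', 'Bea');
INSERT INTO Publisher VALUES ('North Press', '111'), ('South Press', '222');
INSERT INTO Book VALUES ('Book A', 'North Press'), ('Book B', 'North Press');
INSERT INTO Employs VALUES
    ('North Press', '111'), ('North Press', '222'), ('South Press', '111');
""")

# One to one: Alan already manages North Press, so this row is refused.
try:
    conn.execute("INSERT INTO Publisher VALUES ('East Press', '111')")
    second_job_allowed = True
except sqlite3.IntegrityError:
    second_job_allowed = False

# Many to one: several books may share one publisher.
books_per_publisher = conn.execute(
    "SELECT publisher, COUNT(*) FROM Book GROUP BY publisher"
).fetchall()

# Many to many: one person may work for several publishers.
employers_of_alan = conn.execute(
    "SELECT COUNT(*) FROM Employs WHERE ssn = '111'"
).fetchone()[0]

print(second_job_allowed, books_per_publisher, employers_of_alan)
```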

Page 38: CS 440: Database Management Systems 1: Introduction.

Multi-way Relationships

• Relationships between more than two entity sets

• Each entity set has a different role in the relationship

[Diagram: Purchase relationship connecting Book, Person (role: buyer), and Store (role: seller)]

Page 39: CS 440: Database Management Systems 1: Introduction.

ER Model: Keys

• Attribute(s) that uniquely identify entities
  – No standard way to annotate: usually underlined.

• Each entity set must have a key
  – Why?

• Relationships may also have keys

[Diagram: entity set Person with attributes address, name, and ssn]

Page 40: CS 440: Database Management Systems 1: Introduction.

Topics

Modeling data and asking questions: Relational Model & Languages

Page 41: CS 440: Database Management Systems 1: Introduction.

Relational Model

• The relational model defines data organization and data retrieval/manipulation operations

• It is easier to implement than the ER model

• It captures more details about the data

Page 42: CS 440: Database Management Systems 1: Introduction.

An Example

Relation name: Book

Attribute names:  Title          Price    Category  Year
Tuples:           MySQL          $102.1   computer  2001
                  Cell biology   $201.69  biology   1954
                  French cinema  $53.99   art       2002
                  NBA History    $63.65   sport     2010

Page 43: CS 440: Database Management Systems 1: Introduction.

Relational Model

• Attributes
  – Atomic values
  – Atomic types: string, integer, real, date, …

• Each relation must have keys
  – Attributes without duplicate values
  – A relation does not contain duplicate tuples.

• Reordering tuples does not change the relation.

• Reordering attributes does not change the relation.
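These properties follow from treating a relation as a set of tuples, where each tuple maps attribute names to values. A small sketch in plain Python; the helper `relation` and the sample rows are illustrative assumptions, not part of the course material.

```python
def relation(rows):
    """Build a relation as a set of tuples.

    Each tuple is itself a set of (attribute, value) pairs, so neither the
    order of the rows nor the order of the attributes is recorded.
    """
    return frozenset(frozenset(row.items()) for row in rows)


r1 = relation([
    {"title": "MySQL", "price": 102.1, "year": 2001},
    {"title": "NBA History", "price": 63.65, "year": 2010},
])

# Same tuples with rows reordered, attributes reordered, and one duplicate row.
r2 = relation([
    {"year": 2010, "price": 63.65, "title": "NBA History"},
    {"price": 102.1, "year": 2001, "title": "MySQL"},
    {"title": "MySQL", "price": 102.1, "year": 2001},
])

same_relation = r1 == r2
tuple_count = len(r2)
print(same_relation, tuple_count)
```

Note that SQL tables are not sets by default; they may hold duplicate rows unless a key or DISTINCT rules them out.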

Page 44: CS 440: Database Management Systems 1: Introduction.

Database Schema vs. Database Instance

• Schema of a relation
  – Name of the relation and names of its attributes
  – E.g.: Person(Name, Address, SSN)
  – Types of the attributes
  – Constraints on the values of the attributes

• Schema of the database
  – Set of relation schemata
  – E.g.: Person(Name, Address, SSN), Employment(Company, SSN)

Page 45: CS 440: Database Management Systems 1: Introduction.

Database Schema vs. Database Instance

• Schema: Book(Title, Price, Category, Year)

• Instance:

Title          Price    Category  Year
MySQL          $102.1   computer  2001
Cell biology   $201.69  biology   1954
French cinema  $53.99   art       2002
NBA History    $63.65   sport     2010
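The distinction can be made concrete with a small sketch, assuming Python's built-in sqlite3 module. The column types and the extra row are assumptions for illustration; the helpers `schema` and `instance` are hypothetical names. The schema is fixed when the table is created, while the instance is whatever set of tuples the table holds at a given moment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Book (Title TEXT, Price REAL, Category TEXT, Year INTEGER)"
)


def schema(conn):
    # PRAGMA table_info yields one row per column: (cid, name, type, ...).
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Book)")]


def instance(conn):
    return set(conn.execute("SELECT Title, Price, Category, Year FROM Book"))


conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [
        ("MySQL", 102.1, "computer", 2001),
        ("Cell biology", 201.69, "biology", 1954),
        ("French cinema", 53.99, "art", 2002),
        ("NBA History", 63.65, "sport", 2010),
    ],
)
schema_before, instance_before = schema(conn), instance(conn)

# Adding a (made-up) tuple changes the instance but not the schema.
conn.execute("INSERT INTO Book VALUES ('Databases', 80.0, 'computer', 2008)")
schema_after, instance_after = schema(conn), instance(conn)

print(schema_before == schema_after, len(instance_before), len(instance_after))
```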

Page 46: CS 440: Database Management Systems 1: Introduction.

Example Schema

Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
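As a rough sketch, the schema above could be declared and queried as follows using Python's built-in sqlite3 module. The column types, the choice of keys, and all sample rows are assumptions made for illustration; the slide gives only relation and attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Types and keys below are assumed; the slide lists names only.
CREATE TABLE Beers     (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Bars      (name TEXT PRIMARY KEY, addr TEXT, license TEXT);
CREATE TABLE Drinkers  (name TEXT PRIMARY KEY, addr TEXT, phone TEXT);
CREATE TABLE Likes     (drinker TEXT, beer TEXT, PRIMARY KEY (drinker, beer));
CREATE TABLE Sells     (bar TEXT, beer TEXT, price REAL, PRIMARY KEY (bar, beer));
CREATE TABLE Frequents (drinker TEXT, bar TEXT, PRIMARY KEY (drinker, bar));

-- Made-up sample rows.
INSERT INTO Likes     VALUES ('Ann', 'Lager'), ('Bob', 'Stout');
INSERT INTO Sells     VALUES ('Corner Bar', 'Lager', 3.5);
INSERT INTO Frequents VALUES ('Ann', 'Corner Bar'), ('Bob', 'Corner Bar');
""")

# Drinkers who frequent at least one bar that sells a beer they like.
happy = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT f.drinker
        FROM Frequents AS f
        JOIN Sells AS s ON s.bar = f.bar
        JOIN Likes AS l ON l.drinker = f.drinker AND l.beer = s.beer
        ORDER BY f.drinker
    """)
]
print(happy)
```

Likes, Sells, and Frequents each record a relationship between two of the other relations, which is why the query has to join them to connect drinkers, bars, and beers.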