CS 440: Database Management Systems 1: Introduction.
Welcome to CS440!
• Arash Termehchy
  – Assistant professor in the School of EECS
  – Just moved here from Illinois
  – Usable data exploration systems
• Your turn:
  – Name, field, DB background
Data management
• Modeling a large number of entities and relationships
  – Called structured data
  – Formal (logical) model
• Maintaining them on computational devices
  – Servers in the cloud, sensor networks, …
  – Keep them organized according to the model
  – Cope with failures
  – …
Data management
• Exploring entities and relationships efficiently, easily, and effectively
  – Where are the more affordable apartments in Portland?
  – Who is the most similar person to Alan?
  – How is a virus likely to spread in a population?
• Make informed and effective decisions
Why study data management?
• Data is everywhere:
  – Business: financial analytics, …
  – Social: social networks, data sharing, …
  – Personal: map apps, …
  – Science: spread of diseases, …
Data management is valuable
• According to McKinsey & Company:
  – $300 billion potential annual value to US health care
  – €250 billion potential annual value to Europe’s public sector
  – 60% potential increase in retailers’ operating margins
• Data science is transforming the way we make decisions, make scientific discoveries, …
  – Analyzing genetic data to find cures for diseases.
Data management is challenging
• According to McKinsey & Company:
  – 30 billion data items shared on Facebook every month
  – 235 TB collected by the Library of Congress
  – 40% growth in global data each year
• 90% of the world’s data was generated in the last two years!
• Big data: huge, heterogeneous, evolving
We study these challenges
• How to get what we like from the data easily, effectively, and efficiently?
Why should we learn these subjects?
• Isn’t it sufficient to know SQL?
  – Let the companies that make database management systems worry about these issues.
• No! You will end up with:
  – A query that takes hundreds of hours to finish!
  – A database that contains negative salaries!
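A minimal sketch of how a DBMS can rule out the second problem, using Python's standard-library sqlite3 module with an in-memory database. The Employee table, its columns, and the sample values are hypothetical and chosen only for illustration; the CHECK constraint asks the DBMS to reject any negative salary.

```python
import sqlite3

def try_insert_salary(salary):
    """Return True if the DBMS accepts the row, False if it rejects it."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: the CHECK constraint forbids negative salaries.
    conn.execute(
        "CREATE TABLE Employee (name TEXT PRIMARY KEY, "
        "salary REAL CHECK (salary >= 0))"
    )
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", ("Alan", salary))
        return True
    except sqlite3.IntegrityError:
        # Raised by sqlite3 when a constraint such as CHECK is violated.
        return False
    finally:
        conn.close()

positive_accepted = try_insert_salary(5000.0)
negative_accepted = try_insert_salary(-10.0)
```

Knowing SQL syntax alone does not produce this guarantee; someone has to decide which constraints the data must satisfy and declare them.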
Why should we learn these subjects?
• Managing conventional data requires more:
  – Tuning databases, developing efficient data exploration programs, …
• You may face unconventional data management scenarios
  – The data may be a big graph that is constantly evolving.
• You may use data management ideas in your own work.
Prerequisites
• Good programming skills
• CS 261 and CS 275 or equivalent
• Contact the instructor if you are not sure.
Readings
• Required:
  – Database Systems: The Complete Book, Hector Garcia-Molina, Jeffrey Ullman, and Jennifer Widom
  – Notes on the course website for subjects not covered by the textbook.
Readings
• Recommended:
  – Database Management Systems, Raghu Ramakrishnan and Johannes Gehrke
  – Foundations of Databases, Serge Abiteboul, Richard Hull, and Victor Vianu
• Other useful readings on the course website.
Grading Scheme
• Assignments: 40%
• Project: 60%
Assignments
• Written assignments
  – To understand the main concepts and methods.
  – Should be done individually.
• Start soon!
Project
• A database-centric application
  – Data engineering effort.
• Advanced feature
  – Different from CS 275
  – Easier search (keyword search)
  – Nice visualization
  – …
Project
• Groups of 2–4
  – Practice how to work in groups
• Project definition is due in the third week of the class!
  – The data, application, and scope
  – 5% of total grade
Basic Concepts
• Database management system (DBMS):
  – A piece of software that simplifies and facilitates data management and exploration.
• Database content
  – Data
  – Schema: information about the data, the meaning of the data
[Figure: the value 10 is data; the labels "Salary:" and "Age:" are schema, so the same value reads as Salary: 10 or Age: 10]
Physical Data Independence
• Independence from physical details
  – File system, operating system, hardware, …
• Data models
  – The way that we see real-world data.
  – Relational data model: everything is a relation.
• Declarative query language: SQL
  – Say what, not how
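A small sketch of both ideas, using Python's standard-library sqlite3 module. The Book table, its sample rows, and the index name are assumptions made for illustration. The query states what is wanted, and changing a physical detail (adding an index) does not change the query or its answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Book (title TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Book VALUES (?, ?)",
    [("MySQL", 102.10), ("French cinema", 53.99), ("NBA History", 63.65)],
)

# Declarative: we state WHAT we want, not how to scan or search the file.
QUERY = "SELECT title FROM Book WHERE price < 100 ORDER BY title"

before = conn.execute(QUERY).fetchall()

# Change a physical detail: add an index on price.
conn.execute("CREATE INDEX idx_book_price ON Book (price)")

# The same query text still works and returns the same answer.
after = conn.execute(QUERY).fetchall()
conn.close()
```

Whether the DBMS actually uses the index is its own decision; the application code does not need to know.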
Relational Database Management
[Figure: three stages, each with its tool: Conceptual Design (Entity Relationship (ER) Model), Schema (Relational Model), Physical Storage (Files and Indexes)]
Topics
• Modeling data and asking questions: Relational Model & Languages
• Organizing the data: Database design
• Keeping the data clean and meaningful: Integrity constraints
• Doing more than asking queries: Stored procedures, ORM
• Storing data in files: Storage Management
• Finding data in a big file really fast: Data access methods
• Translating complex queries to read & write: Query execution & optimization
• Coping with failure: Transaction Management
• Tuning
Conceptual Design
• High-level data model
  – Describes information in the database without worrying about implementation issues
• The ER model is the most popular tool for conceptual design
  – Invented by Peter Chen in 1976
  – Provides an easy-to-use language: pictures
• We review the basics
ER Model / Diagram
[Figure: ER diagram. Entity sets: Person (address, name, ssn), Book (title, category, price), Publisher (address, name). Relationships: buys, sells, employs.]
ER Model
• Entity set
  – An entity is a distinct real-world object, e.g., the cs540 textbook
  – An entity set is a collection of entities
• Attribute
  – Belongs to an entity
  – Does not contain any other attribute: atomic
  – Atomic data types: string, integer, real, …
[Figure: entity sets Publisher and Book; Book has attributes title, category, and price]
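One way to picture these definitions in code is sketched below. It is an illustration, not part of the ER notation: the Book class mirrors the attributes title, category, and price, the sample values are for illustration only, and is_atomic is a hypothetical helper that treats plain scalars as atomic.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Book:
    """One entity; each field is an atomic attribute."""
    title: str
    category: str
    price: float

# An entity set is a collection of entities of the same kind.
books = {
    Book("MySQL", "computer", 102.10),
    Book("Cell biology", "biology", 201.69),
}

def is_atomic(value):
    """Assumed notion of 'atomic' for this sketch: a plain scalar value."""
    return isinstance(value, (str, int, float))

# Every attribute value of every entity is atomic (no nested attributes).
all_atomic = all(
    is_atomic(v) for b in books for v in (b.title, b.category, b.price)
)
```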
Relationship
• Describes relationships between entity sets
• Does not exist without entities
• May have attributes
[Figure: relationship employs between Person and Publisher, shown once without attributes and once with the attribute startdate]
Relationship Multiplicity
• One to one
  – publisher – manager
• Many to one
  – book – publisher
• Many to many
  – publisher – person
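The three cases can be told apart by counting, as in the sketch below. The function name multiplicity and the sample pairs are hypothetical. Note the limitation: multiplicity is a constraint on every possible instance of a relationship, while this code only inspects one given set of pairs.

```python
from collections import defaultdict

def multiplicity(pairs):
    """Classify a binary relationship given as (left, right) pairs."""
    rights_of = defaultdict(set)  # left entity  -> related right entities
    lefts_of = defaultdict(set)   # right entity -> related left entities
    for left, right in pairs:
        rights_of[left].add(right)
        lefts_of[right].add(left)
    # Does each left entity relate to at most one right entity, and vice versa?
    left_has_one = all(len(s) <= 1 for s in rights_of.values())
    right_has_one = all(len(s) <= 1 for s in lefts_of.values())
    if left_has_one and right_has_one:
        return "one-to-one"
    if left_has_one:
        return "many-to-one"
    if right_has_one:
        return "one-to-many"
    return "many-to-many"

# Hypothetical instances of the three relationships on the slide.
publisher_manager = [("pub1", "mgr1"), ("pub2", "mgr2")]
book_publisher = [("book1", "pub1"), ("book2", "pub1")]
publisher_person = [("pub1", "ann"), ("pub1", "bob"), ("pub2", "ann")]
```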
Multi-way Relationships
• Relationships between more than two entity sets
• Each entity set has a different role in the relationship
[Figure: relationship Purchase connecting entity sets Book, Person, and Store, with roles buyer and seller]
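A rough sketch of a three-way relationship as data. It assumes that Person plays the buyer role and Store plays the seller role, which is one reading of the figure; the field names and sample values are hypothetical.

```python
from typing import NamedTuple

class Purchase(NamedTuple):
    """One instance of the three-way Purchase relationship."""
    book: str     # the Book that was purchased
    buyer: str    # the Person, in the role of buyer (assumed)
    seller: str   # the Store, in the role of seller (assumed)

# The same person buying the same book from two different stores gives
# two distinct relationship instances: all three participants matter.
purchases = {
    Purchase(book="MySQL", buyer="Alan", seller="StoreA"),
    Purchase(book="MySQL", buyer="Alan", seller="StoreB"),
}

stores_alan_bought_from = sorted(p.seller for p in purchases if p.buyer == "Alan")
```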
ER Model: Keys
• Attribute(s) that uniquely identify entities
  – No standard way to annotate: usually underlined.
• Each entity set must have a key
  – Why?
• Relationships may also have keys
[Figure: entity set Person with attributes address, name, ssn]
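The uniqueness requirement can be checked mechanically on a given collection of entities, as sketched below. The helper satisfies_key and the sample people are hypothetical. Passing the check on one collection does not prove that the attributes are a key, since a key must hold for every possible entity set; failing it does prove they are not.

```python
def satisfies_key(entities, key_attrs):
    """True if no two entities agree on all attributes in key_attrs."""
    seen = set()
    for entity in entities:
        key_value = tuple(entity[a] for a in key_attrs)
        if key_value in seen:
            return False  # two entities share the same key value
        seen.add(key_value)
    return True

# Hypothetical Person entities: two different people who share a name.
people = [
    {"ssn": "111", "name": "Alan", "address": "Portland"},
    {"ssn": "222", "name": "Alan", "address": "Corvallis"},
]

ssn_ok = satisfies_key(people, ["ssn"])
name_ok = satisfies_key(people, ["name"])
```

Without a key, the two people named Alan could not be told apart by name alone.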
Topics
• Modeling data and asking questions: Relational Model & Languages
Relational Model
• The relational model defines data organization and data retrieval/manipulation operations
• It is easier to implement than the ER model
• It captures more details about the data
An Example
Relation name: Book
Attribute names: Title, Price, Category, Year
Tuples:
Title          Price    Category  Year
MySQL          $102.1   computer  2001
Cell biology   $201.69  biology   1954
French cinema  $53.99   art       2002
NBA History    $63.65   sport     2010
Relational Model
• Attributes
  – Atomic values
  – Atomic types: string, integer, real, date, …
• Each relation must have keys
  – Attributes without duplicate values
  – A relation does not contain duplicate tuples.
• Reordering tuples does not change the relation.
• Reordering attributes does not change the relation.
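These properties follow from treating a relation as a set of tuples, and each tuple as a set of (attribute, value) pairs. The sketch below encodes that view in Python; the helper name relation and the two-attribute sample rows are chosen only for illustration.

```python
def relation(rows):
    """Model a relation as a set of tuples; each tuple is a set of
    (attribute, value) pairs, so neither order nor duplicates matter."""
    return frozenset(frozenset(row.items()) for row in rows)

r1 = relation([
    {"Title": "MySQL", "Year": 2001},
    {"Title": "NBA History", "Year": 2010},
])

# Same tuples in a different order, attributes in a different order,
# plus one duplicate tuple.
r2 = relation([
    {"Year": 2010, "Title": "NBA History"},
    {"Year": 2001, "Title": "MySQL"},
    {"Year": 2001, "Title": "MySQL"},
])

same_relation = (r1 == r2)
```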
Database Schema vs. Database Instance
• Schema of a relation
  – Name of the relation and names of its attributes.
  – E.g.: Person(Name, Address, SSN)
  – Types of the attributes
  – Constraints on the values of the attributes
• Schema of the database
  – Set of relation schemata
  – E.g.: Person(Name, Address, SSN), Employment(Company, SSN)
Database Schema vs. Database Instance
• Schema: Book(Title, Price, Category, Year)
• Instance:
Title          Price    Category  Year
MySQL          $102.1   computer  2001
Cell biology   $201.69  biology   1954
French cinema  $53.99   art       2002
NBA History    $63.65   sport     2010
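The distinction can be made concrete with Python's standard-library sqlite3 module, as in this sketch. The attribute types and the choice of Title as key are assumptions, since the slide gives only attribute names. CREATE TABLE declares the schema, the inserted rows form the instance, and the two can be inspected separately.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: relation name, attribute names, and (assumed) attribute types.
conn.execute(
    "CREATE TABLE Book (Title TEXT PRIMARY KEY, Price REAL, "
    "Category TEXT, Year INTEGER)"
)

# Instance: the tuples currently stored in the relation.
conn.executemany(
    "INSERT INTO Book VALUES (?, ?, ?, ?)",
    [
        ("MySQL", 102.10, "computer", 2001),
        ("Cell biology", 201.69, "biology", 1954),
        ("French cinema", 53.99, "art", 2002),
        ("NBA History", 63.65, "sport", 2010),
    ],
)

# PRAGMA table_info returns one row per attribute; index 1 is its name.
schema = [row[1] for row in conn.execute("PRAGMA table_info(Book)")]
instance = conn.execute("SELECT * FROM Book ORDER BY Year").fetchall()
conn.close()
```

Inserting or deleting rows changes the instance but leaves the schema as it is.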
Example Schema
Beers(name, manf)
Bars(name, addr, license)
Drinkers(name, addr, phone)
Likes(drinker, beer)
Sells(bar, beer, price)
Frequents(drinker, bar)
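A sketch of how this schema might be declared and queried with Python's standard-library sqlite3 module. The column types and key choices are assumptions, since the schema above lists only names, and the inserted rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column types and key choices are assumptions; the schema lists only names.
conn.executescript("""
CREATE TABLE Beers     (name TEXT PRIMARY KEY, manf TEXT);
CREATE TABLE Bars      (name TEXT PRIMARY KEY, addr TEXT, license TEXT);
CREATE TABLE Drinkers  (name TEXT PRIMARY KEY, addr TEXT, phone TEXT);
CREATE TABLE Likes     (drinker TEXT, beer TEXT, PRIMARY KEY (drinker, beer));
CREATE TABLE Sells     (bar TEXT, beer TEXT, price REAL, PRIMARY KEY (bar, beer));
CREATE TABLE Frequents (drinker TEXT, bar TEXT, PRIMARY KEY (drinker, bar));
""")

# Hypothetical rows, for illustration only.
conn.execute("INSERT INTO Likes VALUES ('Ann', 'BeerA')")
conn.execute("INSERT INTO Sells VALUES ('BarX', 'BeerA', 3.5)")
conn.execute("INSERT INTO Sells VALUES ('BarY', 'BeerB', 4.0)")

# Which bars sell a beer that Ann likes?
bars_for_ann = conn.execute(
    "SELECT DISTINCT Sells.bar "
    "FROM Likes JOIN Sells ON Likes.beer = Sells.beer "
    "WHERE Likes.drinker = 'Ann'"
).fetchall()

# List the relations that now exist in the database.
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
conn.close()
```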