NEU 221: Neuroinformatics Seminar
Introduction to Databases

Bertram Ludäscher
ludaesch@sdsc.edu

San Diego Supercomputer Center
U.C. San Diego

Outline

• What is a DB and why should I care?
• DB Basics & Architecture
• Relational Model (SQL, ER)
• Extended/Other Models
– Deductive Databases
– Object-Oriented Databases
– Semistructured/Graph Databases

What is a Database?

• The term database can stand for ...
– a concrete collection of data (books@amazon, CCDB@NCMIR)
– a system (software & hardware) for storing and managing databases (=> Database Management System: DBMS + DB)
• Underlying data model => Type of DBMS (short: DB)
– relational model: based on relations (“tables”) and entities
– object-oriented model: complex objects, classes
– object-relational model: relations + objects
– XML: “semistructured” model, trees
• Specialized/extended models
– deductive DBs
– multimedia DBs
– GIS (Geographic Information Systems)

Functions of a DBMS (aka what does it buy me?)

• Persistent Data Storage
– but don’t forget to back up!
• Efficient & High-Level Querying of Very Large Datasets
– file systems + your homegrown “scans” won’t do for VLDBs (very large databases)!
• Same for Updates: insert, delete, and modify
• Data Integrity, Security
– Checking/enforcement of integrity constraints
– Access control
• Concurrent (multi-user) Access, Transactions, Recovery
• Robust, Scalable Data Management Solutions

3-Level ANSI/SPARC Database Architecture

• external (user) level
• conceptual (logical) level
• internal (physical) level

=> Data independence
– logical data independence
– physical data independence

[Figure: View-1, View-2, ..., View-n (external level) on top of the conceptual/logical schema, which sits on top of the physical schema]
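
The three levels can be tried out in a few lines. The following is a minimal sketch using Python's built-in sqlite3 module; the view name staff_directory, the index name employee_by_dept, and the sample rows are assumptions made for illustration. A view plays the role of the external level, the base table the conceptual level, and an index stands in for a change at the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- conceptual/logical level: the base table
    CREATE TABLE employee (emp TEXT PRIMARY KEY, salary REAL, deptno INTEGER);
    INSERT INTO employee VALUES ('john', 60000, 1), ('anne', 62000, 2);

    -- external level: a view for users who must not see salaries
    CREATE VIEW staff_directory AS
        SELECT emp, deptno FROM employee;
""")

view_query = "SELECT emp, deptno FROM staff_directory ORDER BY emp"
before = conn.execute(view_query).fetchall()

# internal/physical level: adding an index changes how rows are found,
# not what the view returns (physical data independence)
conn.execute("CREATE INDEX employee_by_dept ON employee (deptno)")
after = conn.execute(view_query).fetchall()

print(before == after, after)
```

Users of the view never mention the index, so the physical change is invisible to them. Logical data independence works the same way one level up: as long as the view definition can be adapted, applications written against the view keep working when the base table changes.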

Concurrency Control

• Concurrent execution of simultaneous requests
– long before web servers were around...
– transaction management guarantees consistency despite concurrent/interleaved execution
• Transaction = sequence of DB operations (read/write)
– Atomicity: a transaction is executed completely or not at all
– Consistency: a transaction creates a new consistent DB state, i.e., one in which all integrity constraints are maintained
– Isolation: to the user, a transaction seems to run in isolation
– Durability: the effect of a successful (“committed”) transaction remains even after system failure
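
Atomicity and consistency can be observed directly. The sketch below uses Python's sqlite3 module; the account table, the transfer function, and the balances are hypothetical and only serve the illustration. A transfer that would violate a CHECK constraint is rolled back as a whole, including the part that had already succeeded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (owner TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"   # integrity constraint
)
conn.execute("INSERT INTO account VALUES ('anne', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE owner = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE owner = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "anne", "bob", 30)
failed = transfer(conn, "anne", "bob", 500)  # would make anne's balance negative
balances = dict(conn.execute("SELECT owner, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer, the credit to bob is executed first and then undone, so no half-finished transfer is ever visible afterwards.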

The Relational Model

• Relation/Table Name:
– employee, dept
• Attributes = Column Names:
– Emp, Salary, DeptNo, Name, Mgr
• Relational Schema:
– employee(Emp:string, Salary:float, DeptNo:integer), ...
• Tuple = Row of the table:
– (“anne”, 62000, 2)
• Relation = Set of tuples:
– {(...), (...), ...}

employee
Emp   Salary  DeptNo
john  60k     1
anne  62k     2
bob   57k     1
jane  45k     3

dept
DeptNo  Name   Mgr
1       Toys   anne
2       Sales  anne
3       Shoes  tim

Database Design: Entity-Relationship (ER) Model

• Entities:
• Relationships:
• Attributes:
• ER Model:
– initial, high-level DB design (conceptual model)
– easy to map to a relational schema (database tables)
– comes with more constraints (cardinalities, aggregation) and extensions: EER (is-a => class hierarchies)
– related: UML (Unified Modeling Language) class diagrams

[Figure: ER diagram with the entities Employee and Department, the relationship works-for, and the attribute labels Name, Salary, Manager, Name, since]

Example: Creating a Relational Database in SQL

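-- employee and department reference each other; some DBMSs require
-- adding one of the two foreign keys afterwards with ALTER TABLE
CREATE TABLE employee (
    ssn    CHAR(11),
    name   VARCHAR(30),
    deptNo INTEGER,
    PRIMARY KEY (ssn),
    FOREIGN KEY (deptNo) REFERENCES department(deptNo) );

CREATE TABLE department (
    deptNo  INTEGER,
    name    VARCHAR(20),
    manager CHAR(11),
    PRIMARY KEY (deptNo),
    FOREIGN KEY (manager) REFERENCES employee(ssn) );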
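
To see the key constraints at work, the schema can be loaded into an in-memory SQLite database from Python. This is a sketch under assumptions: the sample rows (including the SSN strings) are made up, and SQLite is only one possible system; it accepts the mutually referencing tables as written but enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE employee (
        ssn    CHAR(11),
        name   VARCHAR(30),
        deptNo INTEGER,
        PRIMARY KEY (ssn),
        FOREIGN KEY (deptNo) REFERENCES department(deptNo)
    );
    CREATE TABLE department (
        deptNo  INTEGER,
        name    VARCHAR(20),
        manager CHAR(11),
        PRIMARY KEY (deptNo),
        FOREIGN KEY (manager) REFERENCES employee(ssn)
    );
""")

# The tables reference each other, so start with a department
# that has no manager yet, then fill the manager in.
conn.execute("INSERT INTO department VALUES (1, 'Toys', NULL)")
conn.execute("INSERT INTO employee VALUES ('123-45-6789', 'anne', 1)")
conn.execute("UPDATE department SET manager = '123-45-6789' WHERE deptNo = 1")

try:
    # department 99 does not exist, so the foreign key rejects this row
    conn.execute("INSERT INTO employee VALUES ('987-65-4321', 'bob', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

manager = conn.execute(
    "SELECT manager FROM department WHERE deptNo = 1").fetchone()
print(rejected, manager)
```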

Important Relational Operations

• Select(Relation, Condition)
– filter rows of a table with respect to a condition
• Project(Relation, Attributes)
– keep the columns of interest
• Join(Rel1, Att1, Rel2, Att2, Condition)
– find “matches” in a “related” table
– e.g. match Rel1.foreign key = Rel2.primary key
• Union (“OR”), Intersection (“AND”)
• Set-Difference (“NOT IN”)
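
The three core operations are small enough to write out by hand. The following Python sketch models a relation as a list of dictionaries, using the employee and dept rows from the relational-model slide; the function signatures are simplifications chosen for this illustration (in particular, join is restricted to an equality condition on one attribute pair).

```python
employee = [
    {"Emp": "john", "Salary": 60000, "DeptNo": 1},
    {"Emp": "anne", "Salary": 62000, "DeptNo": 2},
    {"Emp": "bob",  "Salary": 57000, "DeptNo": 1},
    {"Emp": "jane", "Salary": 45000, "DeptNo": 3},
]
dept = [
    {"DeptNo": 1, "Name": "Toys",  "Mgr": "anne"},
    {"DeptNo": 2, "Name": "Sales", "Mgr": "anne"},
    {"DeptNo": 3, "Name": "Shoes", "Mgr": "tim"},
]

def select(relation, condition):
    """Keep the rows for which condition(row) is true."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """Keep only the named columns; duplicate rows are dropped (set semantics)."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

def join(rel1, att1, rel2, att2):
    """Equi-join: combine rows where rel1.att1 == rel2.att2."""
    return [{**r1, **r2} for r1 in rel1 for r2 in rel2 if r1[att1] == r2[att2]]

toys_staff = select(employee, lambda row: row["DeptNo"] == 1)
emp_mgr = project(join(employee, "DeptNo", dept, "DeptNo"), ["Emp", "Mgr"])
managers = project(dept, ["Mgr"])
print(emp_mgr)
```

Note how project(dept, ["Mgr"]) returns anne only once: a relation is a set, so projecting away the distinguishing columns collapses duplicates.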

Why (Declarative) Query Languages?

• Things we talk and think about in programming languages (PLs) and query languages (QLs)
– Assembly languages: registers, memory locations, jumps, ...
– C: if-then-else, for, while, memory (de-)allocation, pointers, ...
– Object-oriented languages:
    • C++: C plus objects, methods, classes, ...
    • Java: objects, methods, classes, references, ...
    • Smalltalk: objects, objects, objects, ...
    • OQL: object-query language
– Functional languages (Haskell, ML):
    • (higher-order) mappings, recursion/induction, patterns, ...
=> Relational languages (SQL, Prolog)
    • relations, relational operations, ...
=> Semistructured/XML (Tree) & Graph Query Languages

„Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.“
“The limits of my language mean the limits of my world.”
Ludwig Wittgenstein, Tractatus Logico-Philosophicus

“If you have a hammer, everything looks like a nail.”

Example: Querying a Relational Database

Input tables:

employee
Emp   Salary  DeptNo
anne  62k     2
john  60k     1

dept
DeptNo  Mgr
1       anne
2       anne

SQL query (or view def.), a join:

SELECT Emp, Mgr
FROM employee, dept
WHERE employee.DeptNo = dept.DeptNo

Answer (or view):

result
Emp   Mgr
john  anne
anne  anne
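
The example can be reproduced with Python's sqlite3 module. This is a sketch, not part of the original slide: the column types and the view name emp_mgr are assumptions, and the salaries are written out as numbers. The same SELECT is run once as a query and once stored as a view definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (Emp TEXT, Salary REAL, DeptNo INTEGER);
    CREATE TABLE dept (DeptNo INTEGER, Mgr TEXT);
    INSERT INTO employee VALUES ('anne', 62000, 2), ('john', 60000, 1);
    INSERT INTO dept VALUES (1, 'anne'), (2, 'anne');
""")

query = """
    SELECT Emp, Mgr
    FROM employee, dept
    WHERE employee.DeptNo = dept.DeptNo
"""
result = conn.execute(query).fetchall()

# The same query stored as a view: its answer is recomputed on each use.
conn.execute("CREATE VIEW emp_mgr AS " + query)
view_rows = conn.execute("SELECT Emp, Mgr FROM emp_mgr").fetchall()

print(sorted(result))
```

The query says which rows are wanted, not how to find them; the order of the returned rows is left to the system, which is why the sketch sorts before printing.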

Query Languages for Relational Databases

Deductive Databases (DATALOG) Syntax

DATALOG: Examples for Relational Operations

Recursive DATALOG Example: Transitive Closure
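
As an illustration of what a recursive transitive-closure program computes, here is a minimal Python sketch. It assumes the usual two-rule formulation, with the predicate names edge and tc chosen for this example rather than taken from the slide, and it evaluates the rules bottom-up: the recursive rule is applied repeatedly until no new facts appear (a fixpoint).

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the (assumed) rules
         tc(X, Y) :- edge(X, Y).
         tc(X, Y) :- edge(X, Z), tc(Z, Y).
    """
    tc = set(edges)                      # first rule: every edge is in tc
    while True:
        # second rule: extend an edge (x, z) by a known tc fact (z, y)
        derived = {(x, y) for (x, z) in edges for (z2, y) in tc if z == z2}
        if derived <= tc:                # nothing new: fixpoint reached
            return tc
        tc |= derived

edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
print(sorted(closure))
```

The loop terminates because only finitely many pairs can be formed from the nodes in the input. Plain relational algebra (select, project, join) cannot express this query for paths of arbitrary length, which is one motivation for recursive rule languages.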

Non-Relational Data Models

• Relational model is “flat”: atomic data values
– extension: nested relational model (“tables within tables”, cf. nested HTML tables)
– values can be nested sets {...}, tuples (...), lists [...]
– ISO standard(s): SQL
– identity is value based
• Object-oriented data model:
– complex (structured) objects with object identity (oid)
– class and type hierarchies (sub-/superclass, sub-/supertype)
– OODB schema may be very close to “world model” (no translation into tables)
    (+) queries that fit your OO schema
    (-) (new) queries that don’t fit nicely
– ODMG standard, OQL (Object Query Language)

Example: Object Query Language (OQL)

• Q: What does this OQL query compute?
• Note the use of path expressions like e.manager.children
=> Semistructured/Graph Databases

SELECT DISTINCT STRUCT(
    E: e.name,
    C: e.manager.name,
    M: ( SELECT c.name
         FROM c IN e.children
         WHERE FOR ALL d IN e.manager.children: c.age > d.age ) )
FROM e IN Employees;
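
One way to work out the answer is to transcribe the query into an ordinary program. The Python sketch below is one possible reading, not an OQL implementation: the classes Person and Employee, the sample people, and their ages are all made up, and every employee is assumed to have a manager. For each employee it returns the employee's name, the manager's name, and the names of those children who are older than every child of the manager.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee:
    name: str
    children: List[Person] = field(default_factory=list)
    manager: Optional["Employee"] = None

def query(employees):
    results = []
    for e in employees:                       # FROM e IN Employees
        older = tuple(
            c.name
            for c in e.children               # FROM c IN e.children
            # WHERE FOR ALL d IN e.manager.children: c.age > d.age
            if all(c.age > d.age for d in e.manager.children)
        )
        row = (e.name, e.manager.name, older)  # STRUCT(E: ..., C: ..., M: ...)
        if row not in results:                 # SELECT DISTINCT
            results.append(row)
    return results

# Hypothetical sample data: anne manages john and, as a department head, herself.
anne = Employee("anne", children=[Person("zoe", 10)])
anne.manager = anne
john = Employee("john", children=[Person("max", 12), Person("lea", 8)], manager=anne)

answer = query([anne, john])
print(answer)
```

The path expression e.manager.children becomes plain attribute access, which is the point of the slide: following references is natural in an object model, whereas a relational formulation would need joins.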

A Semistructured (Graph) Database

Querying Graphs with OO-Path Expressions

?- dblp."Inf. Systems".L.P, substr("Volume",L), P : person.spouse[lives_in = P.lives_in].

?- dblp."Inf. Systems".L."Michael E. Senko".

Answer:
L="Volume 1, 1975";
L="Volume 5, 1980".
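
How a path expression with a variable produces such bindings can be sketched in Python. Everything here is an assumption made for illustration: the graph is modeled as nested dictionaries whose keys are edge labels, the class Var and the function match_path are hypothetical helpers, and the toy data is a stand-in shaped to match the answer above ("Author A" and "Author B" are placeholders, not real entries).

```python
class Var:
    """A path step that matches any edge label and records it."""
    def __init__(self, name):
        self.name = name

def match_path(node, path, bindings=None):
    """Yield one dict of variable bindings per way `path` matches below `node`."""
    bindings = bindings or {}
    if not path:                       # whole path consumed: report a match
        yield dict(bindings)
        return
    if not isinstance(node, dict):     # no outgoing edges to follow
        return
    step, rest = path[0], path[1:]
    for label, child in node.items():
        if isinstance(step, Var):      # variable step: bind it to this label
            yield from match_path(child, rest, {**bindings, step.name: label})
        elif step == label:            # constant step: label must be equal
            yield from match_path(child, rest, bindings)

# Toy stand-in for the dblp graph (placeholder content).
dblp = {
    "Inf. Systems": {
        "Volume 1, 1975": {"Michael E. Senko": {}, "Author A": {}},
        "Volume 2, 1976": {"Author B": {}},
        "Volume 5, 1980": {"Michael E. Senko": {}},
    }
}

answers = list(match_path(dblp, ["Inf. Systems", Var("L"), "Michael E. Senko"]))
print(answers)
```

Each answer corresponds to one path through the graph, so a variable in the middle of a path expression behaves like a join variable in a relational query.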

Constructs for Querying Graphs

Example: ?- dblp . any* . (if(vldb)| if(sigmod))