O-O, What Are They Doing to Relational Databases?
(The Evolution of DB2 Universal Database)
Michael J. Carey
IBM Almaden, January 1999


Plan for Today's Presentation

The relational DBMS revolution
The object-relational DBMS evolution
O-R features in DB2 Universal Database V5.2
Some O-R implementation tradeoffs (V5.2)
What lies ahead for DB2 UDB & O-R databases?
Questions (and possibly answers)
  Please ask questions throughout...!

The Relational DBMS Revolution

The pre-relational era (1970's)
  Graph-based data models: hierarchical (IMS), network (Codasyl)
  Low-level, navigational interfaces
    Labor-intensive and error-prone
The relational era (1980's)
  Simple, abstract data model
    Database = set of relations ("tables")
    3 schema levels: views, base tables, physical schema
    Algebra of set-oriented operations
  High-level, declarative interfaces
    SQL, QBE, et al.
    Embedded languages, 4GLs

The Relational Model (in one slide)

Employees and departments

Department
  dno  name
  10   Toy
  20   Shoe

Employee
  eno  name   salary    dept
  1    Lou    10000000  10
  7    Laura  150000    20
  22   Mike   80000     20

select E.name, E.salary, D.dno
from Employee E, Department D
where E.salary < 100000
  and D.name = 'Shoe'
  and E.dept = D.dno;

?
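The query can be checked against the two sample tables. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for a relational DBMS; only plain SQL is involved, so nothing here is DB2-specific:

```python
import sqlite3

# Build the two sample tables from the slide in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Department (dno integer primary key, name text);
    create table Employee (eno integer primary key, name text,
                           salary integer, dept integer references Department(dno));
    insert into Department values (10, 'Toy'), (20, 'Shoe');
    insert into Employee values (1, 'Lou', 10000000, 10),
                                (7, 'Laura', 150000, 20),
                                (22, 'Mike', 80000, 20);
""")

# The slide's query: employees earning under 100,000 in the Shoe department.
rows = conn.execute("""
    select E.name, E.salary, D.dno
    from Employee E, Department D
    where E.salary < 100000
      and D.name = 'Shoe'
      and E.dept = D.dno
""").fetchall()
print(rows)  # [('Mike', 80000, 20)]
```

Laura is in the Shoe department but fails the salary test, and Lou works in Toy, so only Mike qualifies.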

Relational Databases: A Success Story

The relational model has been a big success
  Simplicity has made research tractable
  Data independence yields productivity gains
  Both academia and industry have benefitted
Relational DBMS "goodies" include
  Efficient query optimization and execution
  Well-defined transaction semantics and support
  Excellent multi-user performance and robustness
  Views for data independence, authorization
  Constraints, triggers, and stored procedures for (shared) business rule capture/enforcement
"The" success story for parallel computing

We've Achieved Nirvana ... Right?

The world is becoming increasingly complex
  New data types are appearing (e.g., multimedia)
  Real-world data doesn't fit neatly into tables
    Entities and relationships (vs. tables)
    Variance among entities (vs. homogeneity)
    Set-valued attributes (vs. normalization)
  Advanced applications bring complex data
    E.g., CAD/CAM data management, web data management, geographic information management, medical data management, (your favorite application goes here)
So maybe objects are the answer...?
  Yes, if we can keep all the relational "goodies"!

The Object-Relational DBMS Evolution

O-R extension #1: Abstract data types (ADTs)
  New column types and functions/methods
    E.g., text, image, audio, video, time series, point, line, OLE, ...
  For modeling new kinds of facts about enterprise entities
  Infrastructure for extenders/datablades/cartridges
O-R extension #2: Row types
  Types and functions/methods for rows of tables
  Desirable features include references, inheritance, methods, late binding, and collection-valued attributes
  For modeling enterprise entities with relationships & behavior
  Infrastructure for DBMS-native object management
Recent SQL3 merger: Structured types
  Can be used as the types of columns and/or tables

"Not Your Father's Employee Type"

Beyond name, rank, and serial number
  New attribute types
    Location (2-d point), job description (text), photo (image), ...
  Associated functions
    Distance(point, point), contains(text, string), ...
Beyond your basic employee record
  Employees come in different flavors
    Emp, RSM, Programmer, Manager, Temp, ...
  Employees have many known relationships
    Manager, department, projects, ...
  Employees have behavior
    Age(Emp), qualified(Emp, Job), hire(Emp), ...
An Employee is a simple "business object"

The OSF Project at IBM Almaden

OSF stands for "Object Strike Force"
  Semi-autonomous group "outside" UDB development
Focus: object-relational extensions for DB2 UDB
  Both near-term and longer-term interests
  Collaborate with our Toronto and Santa Teresa labs
Significant activities to date
  Prototyped "row type" support for DB2 UDB
  Delivered in DB2 UDB Version 5.2 (9/98)
  Significantly revised SQL3 draft standard
  Working on next step plus future technologies

DB2 Universal Database, Version 5

DB2 for Common Servers (Version 2)
  User-defined column types (UDTs/distinct types)
  User-defined functions (UDFs)
  Binary/character large objects (BLOBs/CLOBs)
Distinct types: new data types for columns
  Ex: create distinct type US_Dollar as Real with comparisons;
  US_Dollar is an available UDT with functions =, <>, <, <=, >, >=, US_Dollar(Real), Real(US_Dollar)
User-defined functions: associated operations
  create function CA_Tax (US_Dollar) returns US_Dollar external name 'money!US_Dollar' language C;
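A distinct type's behavior can be mimicked in a host language: it has the same representation as Real, but it is a separate type, and only the generated cast functions cross between the two. A minimal Python sketch, assuming a wrapper class (US_Dollar) as a stand-in for the DB2 distinct type and a Python function (ca_tax) for the UDF. HYPOTHETICAL_CA_RATE is an invented placeholder, since the slide's CA_Tax is an external C routine whose logic is not shown.

```python
from functools import total_ordering


@total_ordering
class US_Dollar:
    """Mimics `create distinct type US_Dollar as Real with comparisons`:
    the same representation as Real, but a separate type."""

    __slots__ = ("_value",)

    def __init__(self, value: float):
        # Plays the role of the generated cast function US_Dollar(Real).
        self._value = float(value)

    def to_real(self) -> float:
        # Plays the role of the generated cast function Real(US_Dollar).
        return self._value

    # =, <>, <, <=, >, >= work only between two US_Dollar values;
    # total_ordering derives <=, >, >= from __eq__ and __lt__.
    def __eq__(self, other):
        if not isinstance(other, US_Dollar):
            return NotImplemented
        return self._value == other._value

    def __lt__(self, other):
        if not isinstance(other, US_Dollar):
            return NotImplemented
        return self._value < other._value

    def __hash__(self):
        return hash(self._value)

    def __repr__(self):
        return f"US_Dollar({self._value})"


# Hypothetical tax rate, for illustration only: the slide's CA_Tax is an
# external C routine whose logic is not shown.
HYPOTHETICAL_CA_RATE = 0.0725


def ca_tax(amount: US_Dollar) -> US_Dollar:
    """Sketch of a UDF whose signature accepts and returns US_Dollar only."""
    if not isinstance(amount, US_Dollar):
        raise TypeError("ca_tax expects a US_Dollar argument")
    return US_Dollar(amount.to_real() * HYPOTHETICAL_CA_RATE)


price = US_Dollar(100)
print(price < US_Dollar(250))     # True: both operands are US_Dollar

compare_error_raised = False
try:
    _ = price < 250.0             # US_Dollar vs. Real: no such comparison
except TypeError:
    compare_error_raised = True
print(compare_error_raised)       # True

print(price.to_real() < 250.0)    # True, after an explicit cast back to Real
print(ca_tax(price) < price)      # True
```

The point of the strong typing is that a dollar amount cannot be compared with, or passed in place of, an arbitrary Real by accident. Crossing over takes an explicit cast.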

DB2 Universal Database, Version 5 (cont.)

Lots of other interesting features as well, e.g.:
  Constraints and triggers
  Recursive queries
  OLAP support (cube and rollup)
  Extenders (based on UDTs/UDFs)
Wide range of hardware/software platforms
  PCs: Windows95, NT, OS/2, SCO
  Unix workstations: AIX, Solaris, HP/UX
  Parallel platforms: SMPs, MPPs (e.g., SP2)
Descended from Almaden's Starburst system
  Extensible query compiler (with rule-based query rewrite and query optimizer components)

New O-R Features in DB2 UDB V5.2

Structured types and references
  Named types with attributes, O-O subtyping model
  Ref(T) for directly modelling relationships
Typed tables and table hierarchies
  Oid (user-provided) plus a column per attribute of T
  Subtables for querying and managing subtype instances
Query language extensions
  Substitutability for queries/updates (data independence ++)
  Path expressions for querying relationships easily
  Functions/predicates for runtime type inquiries
Object views (via a novel approach)
  Virtual table hierarchies for flexible access control
  Also facilitates O-O views of legacy tables

A Simple Example

Employee and department tables in the (late) 90's

[Figure: tables person, emp, exec, student, and dept, connected by dept and mgr references]

Structured Types and References

Create structured types (and references)

create type Person_t as ( name Varchar(40), birthyear Integer);

create type Emp_t under Person_t as ( salary Integer, dept Ref(Dept_t));

create type Exec_t under Emp_t as ( bonus Integer);

create type Student_t under Person_t as ( major Varchar(20));

Structured Types and References (cont.)

Create structured types (cont.)

create type Dept_t as (name Varchar(20), budget Integer, headcount Integer, mgr Ref(Emp_t));

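Okay, so I lied (a little) on the last slide: Dept_t did not yet exist when Emp_t was created, so Emp_t's dept attribute is really added afterwards: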

alter type Emp_t add attribute dept Ref(Dept_t);

Typed Tables and Table Hierarchies

Now create typed tables (and subtables)

create table person of Person_t (ref is oid user generated);

create table emp of Emp_t under person (dept with options scope dept);

create table exec of Exec_t under emp;

create table student of Student_t under person;

create table dept of Dept_t (ref is oid user generated, mgr with options scope emp);

SQL Query Extensions (by example)

Substitutability
  Data modification (insert; update/delete)
    insert into emp values (Emp_t('e100'), 'John Smith', 1968, 65000, (select oid from dept where name = 'Database'));
    update person set birthyear = birthyear + 1 where name = 'John Smith';
  Queries
    select E.* from emp E where E.birthyear > 1970 and E.salary > 50000;
Path expressions
    select E.name, E.dept->name
    from emp E
    where E.dept->mgr->dept->mgr->name = 'Lou Gerstner';

Querying Table Hierarchies: An Example

Person
  oid  name    birthyear
  P1   Harold  1970
  P2   Carol   1958

Emp
  oid  name   birthyear  dept
  P3   Hamid  1956       ...
  P4   Lou    1940       _

Dept
  oid  name       .....
  D1   Databases  .....

select * from Person where name like 'H%'

select name, dept->name from Emp where birthyear < 1960
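The two queries can be traced on the sample rows with a small in-memory model. This is a minimal Python sketch. It assumes that each table holds the rows created directly in it, that a query on a table also scans its subtables (substitutability), and that the result contains only the columns of the queried table. Hamid's dept value is not legible on the slide, so the sketch assumes D1. Lou's "_" is read as a null reference. The names scan and deref_name are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person_t:
    oid: str
    name: str
    birthyear: int


@dataclass
class Emp_t(Person_t):
    dept: Optional[str] = None  # Ref(Dept_t), held as the target row's oid


@dataclass
class Dept_t:
    oid: str
    name: str


# Rows stored directly in each table of the hierarchy (sample data from the slide).
person_rows = [Person_t("P1", "Harold", 1970), Person_t("P2", "Carol", 1958)]
emp_rows = [Emp_t("P3", "Hamid", 1956, "D1"),   # assumption: Hamid works in D1
            Emp_t("P4", "Lou", 1940, None)]     # dept shown as "_" (null)
dept_rows = {"D1": Dept_t("D1", "Databases")}

subtables = {"person": ["emp"], "emp": []}
stored = {"person": person_rows, "emp": emp_rows}


def scan(table):
    """Substitutable scan: rows of `table` plus rows of all its subtables."""
    yield from stored[table]
    for sub in subtables[table]:
        yield from scan(sub)


def deref_name(ref):
    """dept->name: follow the reference; a null reference yields null."""
    return dept_rows[ref].name if ref is not None else None


# select * from Person where name like 'H%'   (Person columns only)
q1 = [(p.oid, p.name, p.birthyear) for p in scan("person") if p.name.startswith("H")]

# select name, dept->name from Emp where birthyear < 1960
q2 = [(e.name, deref_name(e.dept)) for e in scan("emp") if e.birthyear < 1960]

print(q1)  # [('P1', 'Harold', 1970), ('P3', 'Hamid', 1956)]
print(q2)  # [('Hamid', 'Databases'), ('Lou', None)]
```

Hamid appears in the Person query even though his row lives in Emp. The query sees him only through Person's columns.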

SQL Query Extensions (cont.)

Support for type-dependent queries

select *
from only (emp) E
where dept->budget > 10000000;

select name
from person P
where deref(oid) is of type (only Emp_t, Student_t);

select type_name(deref(E.oid)), E.*
from outer (emp) E
where E.oid = Emp_t('e13');
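Type-dependent queries come down to tests on each row's most specific type. This is a minimal sketch. It assumes the hierarchy is stored as one physical table with a type-tag column, emulated with Python's sqlite3. The table person_impl, the column type_tag, and all rows are hypothetical. The object id 'e20' stands in for the slide's 'e13' so that the outer query has a subtable column (bonus) to show. Each statement from the slides appears as a comment above its hand-translated equivalent.

```python
import sqlite3

# Hypothetical single physical table for the whole person hierarchy: a type tag,
# the object id, and one column per attribute of every type in the hierarchy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table person_impl (
        type_tag  text not null,     -- 'Person_t', 'Emp_t', 'Exec_t' or 'Student_t'
        oid       text primary key,  -- one key for the whole hierarchy
        name      text,
        birthyear integer,
        salary    integer,           -- Emp_t attributes (inherited by Exec_t)
        dept      text,
        bonus     integer,           -- Exec_t attribute
        major     text               -- Student_t attribute
    );
    -- Hypothetical sample rows.
    insert into person_impl values
        ('Person_t',  'p1',  'Ann',        1950, null,  null, null, null),
        ('Emp_t',     'e13', 'John Smith', 1968, 65000, 'd1', null, null),
        ('Exec_t',    'e20', 'Lou',        1941, 90000, 'd1', 5000, null),
        ('Student_t', 's1',  'Kim',        1978, null,  null, null, 'EE');
""")


def names(sql):
    return [row[0] for row in conn.execute(sql)]


# select name from emp             -> emp rows plus rows of its subtable exec
emp_names = names("select name from person_impl"
                  " where type_tag in ('Emp_t', 'Exec_t') order by oid")

# select name from only (emp)      -> emp rows proper, no subtable rows
only_emp = names("select name from person_impl where type_tag = 'Emp_t' order by oid")

# select name from person P where deref(oid) is of type (only Emp_t, Student_t)
# -> exactly Emp_t, or Student_t and its subtypes (none in this example)
typed = names("select name from person_impl"
              " where type_tag = 'Emp_t' or type_tag in ('Student_t') order by oid")

# select type_name(deref(E.oid)), E.* from outer (emp) E where E.oid = Emp_t('e20')
# -> the tag answers type_name; outer exposes subtable columns such as bonus
outer_row = conn.execute(
    "select type_tag, oid, name, birthyear, salary, dept, bonus"
    " from person_impl where type_tag in ('Emp_t', 'Exec_t') and oid = 'e20'"
).fetchone()

# update person set birthyear = birthyear + 1 where name = 'John Smith'
# -> an update through the root table reaches the emp row with no extra work
conn.execute("update person_impl set birthyear = birthyear + 1 where name = 'John Smith'")
john_birthyear = conn.execute(
    "select birthyear from person_impl where oid = 'e13'").fetchone()[0]

# Object ids are unique across the hierarchy: a Student_t row cannot reuse 'e13'.
duplicate_rejected = False
try:
    conn.execute("insert into person_impl (type_tag, oid, name)"
                 " values ('Student_t', 'e13', 'Dup')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(emp_names, only_emp, typed)
print(outer_row, john_birthyear, duplicate_rejected)
```

Under this layout, `only`, `is of type`, and `type_name` all reduce to predicates on, or projections of, the tag column. Substitutable updates and hierarchy-wide oid uniqueness come with the single table.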

Other Data Definition Features

ref is for object id column
  Unique, user-generated (on insert)
scope clause for reference columns
  Provides critical information to the query optimizer
not null constraints
  Definable at any level of a table hierarchy
  Enforced for indicated table and its subtables
unique constraints
  Root level (and columns) only
create index for physical schema
  Unique or non-unique index on root table
  Non-unique index on subtable

Other Data Definition Features (cont.)

Authorization model for table hierarchies
  grant and revoke on table or subtables
  Substitutability: implicit subtable authorization on columns inherited from an authorized supertable
    Ex #1: select privilege on person table
    Ex #2: update privilege on salary column of emp table
Some operations require authorization everywhere
  deref function
  is of type predicate and type_xxx functions
SQL3 also supports granting of table/subtable privileges with hierarchy option

Object Views in DB2 UDB

Typed views and view hierarchies

[Figure: views vperson, vemp, vstudent, and vdept, connected by dept and mgr references]

Requirements:
  Virtual table hierarchies
  Typed rows with (derived) object ids
  Views may be quite different from base data
  Support for interconnected "view schemas"

Types For Object Views

Create types for use in views

create type VPerson_t as ( name Varchar(40));

create type VEmp_t under VPerson_t as ( dept Ref(VDept_t));

create type VStudent_t under VPerson_t as ( kind Varchar(8));

create type VDept_t as (name Varchar(20), mgr Ref(VEmp_t));

Typed View Hierarchies

Now create typed views (and subviews)

create view vperson of VPerson_t (ref is oid user generated) as select VPerson_t(Varchar(oid)), name from only (person);

create view vemp of VEmp_t under vperson (dept with options scope vdept) as select VEmp_t(Varchar(oid)), name, VDept_t(Varchar(dept)) from emp where salary > 0;

create view vstudent of VStudent_t under vperson as select VStudent_t(Varchar(oid)), name, case when major like '%Engineer%' then 'Geek' else 'non-Geek' end from student;

create view vdept of VDept_t ...;

O-R Implementation Issues/Tradeoffs

Some guiding principles for DB2 UDB V5.2
  Performance must equal/exceed relational equivalents
  Design amenable to future plans w.r.t. type evolution
  Structured types must be supported in columns (someday)
  Localize initial changes to the query compiler where possible
  Want "free" indexing, rewrites, optimization, parallelization, ...
Influenced by discussions with a CAD/CAM vendor
  Information on existing approach and installations
  Requirements for efficiency of new products
Let's look briefly at two areas
  Table hierarchy representation
  References and path query processing

Implementing Table Hierarchies

Implementation table approach
  One physical table per table hierarchy with:
    Type tag column (to distinguish subtable rows)
    Object id column
    Columns for all columns of the root table and its subtables
Vertical partitioning approach
  One physical root table with:
    Type tag column
    Object id column
    Columns for each root table column
  N physical delta tables (one per subtable) with:
    Object id column
    Columns for each column introduced by this subtable

Implementing Table Hierarchies (cont.)

Horizontal partitioning approach
  N separate physical tables with:
    Object id column
    Columns for every subtable column (inherited or not)
So what did we do for UDB V5.2...?
  Vertical partitioning approach rejected quickly
    Too many joins to materialize subtable rows
    Multi-column constraints and indices problematic
  Horizontal partitioning approach rejected eventually
    Uniqueness issue for user-generated object ids
    Query complexity for multi-hierarchy join queries
      Ex: select p.name, q.name from Person p, Person q where ...
  Implementation table approach taken for V5.2
    Appeared to give us the most "free" functionality
    Adopted despite row size (null columns) downside
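The "too many joins" objection to vertical partitioning is easy to see in miniature. This sketch uses Python's sqlite3 as a stand-in. The tables are hypothetical: person_root, emp_delta, and exec_delta form the vertical layout, and person_impl is the implementation table. Both layouts hold the same exec row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Vertical partitioning: a root table plus one delta table per subtable.
    create table person_root (type_tag text, oid text primary key,
                              name text, birthyear integer);
    create table emp_delta   (oid text primary key, salary integer, dept text);
    create table exec_delta  (oid text primary key, bonus integer);

    -- Implementation table: one wide table, null wherever a column does not apply.
    create table person_impl (type_tag text, oid text primary key,
                              name text, birthyear integer,
                              salary integer, dept text, bonus integer, major text);

    -- The same hypothetical executive stored both ways.
    insert into person_root values ('Exec_t', 'x1', 'Pat', 1945);
    insert into emp_delta   values ('x1', 150000, 'd1');
    insert into exec_delta  values ('x1', 20000);
    insert into person_impl values ('Exec_t', 'x1', 'Pat', 1945, 150000, 'd1', 20000, null);
""")

# Materializing a complete exec row. Vertical layout: root + emp delta + exec delta.
vertical = conn.execute("""
    select r.oid, r.name, r.birthyear, e.salary, e.dept, x.bonus
    from person_root r
    join emp_delta  e on e.oid = r.oid
    join exec_delta x on x.oid = r.oid
    where r.type_tag = 'Exec_t'
""").fetchall()

# Implementation table: a single scan filtered on the type tag.
single = conn.execute("""
    select oid, name, birthyear, salary, dept, bonus
    from person_impl
    where type_tag = 'Exec_t'
""").fetchall()

# A multi-column index mixing root and subtable columns is an ordinary index
# on the implementation table; in the vertical layout birthyear and salary
# live in different tables, so no single index can cover both.
conn.execute("create index impl_by_birthyear_salary on person_impl (birthyear, salary)")

print(vertical == single)  # True: same row, but two joins in the vertical layout
```

Every additional level of subtyping adds another join to the vertical layout. The implementation table pays instead in null columns, which is the row-size downside noted above.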

References and Path Expressions

Reference values in tables should have a scope
  "Other end" info for query rewrite and join optimization
  Ditto for authorization checking (static vs. dynamic)
  Schema makes overly wide references unnecessary
  Uniqueness is hierarchy-relative, enforced with an index
V5.2 self-references (object ids) are user-generated
  CAD/CAM vendor had "legacy references" in files
  Different users have different id generation schemes
  Loading cyclic data (e.g., emp <-> dept) is messy and slow
  Ditto for creating objects from an object cache
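User-generated ids matter for cyclic data because every reference can be written before its target exists. This is a minimal sketch with Python's sqlite3. The tables and rows are hypothetical, and no referential integrity is declared. It contrasts a single-pass load with the insert-then-patch sequence that system-generated keys force:

```python
import sqlite3

# User-generated oids: every id is known before loading, so both ends of the
# emp <-> dept cycle can be inserted directly, in either order.
user_ids = sqlite3.connect(":memory:")
user_ids.executescript("""
    create table emp  (oid text primary key, name text, dept text);  -- dept: Ref(Dept_t)
    create table dept (oid text primary key, name text, mgr text);   -- mgr:  Ref(Emp_t)
    insert into emp  values ('e1', 'Lou', 'd1');        -- refers to a dept not yet loaded
    insert into dept values ('d1', 'Databases', 'e1');
""")

# System-generated oids: an id exists only after its row is inserted, so one end
# of the cycle goes in with a null reference and must be patched afterwards.
sys_ids = sqlite3.connect(":memory:")
sys_ids.executescript("""
    create table emp  (oid integer primary key autoincrement, name text, dept integer);
    create table dept (oid integer primary key autoincrement, name text, mgr integer);
""")
emp_oid = sys_ids.execute("insert into emp (name, dept) values ('Lou', null)").lastrowid
dept_oid = sys_ids.execute(
    "insert into dept (name, mgr) values ('Databases', ?)", (emp_oid,)).lastrowid
sys_ids.execute("update emp set dept = ? where oid = ?", (dept_oid, emp_oid))  # extra pass

# Either way the cycle closes: Lou's department is managed by Lou.
query = """select e.name, d.name, m.name from emp e
           join dept d on d.oid = e.dept join emp m on m.oid = d.mgr"""
print(user_ids.execute(query).fetchall())  # [('Lou', 'Databases', 'Lou')]
print(sys_ids.execute(query).fetchall())   # [('Lou', 'Databases', 'Lou')]
```

With one emp and one dept, the extra update is cheap. With a large file of mutually referencing objects, the second pass over every patched row is where the load becomes "messy and slow".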

References and Path Expressions (cont.)

Path expressions are logically equivalent to subqueries
  select E.name, E.dept->name, E.dept->mgr->name
  from emp E
  where E.dept->headcount > 10;
Actual approach: shared subquery generation (QGM)
  Compute common paths (prefixes) once to save work
  Not every SQL context accepts an actual subquery
  Also need to handle non-serializable locking levels
Efficiency obtained through query rewrite, e.g.:
  Subquery to outer-join transformation
  Outer-join to join transformation where possible
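The equivalence and the two rewrites can be checked on a toy database. This sketch uses Python's sqlite3 as a stand-in, with hypothetical emp and dept tables whose reference columns hold target oids. The three queries are hand-written approximations of the rewrite stages, not QGM output:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table emp  (oid text primary key, name text, dept text);  -- dept: Ref(Dept_t), scope dept
    create table dept (oid text primary key, name text,
                       headcount integer, mgr text);                 -- mgr: Ref(Emp_t), scope emp
    insert into dept values ('d1', 'Databases', 12, 'e1'), ('d2', 'Shoe', 4, 'e3');
    insert into emp values ('e1', 'Ann', 'd1'), ('e2', 'Bob', 'd1'),
                           ('e3', 'Cy', 'd2'), ('e4', 'Dee', null);
""")

# select E.name, E.dept->name, E.dept->mgr->name from emp E where E.dept->headcount > 10;

# 1. Literal reading: each path step is a scalar subquery over the scope table.
q_subquery = """
    select E.name,
           (select D.name from dept D where D.oid = E.dept),
           (select M.name from emp M
             where M.oid = (select D.mgr from dept D where D.oid = E.dept))
    from emp E
    where (select D.headcount from dept D where D.oid = E.dept) > 10
    order by E.name
"""

# 2. Subqueries turned into outer joins; the shared prefix E.dept is joined once.
q_outer = """
    select E.name, D.name, M.name
    from emp E
    left outer join dept D on D.oid = E.dept
    left outer join emp M on M.oid = D.mgr
    where D.headcount > 10
    order by E.name
"""

# 3. The predicate on D.headcount rejects rows with no department, so the
#    first outer join can become an ordinary join.
q_join = """
    select E.name, D.name, M.name
    from emp E
    join dept D on D.oid = E.dept
    left outer join emp M on M.oid = D.mgr
    where D.headcount > 10
    order by E.name
"""

results = [conn.execute(q).fetchall() for q in (q_subquery, q_outer, q_join)]
assert results[0] == results[1] == results[2]
print(results[0])  # [('Ann', 'Databases', 'Ann'), ('Bob', 'Databases', 'Ann')]
```

The mgr step stays an outer join because nothing in the query rejects a department without a manager. Turning it into an inner join would silently drop such employees instead of returning null for E.dept->mgr->name.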

Where We Are Today in UDB

V5.2 of UDB contains new O-R features
  Structured types with inheritance
  Object tables and table hierarchies
  References and path expressions
  Object views and view hierarchies
Moreover, so does the SQL3 standard
  Includes object views and user-defined references
  IBM, Oracle, Informix heading in same general direction
Work continuing on O-R extensions
  Let's have a brief look...

Additional Object Table Support

Business rule mechanisms for typed tables
  Check constraints on tables/subtables (w/inheritance)
  Referential integrity constraints to and from tables/subtables
  Triggers on tables/subtables
Object modeling and management extensions
  User-defined reference types (ref using)
  More flexible object view definitions
  Type and instance (i.e., row) evolution
  Structured types for attributes/columns
    Work in progress at IBM Santa Teresa Lab
  Functions/methods just around the corner as well

Other Exploratory O-R Work

Efficient support for collection types
  Multivalued attributes (e.g., Project.team)
  Flavors: set, multiset, array, list, ...
  Need to integrate into SQL, support querying well
  Some experience from a first prototype
Other activities (and open problems)
  Java mappings & bindings for O-R data
  XML & data-centric web sites ("d-commerce")
  Business object servers (caching/consistency)
  Heterogeneous data & O-R database systems
  User-defined and/or external indexing
  Optimizer "hooks" for new data types
  Etc.!

Partial List of UDB O-R Contributors

Almaden Research Center
  Mike Carey, Don Chamberlin, Srinivasa Narayanan, Bennet Vance; C.M. Park; Guido Moerkotte
Santa Teresa Lab
  Nelson Mattos; Gene Fuh, Michelle Jou, Brian Tran
Toronto Lab
  Doug Doole, Serge Rielau, Rick Swagerman; Leo Lao, Walid Rjaibi, Calisto Zuzarte; Cheryl Greene, various other consultants/hecklers
And as for future versions of UDB
  Your name could appear here! (MS/PhD)

The End

What About Object-Oriented DBMSs?

OOPL + DBMS = OO-DBMS
  Commonly based on C++ or Smalltalk
  Persistence, collections, queries, versions, ...
Lots of interesting and useful research results
  O-O data models and query languages
  O-O query processing, system architecture, performance
  Various products (O2, ObjectStore, Versant, Objectivity, ...)
No widespread commercial acceptance
  Many differences across systems (despite ODMG-93)
  Never really caught up to RDBMS technology
    Schema compilation, evolution painful
    Missing many of the relational "goodies"
    Single-language focus, lack of (relational) tools

Stonebraker Fellow Criteria (found on web)

Industrial database researcher
PhD from UC Berkeley
Must agree with the following motto:
  Databases are the answer...! What was the question again...?
At least 6' tall
Had a PhD thesis advisor with first name Mike
Produced a PhD student with first name Mike