KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the...

68
KR in Database Systems Implementation David Toman Cheriton School of CS Joint work with Alexander Hudek and Grant Weddell David Toman KR in DBMS Implementation 1 / 40

Transcript of KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the...

Page 1: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

KR in Database Systems Implementation

David Toman

Cheriton School of CS

Joint work with Alexander Hudek and Grant Weddell

David Toman KR in DBMS Implementation 1 / 40

Page 2: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Data and Constraints

The Textbook View of Relational Databases

Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ

where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)

What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)

. . . what is then the difference between the primary index on R and R itself?

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40

Page 3: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Data and Constraints

The Textbook View of Relational Databases

Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ

where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)

What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)

. . . what is then the difference between the primary index on R and R itself?

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40

Page 4: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Data and Constraints

The Textbook View of Relational Databases

Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ

where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)

What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)

. . . what is then the difference between the primary index on R and R itself?

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40

Page 5: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Data and Constraints

The Textbook View of Relational Databases

Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ

where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)

What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)

. . . what is then the difference between the primary index on R and R itself?

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40

Page 6: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Data and Constraints

The Textbook View of Relational Databases

Data represented as an instance of a relational structureQueries access to data via open formulæ (in an appropriate logic)Constraints data integrity enforced by sentences (in the same logic)

⇒ the instance is a model of the constraints

What about CREATE VIEW Statements?View declaration ∼ a sentence ∀x.V (x)↔ ϕ

where V is a (new) relational symbol and ϕ is a query (in our logic).⇒ now the instance is no longer a model (unless V is a materialized view)

What about CREATE INDEX Statements?Index declaration (on R for a key x) ∼ a sentence ∀r ,x.I(r ,x)↔ ∃y .R(r ,x,y)

. . . what is then the difference between the primary index on R and R itself?

David Toman (et al.) KR in DBMS Implementation Motivation 2 / 40

Page 7: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40

Page 8: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40

Page 9: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40

Page 10: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Phyical Data Independence

IDEA:Separate the users’ view(s) of the data fromthe way it is physically represented.

independent customized user views,changes to conceptual structure withoutaffecting users,physical storage details hidden fromusers,changes to physical storage withoutaffecting logical view,

Originally just two levels: physicaland conceptual/logical [Codd1970].

[ANSI/X3/SPARC StandardsPlanning and RequirementsCommittee, Bachman, 1975]

David Toman (et al.) KR in DBMS Implementation Motivation 3 / 40

Page 11: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Ontology-based Data Access (OBDA) [Calvanese et al.: Mastro, 2011]

Semantic Web 2 (2011) 43–53 43DOI 10.3233/SW-2011-0029IOS Press

The MASTRO system for ontology-based dataaccessEditor(s): Thomas Lukasiewicz, Oxford University, UKSolicited review(s): Carsten Lutz, Universität Bremen, Germany; Roman Kontchakov, Birkbeck College London, UK; one anonymous reviewer

Diego Calvanese a,*, Giuseppe De Giacomo b, Domenico Lembo b, Maurizio Lenzerini b,Antonella Poggi b, Mariano Rodriguez-Muro a, Riccardo Rosati b, Marco Ruzzi b andDomenico Fabio Savo b

a Free University of Bozen-Bolzano, Piazza Domenicani 3, I-39100, Bolzano, ItalyE-mail: [email protected] Sapienza Universita di Roma, Via Ariosto 25, I-00185, Roma, ItalyE-mail: [email protected]

Abstract. In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Univer-sità di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specifiedin DL-LiteA,id , a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access,and is connected to external JDBC enabled data management systems through semantic mappings that associate SQL queriesover the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be veryuseful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queriesare provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, anOWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projectscarried out in collaboration with important organizations, on which we briefly comment in this paper.

Keywords: Ontology-based data access, Description Logics, reasoning over ontologies

1. Introduction

In this paper we present MASTRO, a tool forontology-based data access developed at SapienzaUniversità di Roma and at the Free University ofBozen-Bolzano. Ontology-based data access (OBDA)refers to a setting in which an ontology is used as ahigh-level, conceptual view over data repositories, al-lowing users to access data without the need to knowhow they are actually organized and where they arestored (cf. Fig. 1).

The OBDA approach turns out to be very useful inall scenarios in which accessing data in a unified andcoherent way is difficult. This may happen for several

*Corresponding author.

reasons. For example, databases may have undergoneseveral manipulations during the years, often for op-timizing applications using them, and may have lost

Fig. 1. Ontology-based data access.

1570-0844/11/$27.50 c! 2011 – IOS Press and the authors. All rights reserved

Information Integration [Genesereth: Data Integration, 2010]

Data Exchange [Arenas et el.: Data Exchange, 2014]

David Toman (et al.) KR in DBMS Implementation Motivation 4 / 40

Page 12: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas

Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:

Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)

Algorithms/Execution model for user queries and updates

David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40

Page 13: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas

Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:

Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)

Algorithms/Execution model for user queries and updates

David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40

Page 14: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Common Threads and Issues

In general two schemas: Conceptual/Logical and Physical

⇒ both endowed with metadata (vocabulary, constraints, . . . )⇒ mappings that connect the two schemas

Rules: (1) queries (updates) over the conceptual/logical schema only,(2) raw data as instance of the physical schema only,(3) conceptual/logical and physical schemata are disjoint.

Issues to be formalized/fixed:1 Formal description of the two schemas (same formalism for both?)2 Language(s) for metadata, mappings, etc.3 User Interface:

Data ModelQuery (and Update) Language (e.g., when an answer is an answer?)

Algorithms/Execution model for user queries and updates

David Toman (et al.) KR in DBMS Implementation Motivation 5 / 40

Page 15: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What to do?

Definability and RewritingQueries range-restricted FOL (a.k.a. SQL)Ontology/Schema range-restricted FOLData CWA (complete information)

ΣL SL ϕoo (Logical Schema)

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40

Page 16: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

ΣL SL ϕoo (Logical Schema)

ΣLP (rewriting)

��ΣP SA ⊆ SP ψoo (Physical Schema)

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40

Page 17: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40

Page 18: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

©2011

users: looks like a single model (of the conceptual schema)implementation: many models

but definable queries answer the same in each of them

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40

Page 19: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What to do?

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

Query (SL)��

CompilerRelational Algebra (over SA)

��Schema (SL ∪ SP)

OO

Evaluator // Answers

Data (SA ⊂ SP)

OO

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

©2011

decidable (schema) languages?finite models?

David Toman (et al.) KR in DBMS Implementation Definability/Interpolation 6 / 40

Page 20: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

GRAND UNIFIED APPROACH

TO QUERY COMPILATION

PART I: WHAT CAN IT DO?

David Toman (et al.) KR in DBMS Implementation 7 / 40

Page 21: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do?

GOALGenerate query plans that compete with hand-written programs in C

1 Standard Designs (base files, primary/secondary indexing,horizontal/vertical partitioning, coverage/disjointness)

2 linked data structures, pointers, . . .3 access to search structures (index access and selection),4 hash-based access to data (including hash-joins),5 multi-level storage (aka disk/remote/distributed files), . . .6 materialized views (FO-definable),7 updates through logical schema (needs id invention!), . . .

. . . all without having to code (too much) in C/C++ !

David Toman (et al.) KR in DBMS Implementation What can it do? 8 / 40

Page 22: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Base File

david$ cat 848ex/base3.fol[% constraintstable(x,y,z) <-> ex(r,p0basetable(r,x,y,z)),% queryq(x,y) <-> ex(z,table(x,x,z) and table(z,y,y))].

David Toman (et al.) KR in DBMS Implementation What can it do? 9 / 40

Page 23: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Base File (cont.)

david$ cat 848ex/base3.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x116202000

Closing sets:L: { -p0basetable(slc:5,sl8:3,sl0:2,sl0:2)/0 }

{ -p0basetable(slc:4,sl0:1,sl0:1,sl8:3)/0 }R: { +p0basetable(slc:4,sl0:1,sl0:1,sl8:3)/0

+p0basetable(slc:5,sl8:3,sl0:2,sl0:2)/0 }

Success with fragment:+EXISTS[sl8:3.E](

+JOIN(+EXISTS[slc:4.E](+ATOM p0basetable(slc:4,sl0:1,sl0:1,sl8:3)),+EXISTS[slc:5.E](

+ATOM p0basetable(slc:5,sl8:3,sl0:2,sl0:2))))

David Toman (et al.) KR in DBMS Implementation What can it do? 10 / 40

Page 24: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Vertical Partition

david$ cat 848ex/indexonly.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% indicesp0indexx(r,x) <-> ex([y,z],basetable(r,x,y,z)),p0indexy(r,y) <-> ex([x,z],basetable(r,x,y,z)),p0indexz(r,z) <-> ex([x,y],basetable(r,x,y,z)),% keys(basetable(r,x1,y1,z1) and basetable(r,x2,y2,z2))

->(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].

David Toman (et al.) KR in DBMS Implementation What can it do? 11 / 40

Page 25: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Vertical Partition

david$ cat 848ex/indexonly.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x11304f000

Closing sets:L: { -p0indexx(sl19:4,sl0:1)/0 }

{ -p0indexy(sl19:4,sl0:2)/0 }{ -p0indexz(sl19:4,sl0:3)/0 }

R: { +p0indexz(sl19:4,sl0:3)/0+p0indexy(sl19:4,sl0:2)/0+p0indexx(sl19:4,sl0:1)/0 }

Success with fragment:+EXISTS[sl19:4.E](

+JOIN(+ATOM p0indexx(sl19:4,sl0:1),+JOIN(

+ATOM p0indexy(sl19:4,sl0:2),+ATOM p0indexz(sl19:4,sl0:3))))

David Toman (et al.) KR in DBMS Implementation What can it do? 12 / 40

Page 26: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Vertical Partition (and Projection)

% queryq(x,y) <-> ex(z,table(x,y,z))].

david$ cat 848ex/indexonly2.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x114117000

Closing sets:L: { -p0indexx(sl1c:4,sl0:1)/0 }

{ -p0indexy(sl1c:4,sl0:2)/0 }{ -p0indexz(sl1c:4,sl11:3)/0 }

R: { +p0indexy(sl1c:4,sl0:2)/0 +p0indexx(sl1c:4,sl0:1)/0 }

Success with fragment:+EXISTS[sl1c:4.E](

+JOIN(+ATOM p0indexx(sl1c:4,sl0:1),+ATOM p0indexy(sl1c:4,sl0:2)))

David Toman (et al.) KR in DBMS Implementation What can it do? 13 / 40

Page 27: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Horizontal Partition

david$ cat 848ex/horizontal.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% horizontal partitionsp0hpp1(r,x,y,z) -> basetable(r,x,y,z),p0hpp2(r,x,y,z) -> basetable(r,x,y,z),p0hpp3(r,x,y,z) -> basetable(r,x,y,z),basetable(r,x,y,z) -> (p0hpp1(r,x,y,z) or

p0hpp2(r,x,y,z) orp0hpp3(r,x,y,z)),

% keys(basetable(r,x1,y1,z1) and basetable(r,x2,y2,z2))

->(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].

David Toman (et al.) KR in DBMS Implementation What can it do? 14 / 40

Page 28: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Coverage

david$ cat 848ex/horizontal.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x116d93000

Closing sets:L: { -p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3)/0

-p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)/0-p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)/0 }

R: { +p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)/0 }{ +p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3)/0 }{ +p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)/0 }

Success with fragment:+UNION(+UNION(+EXISTS[sl13:4.E](+ATOM p0hpp2(sl13:4,sl0:1,sl0:2,sl0:3)),+EXISTS[sl13:4.E](+ATOM p0hpp3(sl13:4,sl0:1,sl0:2,sl0:3))),+EXISTS[sl13:4.E](+ATOM p0hpp1(sl13:4,sl0:1,sl0:2,sl0:3)))

David Toman (et al.) KR in DBMS Implementation What can it do? 15 / 40

Page 29: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Coverage and Disjointnss

david$ cat 848ex/subclass2.fol[% constraintstable(x,y,z) <-> ex(r,basetable(r,x,y,z)),% superclass and complementp0super(r,x,y,z) -> (p0complement(r) or basetable(r,x,y,z)),basetable(r,x,y,z) -> p0super(r,x,y,z),p0complement(r) -> ex([x,y,z],p0super(r,x,y,z)),% disjointnessp0complement(r) and ex([x,y,z],basetable(r,x,y,z)) -> bot,% keys(p0super(r,x1,y1,z1) and p0super(r,x2,y2,z2))->

(x1=x2 and y1=y2 and z1=z2),% queryq(x,y,z) <-> table(x,y,z)].

David Toman (et al.) KR in DBMS Implementation What can it do? 16 / 40

Page 30: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Basics: Coverage (cont.)

david$ cat 848ex/subclass2.fol | ./trans | ./main -allocated heap 1073741824 (1024M) at 0x10d09b000

Closing sets:L: { -p0super(sl12:4,sl0:1,sl0:2,sl0:3)/0 }

{ +p0complement(sl12:4)/0 }R: { -p0complement(sl12:4)/0

+p0super(sl12:4,sl0:1,sl0:2,sl0:3)/0 }

Success with fragment:+EXISTS[sl12:4.E](+JOIN(

+ATOM p0super(sl12:4,sl0:1,sl0:2,sl0:3),-ATOM p0complement(sl12:4)))

David Toman (et al.) KR in DBMS Implementation What can it do? 17 / 40

Page 31: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Lists and Pointers

1 Logical Schemaemployee works department

number oo // enumber number//

name dnumber name

salary manager

oo

2 Physical Design: a linked list of emp records pointing to dept records.record emp of

integer numstring nameinteger salaryreference dept

record dept ofinteger numstring namereference manager

3 Access Paths: empfile/1/0, emp-num/2/1, . . . (but no deptfile)4 Integrity Constraints (many), e.g.,

∀x , y , z.employee(x , y , z)→ ∃w .empfile(w) ∧ emp-num(w , x),∀a, x .empfile(a) ∧ emp-num(a, x)→ ∃y , z.employee(x , y , z), . . .

David Toman (et al.) KR in DBMS Implementation What can it do? 18 / 40

Page 32: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 33: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

or, in C-like syntax: for a in empfile dox := a->num;y := a->name;

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 34: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 35: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

.empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 36: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 37: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: navigating pointers

1 List all employee numbers and names (∃z.employee(x , y , z)):

∃a.empfile(a) ∧ emp-num(a, x) ∧ emp-name(a, y)

2 List all department numbers with their manager names(∃z,u, v .department(x , z,u) ∧ employee(u, y , v)):

∃a,d ,e.empfile(a) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,e) ∧ emp-name(e, y)

⇒ needs “departments have at least one employee”.. . . needs duplicate elimination during projection.

∃a,b,d .empfile(a) ∧ emp-name(a, y) ∧ emp-dept(a,d)∧ dept-num(d , x) ∧ dept-mgr(d ,b) ∧ compare(a,b)

⇒ needs “managers work in their own departments”.. . . NO duplicate elimination during projection.

David Toman (et al.) KR in DBMS Implementation What can it do? 19 / 40

Page 38: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can it do: Hashing, Lists, et al.

Hash Index with (list-based) Separate Chaining

... D1

i : • // •

//

• // •

//

... D3

j : ⊥

...

n : • // •

//

⊥ D2

Hash Array Separate Chaining Linked Lists Dept Records

David Toman (et al.) KR in DBMS Implementation What can it do? 20 / 40

Page 39: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can it do: Hashing, Lists, et al.

Hash Index on department’s name:Access paths:

SA ⊇ {hash/2/1,hasharraylookup/2/1,listscan/2/1}.Physical Constraints:

ΣLP ⊇ {∀x , y .((deptfile(x) ∧ dept-name(x , y))→ ∃z,w .(hash(y , z)∧ hasharraylookup(z,w) ∧ listscan(w , x))),

∀x , y .(hash(x , y)→ ∃z.hasharraylookup(y , z)),∀x , y .(listscan(x , y)→ deptfile(y)) }

Query:∃y , z.(department(x1,p, y) ∧ employee(y , x2, z)){p}.

E?x6.Hash(p,?x6)^E?x5.Hasharraylookup(?x6,?x5)^E?x4.Listscan(?x5,?x4)^E?s0.Dept-name(?x4,?s0)^Cmp(p,?s0))^Dept-num(?x4,x1)^E?x3.Dept-manager(?x4,?x3)^Emp-name(?x3,x2)

David Toman (et al.) KR in DBMS Implementation What can it do? 21 / 40

Page 40: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can it do: Hashing, Lists, et al.

Hash Index on department’s name:Access paths:

SA ⊇ {hash/2/1,hasharraylookup/2/1,listscan/2/1}.Physical Constraints:

ΣLP ⊇ {∀x , y .((deptfile(x) ∧ dept-name(x , y))→ ∃z,w .(hash(y , z)∧ hasharraylookup(z,w) ∧ listscan(w , x))),

∀x , y .(hash(x , y)→ ∃z.hasharraylookup(y , z)),∀x , y .(listscan(x , y)→ deptfile(y)) }

Query:∃y , z.(department(x1,p, y) ∧ employee(y , x2, z)){p}.

E?x6.Hash(p,?x6)^E?x5.Hasharraylookup(?x6,?x5)^E?x4.Listscan(?x5,?x4)^E?s0.Dept-name(?x4,?s0)^Cmp(p,?s0))^Dept-num(?x4,x1)^E?x3.Dept-manager(?x4,?x3)^Emp-name(?x3,x2)

David Toman (et al.) KR in DBMS Implementation What can it do? 21 / 40

Page 41: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: two-level store

The access path empfile is refined by emppages/1/0 and emprecords/2/1:

emppages returns (sequentially) disk pages containing emp records, andemprecords given a disc page, returns emp records in that page.

5 List all employees with the same name(∃z,u, v ,w , t .employee(x1, z,u, v) ∧ employee(x2, z,w , t)):

∃y , z,w , v ,p,q.emppages(p) ∧ emppages(q)∧ emprecords(p, y) ∧ emp-num(y , x1) ∧ emp-name(y ,w)∧ emprecords(q, z) ∧ emp-num(z, x2) ∧ emp-name(z, v)

∧ compare(w , v).

⇒ this plan implements the block nested loops join algorithm.

. . . more examples inMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

.David Toman (et al.) KR in DBMS Implementation What can it do? 22 / 40

Page 42: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: materialized views

Brain TeaserGiven materialized views

1 V1(x , y)↔ ∃z,u.R(z, x) ∧ R(z,u) ∧ R(u, y)

2 V2(x , y)↔ ∃z.R(x , z) ∧ R(z, y)

3 V3(x , y)↔ ∃z,u.R(x , z) ∧ R(z,u) ∧ R(u, y)

can one rewrite the query

{x , y | ∃z,u, v .R(z, x) ∧ R(z,u) ∧ R(u, v) ∧ R(v , y)}

in terms of V1, V2, ad V3?

David Toman (et al.) KR in DBMS Implementation What can it do? 23 / 40

Page 43: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: updates

User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)

Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),

∀x .(R+(x) ∧ R−(x))→ ⊥

ΣoldL Sold

L SnewL

//U+,U−

ΣnewL

ΣoldLP Σnew

LP

��Σold

P SA ⊆ SoldP SA ⊆ Snew

P//

A+,A−Σnew

P

Update turned into definability question

Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold

A (old access paths)and U+

j , U−j (user updates) for every access path A ∈ SA?

How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);

cyclic dependencies between anonymous values broken via reification.

David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40

Page 44: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: updates

User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)

Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),

∀x .(R+(x) ∧ R−(x))→ ⊥

Update turned into definability question

Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold

A (old access paths)and U+

j , U−j (user updates) for every access path A ∈ SA?

How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);

cyclic dependencies between anonymous values broken via reification.

David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40

Page 45: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

What can this do: updates

User updates only through logical schema:⇒ supplying “delta” relations (sets of tuples)

Two copies of the schema: Σold and Σnew ;Delta relations: R+ (insertions) and R− (deletions);Constraints: ∀x .(Rold (x) ∨ R+(x)) ≡ (Rnew (x) ∨ R−(x)),

∀x .(R+(x) ∧ R−(x))→ ⊥

Update turned into definability question

Is Anew (or A+,A−) definable in terms of Aoldi ∈ Sold

A (old access paths)and U+

j , U−j (user updates) for every access path A ∈ SA?

How do we get “anonymous values”?Constant Complement tables (access paths that do malloc and such);

cyclic dependencies between anonymous values broken via reification.

David Toman (et al.) KR in DBMS Implementation What can it do? 24 / 40

Page 46: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

GRAND UNIFIED APPROACH

TO QUERY COMPILATION

PART II: HOW DOES IT WORK?

David Toman (et al.) KR in DBMS Implementation 25 / 40

Page 47: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

The Plan

Definability and RewritingQueries range-restricted FOL over SL definable w.r.t. Σ and SAOntology/Schema range-restricted FOLData CWA (complete information for SA symbols)

ΣL SL ϕoo (Logical Schema)

ΣLP (rewriting)

��ΣP SA ⊆ SP ψoo (Physical Schema)

David Toman (et al.) KR in DBMS Implementation How does it work? 26 / 40

Page 48: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Query Plans via Interpolation

Idea #1: Plans as FormulasRepresent query plans as (annotated) range-restricted formulas ψ over SA:

atomic formula 7→ access path (get-first–get-next iterator)conjunction 7→ nested loops joinexistential quantifier 7→ projection (annotated w/duplicate info)disjunction 7→ concatenationnegation 7→ simple complement

Non-logical (but necessary) Add-ons1 Non-logical properties/operators

binding patternsduplication of data and duplicate-preserving/eliminating projectionssortedness of data (with respect to the iterator semantics) and sorting

2 Cost model(s)

David Toman (et al.) KR in DBMS Implementation How does it work? 27 / 40

Page 49: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Query Plans via Interpolation

Idea #1: Plans as FormulasRepresent query plans as (annotated) range-restricted formulas ψ over SA:

atomic formula 7→ access path (get-first–get-next iterator)conjunction 7→ nested loops joinexistential quantifier 7→ projection (annotated w/duplicate info)disjunction 7→ concatenationnegation 7→ simple complement

Non-logical (but necessary) Add-ons1 Non-logical properties/operators

binding patternsduplication of data and duplicate-preserving/eliminating projectionssortedness of data (with respect to the iterator semantics) and sorting

2 Cost model(s)

David Toman (et al.) KR in DBMS Implementation How does it work? 27 / 40

Page 50: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Query Plans via Interpolation

Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.

David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40

Page 51: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Query Plans via Interpolation

Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.

David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40

Page 52: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Query Plans via Interpolation

Idea #1 (cont):Represent physical design as access paths (SA) and constraints (Σ).Represent query plans as (annotated) range-restricted formulas ψ over SA.

⇒ reduces correctness of ψ to logical implication Σ |= ϕ↔ ψ

IDEA #2:Use interpolation to search for ψ:

extract an interpolant ψ from a (TABLEAU) proof of Σ ∪ Σ∗ |= ϕ→ ϕ∗

⇒ Beth definability of ϕ over Σ and SA resolves the existence of ψ.

David Toman (et al.) KR in DBMS Implementation How does it work? 28 / 40

Page 53: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Issues with TABLEAU

Dealing with the subformula property of Tableau⇒ analytic tableau explores formulas structurally⇒ (to large degree ) the structure of interpolant

depends on where access paths are present in queries/constraints.

IDEA #3:Separate general constraints from physical rules in the formulation ofthe definability question (and the subsequent interpolant extraction):

ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x .PL ↔ P ↔ PR | P ∈ SA}

Factoring logical reasoning from plan enumeration⇒ backtracking tableau to get alternative plans: too slow, too few plans

IDEA #4:Define conditional tableau exploration (using general constraints)

and separate it from plan generation (using physical rules)

David Toman (et al.) KR in DBMS Implementation How does it work? 29 / 40

Page 54: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Issues with TABLEAU

Dealing with the subformula property of Tableau⇒ analytic tableau explores formulas structurally⇒ (to large degree ) the structure of interpolant

depends on where access paths are present in queries/constraints.

IDEA #3:Separate general constraints from physical rules in the formulation ofthe definability question (and the subsequent interpolant extraction):

ΣL ∪ ΣR ∪ ΣLR |= ϕL → ϕR where ΣLR = {∀x .PL ↔ P ↔ PR | P ∈ SA}

Factoring logical reasoning from plan enumeration⇒ backtracking tableau to get alternative plans: too slow, too few plans

IDEA #4:Define conditional tableau exploration (using general constraints)

and separate it from plan generation (using physical rules)

David Toman (et al.) KR in DBMS Implementation How does it work? 29 / 40

Page 55: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Conditional Formulæ and Tableau

Conditional Formulæϕ[C] where C is a set of (ground) atoms over SA

ϕ only exists if all atoms in C are “used” in a plan tableau.

Absorbed Range-restricted Formulæ: ANF

Q ::= R(x) | ⊥ | Q ∧Q | Q ∨Q | ∀x .R(x)→ Q,

. . . and all ∃’s are Skolemized.

Conditional Tableau Rules for ANF

S ∪ {ϕ[C], ψ[C]}(ϕ ∧ ψ)[C] ∈ S

(conj)S ∪ {ϕ[C]} S ∪ {ψ[C]}

(ϕ ∨ ψ)[C] ∈ S(disj)

S ∪ {(ϕ[t/x ])[C ∪ D]}{R(t)[C], (∀x .R(x)→ ϕ)[D]} ⊆ S

(abs)S ∪ {R(t)[R(t)]}

SR(x) ∈ SA (phys)

David Toman (et al.) KR in DBMS Implementation How does it work? 30 / 40

Page 56: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Conditional Tableau: Outline

Goal: a closed tableau of the form

T P

T L T R T L . . . T R

T L and T R are logical left- and right-tableau(only logical constraints ΣL and ΣR , built 1st)

T P is a physical tableau (only physical rules ΣLR , built 2nd)

David Toman (et al.) KR in DBMS Implementation How does it work? 31 / 40

Page 57: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Conditional Tableau: Logical Parts

Conditional Tableau for (Q,Σ,SA)

Proof trees (T L,T R): T L for ΣL ∪ {QL(a)} over {PL | P ∈ SA}T R for ΣR ∪ {QR(a)→ ⊥} over {PR | P ∈ SA}

Closing Set(s)We call S of atoms over SA a closing set for T if, for every branch,

1 there is an atom R(t)[C], R(t) 6∈ C, such that C ∪ {¬R(t)} ⊆ S, or2 there is ⊥[C] such that C ⊆ S.

⇒ there are many different minimal closing sets for T .

ObservationFor arbitrary closing set S, an interpolant for T L(T R) is ⊥(>).

David Toman (et al.) KR in DBMS Implementation How does it work? 32 / 40

Page 58: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Plan Enumeration

Physical Tableau T P

P : LP RP

:R(t) : {{¬RL(t)}} {{RR(t)}}P1 ∧ P2 : LP1 ∪ LP2 {S1 ∪ S2 | S1 ∈ RP1 ,S2 ∈ RP2}P1 ∨ P2 : {S1 ∪ S2 | S1 ∈ LP1 ,S2 ∈ LP2} RP1 ∪ RP2

¬P1 : {{LL(t) | LR (t) ∈ S} | S ∈ RP1} {{LR (t) | LL(t) ∈ S} | S ∈ LP1}∃x .P1 : LP1[t/x ] RP1[t/x ]

ObservationFor a range-restricted formula P over SA there is an analytic tableau T P thatuses only formulæ in ΣLR ∪ {∀x .trueR(x)} such that:

1 Open branches of T P correspond to sets S ∈ LP (left branch) or S ∈ RP(right branch); and

2 The interpolant extracted from this tableau is logically equivalent to P,assuming that the closure of each left branch interpolates to ⊥ and forright branch to >.

David Toman (et al.) KR in DBMS Implementation How does it work? 33 / 40

Page 59: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Logical&Physical Combined, Controlling the Search

Basic Strategy1 build (T L,T R) for (Q,Σ,SA) to a certain depth,2 build T P and test if each element in LP(RP) closes T L(T R).

if so, T P [T L,T R] is closed tableau yielding an interpolant equivalent to P;(. . . otherwise extend depth in step 1 and repeat.)

In step 2 we can “test” many Ps (plan enumeration), buthow do we know which ones to try? while building these bottom-up?

Controlling the Searchonly use the (phys) rule in T L(T R) for R(t) that appears in T R(T L),only consider fragments that help closing (T L,T R)

⇒ this is determined using the minimal closing sets for (T L,T R).

. . . combine with search (among P ’s) with respect to a cost model.

David Toman (et al.) KR in DBMS Implementation How does it work? 34 / 40

Page 60: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Logical&Physical Combined, Controlling the Search

Basic Strategy1 build (T L,T R) for (Q,Σ,SA) to a certain depth,2 build T P and test if each element in LP(RP) closes T L(T R).

if so, T P [T L,T R] is closed tableau yielding an interpolant equivalent to P;(. . . otherwise extend depth in step 1 and repeat.)

In step 2 we can “test” many Ps (plan enumeration), buthow do we know which ones to try? while building these bottom-up?

Controlling the Searchonly use the (phys) rule in T L(T R) for R(t) that appears in T R(T L),only consider fragments that help closing (T L,T R)

⇒ this is determined using the minimal closing sets for (T L,T R).

. . . combine with search (among P ’s) with respect to a cost model.

David Toman (et al.) KR in DBMS Implementation How does it work? 34 / 40

Page 61: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Postprocessing

Duplicate Elimination Elimination

Q[{R(x1, . . . , xk )}]↔Q[R(x1, . . . , xk )]Q[{Q1 ∧Q2}]↔Q[{Q1} ∧ {Q2}]

Q[{¬Q1}]↔Q[¬Q1]Q[¬{Q1}]↔Q[¬Q1]

Q[{Q1 ∨Q2}]↔Q[{Q1} ∨ {Q2}] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥Q[{∃x .Q1}]↔Q[∃x .{Q1}]

if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x} |= y1 ≈ y2

CFDnc logic to approximate Σ and to reason about FDs/disjointness in PTIME.

David Toman (et al.) KR in DBMS Implementation How does it work? 35 / 40

Page 62: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Postprocessing

Duplicate Elimination Elimination

Q[{R(x1, . . . , xk )}]↔Q[R(x1, . . . , xk )]Q[{Q1 ∧Q2}]↔Q[{Q1} ∧ {Q2}]

Q[{¬Q1}]↔Q[¬Q1]Q[¬{Q1}]↔Q[¬Q1]

Q[{Q1 ∨Q2}]↔Q[{Q1} ∨ {Q2}] if Σ ∪ {Q[]} |= Q1 ∧Q2 → ⊥Q[{∃x .Q1}]↔Q[∃x .{Q1}]

if Σ ∪ {Q[] ∧Q1[y1/x ] ∧Q1[y2/x} |= y1 ≈ y2

CFDnc logic to approximate Σ and to reason about FDs/disjointness in PTIME.

⇒ also CUT insertion (similar—seeMorgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

Morgan Claypool Publishers&w w w . m o r g a n c l a y p o o l . c o m

Series Editor: M. Tamer Özsu, University of Waterloo

CM& Morgan Claypool Publishers&SYNTHESIS LECTURES ON DATA MANAGEMENT

SYNTHESIS LECTURES ON DATA MANAGEMENT

About SYNTHESIsThis volume is a printed version of a work that appears in the SynthesisDigital Library of Engineering and Computer Science. Synthesis Lecturesprovide concise, original presentations of important research and developmenttopics, published quickly, in digital and print formats. For more informationvisit www.morganclaypool.com

M. Tamer Özsu, Series Editor

MORGAN

&CLAYPO

OL

ISBN: 978-1-60845-278-1

9 781608 452781

90000

Series ISSN: 2153-5418

FUNDAMENTALS OF PHYSICAL DESIGN AND Q

UERY COMPILATION

Fundamentals of Physical Design andQuery Compilation

University of Waterloo

Query compilation is the problem of translating user requests formulated over purely conceptual anddomain specific ways of understanding data, commonly called logical designs, to efficient executableprograms called query plans. Such plans access various concrete data sources through their low-leveloften iterator-based interfaces. An appreciation of the concrete data sources, their interfaces and howsuch capabilities relate to logical design is commonly called a physical design. This book is an introductionto the fundamental methods underlying database technology that solves the problem of querycompilation. The methods are presented in terms of first-order logic which serves as the vehicle forspecifying physical design, expressing user requests and query plans, and understanding how queryplans implement user requests.

Fundamentals ofPhysical Design andQuery Compilation

David Toman

).

David Toman (et al.) KR in DBMS Implementation How does it work? 35 / 40

Page 63: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Summary of the Approach

1 FO (DLFDE) tableau based interpolation algorithm⇒ enumeration of plans factored from reasoning⇒ range-restricted queries and constraints→ ground terms only⇒ extra-logical binding patterns and cost model

2 post processing (using CFD approximation)⇒ duplicate elimination elimination⇒ cut insertion

3 run time⇒ library of common data structures+schema constraints

or an interface to a legacy system⇒ finger data structures to simulate merge joins et al.

David Toman (et al.) KR in DBMS Implementation Summary 36 / 40

Page 64: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Brain Teaser

david$ cat nash.fol% Nash unlabelled[% constraintsp0v1(x,y)<->ex([z,u],r(z,x) and r(z,u) and r(u,y)),p0v2(x,y)<->ex(z,r(x,z) and r(z,y)),p0v3(x,y)<->ex([z,u],r(x,z) and r(z,u) and r(u,y))% queryq(x,y)<->ex([z,u,t],r(z,x) and r(z,u) and r(u,t) and r(t,y)),].

David Toman (et al.) KR in DBMS Implementation Summary 37 / 40

Page 65: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Brain Teaser

david$ cat nash.fol | ./trans | ./main -

Closing sets:L: { -p0v1(sl0:1,sl21:5)/0 }

{ -p0v3(sl17:3,sl0:2)/0 }{ -p0v1(sl0:1,sr1a:1f)/0 +p0v2(sl17:3,sr1a:1f)/0 }{ -p0v3(sr10:d,sl0:2)/0 +p0v2(sr10:d,sl21:5)/0 }

R: { -p0v2(sr10:d,sl21:5)/0 +p0v1(sl0:1,sl21:5)/0 }{ -p0v2(sl17:3,sr1a:1f)/0 +p0v3(sl17:3,sl0:2)/0 }{ +p0v3(sr10:d,sl0:2)/0 +p0v1(sl0:1,sl21:5)/0 }{ +p0v3(sl17:3,sl0:2)/0 +p0v1(sl0:1,sr1a:1f)/0 }

Success with fragment:+EXISTS[sl21:5.E](+JOIN(+ATOM p0v1(sl0:1,sl21:5),-EXISTS[sr10:d.A](+JOIN(

+ATOM p0v2(sr10:d,sl21:5),-ATOM p0v3(sr10:d,sl0:2)))))

David Toman (et al.) KR in DBMS Implementation Summary 38 / 40

Page 66: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Brain Teaser

david$ cat nash.fol | ./trans | ./main -

Closing sets:L: { -p0v1(sl0:1,sl21:5)/0 }

{ -p0v3(sl17:3,sl0:2)/0 }{ -p0v1(sl0:1,sr1a:1f)/0 +p0v2(sl17:3,sr1a:1f)/0 }{ -p0v3(sr10:d,sl0:2)/0 +p0v2(sr10:d,sl21:5)/0 }

R: { -p0v2(sr10:d,sl21:5)/0 +p0v1(sl0:1,sl21:5)/0 }{ -p0v2(sl17:3,sr1a:1f)/0 +p0v3(sl17:3,sl0:2)/0 }{ +p0v3(sr10:d,sl0:2)/0 +p0v1(sl0:1,sl21:5)/0 }{ +p0v3(sl17:3,sl0:2)/0 +p0v1(sl0:1,sr1a:1f)/0 }

Success with fragment:+EXISTS[sl17:3.E](+JOIN(+ATOM p0v3(sl17:3,sl0:2),-EXISTS[sr1a:1f.A](+JOIN(

+ATOM p0v2(sl17:3,sr1a:1f),-ATOM p0v1(sl0:1,sr1a:1f)))))

David Toman (et al.) KR in DBMS Implementation Summary 38 / 40

Page 67: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Open Issues

1 Dealing with ordered data? (merge-joins etc.: we have a partial solution)

2 Decidable schema languages (decidable interpolation problem)?

3 More powerful schema languages (inductive types, etc.)?

4 Beyond FO Queries/Views (e.g., count/sum aggregates)?

5 Coding extra-logical bits (e.g., binding patterns, postprocessing, etc. )in the schema itself?

6 Standard Designs (a plan can always be found as in SQL)?

7 Explanation(s) of non-definability?

8 Fine(r)-grained updates?

9 . . .

. . . and, as always, performance, performance, performance!

David Toman (et al.) KR in DBMS Implementation Summary 39 / 40

Page 68: KR in Database Systems Implementationdavid/cs348/interpolation.pdf)the instance is a model of the constraints What about CREATE VIEW Statements? View declaration ˘a sentence 8x:V(x)

Message from our Sponsors

Data Systems Group at the University of Waterloo∼ 10 professors, affiliated faculty, postdocs, 40+ graduate students, . . .wide range of research interests

Advanced query processing/Knowledge representationSystem aspects of database systems and Distributed data managementData quality/Managing uncertain data/Data miningNew(-ish) domains (text, streaming, graph data/RDF, OLAP)

research sponsored by governments, and local/global companiesNSERC/CFI/OIT and Google, IBM, SAP, OpenText, . . .

. . . and we are always looking for good graduate students (MMath/PhD)⇒ comes with full support over multiple years

David Toman (et al.) KR in DBMS Implementation Summary 40 / 40