
Realization of DBS

11. Table Operations – Implementation

Theo Härder, www.haerder.de

Goals
- Systematic development of relational processing concepts for a single table or for several tables
- Realization of plan operators

© 2011 AG DBIS

Realization of Database Systems – SS 2011

Main reference: Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 11.

Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170.


Table Operations - Implementation

Operations of the relational algebra
- Unary operations: e.g., selection, projection
- Binary operations: e.g., join and the set operations union, intersection, and difference (–)


SQL queries contain logical expressions that can be mapped to the operations of the relational algebra. These are further transformed into access plans, in which so-called plan operators implement the logical operations.

Plan operators on a single table
• Selection

Operators across several tables
• Join algorithms
  - Nested-loops join, sort-merge join
  - Hash join (classic hashing, simple hash join, hybrid hash join)
  - Exploitation of type-crossing access paths
  - Distributed join algorithms
• Further binary operations (set operations)


Plan Operators on a Single Table

Selection – general ways of evaluation
• Direct access via a given TID, via a hash method, or via a one- or multi-dimensional index structure
• Sequential search in a table
• Search via an index structure (index table, bit list)
• Selection using several pointer lists, where more than one index structure can be exploited
• Search via a multi-dimensional index structure

Projection
• Is typically performed in combination with sorting, selection, or join

Modification


• Updates are set-oriented in SQL, but restricted to a single table
• INSERT, DELETE, and UPDATE are directly mapped to the corresponding operations of the storage structures
• "Automatic" execution of maintenance operations
  - to update access paths
  - to guarantee clustering, reorganization, etc.
• Provisions for logging and recovery, etc.


Plan Operators for the Selection

Use of scan operators
• Definition of start and stop conditions
• Definition of simple search arguments

Plan operators
1. Table scan (relation scan)
   - Always possible
   - SCAN operator implements the selection operation
2. Index scan
   - Selection of the most cost-effective index
   - Specification of the search range (start and stop conditions)
3. k-d scan
   - Evaluation of multi-dimensional search criteria
   - Use of differing evaluation directions by navigation
4. TID algorithm
   - Evaluation of all "usable" index structures
   - Location of TID lists of variable lengths
   - Boolean connection of the lists
   - Access to the records according to the hit list (result list)

Further plan operators in combination with selection
• Sorting
• Grouping (see sort operator)
• Special operators, e.g., in data warehouse applications for grouping and aggregation (CUBE operator)
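The index scan and the TID algorithm can be made concrete with a small Python sketch. It assumes, for illustration only, that an index is a sorted list of (key, TID) pairs and that a table is a dict mapping TIDs to records; the function names `index_range_scan` and `tid_algorithm` and the sample data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical model: an index is a list of (key, tid) pairs sorted by key,
# and a table maps each TID to its record (a dict of attribute values).

def index_range_scan(index, start, stop):
    """Index scan: TIDs of all entries with start <= key <= stop."""
    keys = [key for key, _ in index]
    lo = bisect_left(keys, start)    # start condition
    hi = bisect_right(keys, stop)    # stop condition
    return {tid for _, tid in index[lo:hi]}

def tid_algorithm(table, tid_lists):
    """AND-connect the TID lists, then fetch only the records in the hit list."""
    hits = set.intersection(*tid_lists)
    return [table[tid] for tid in sorted(hits)]

emp = {
    1: {"name": "Hans", "age": 30, "salary": 50_000},
    2: {"name": "Anna", "age": 45, "salary": 70_000},
    3: {"name": "Otto", "age": 38, "salary": 40_000},
}
age_idx = sorted((rec["age"], tid) for tid, rec in emp.items())
sal_idx = sorted((rec["salary"], tid) for tid, rec in emp.items())

# WHERE age BETWEEN 35 AND 50 AND salary >= 60000
hits = tid_algorithm(emp, [index_range_scan(age_idx, 35, 50),
                           index_range_scan(sal_idx, 60_000, 10**9)])
print([rec["name"] for rec in hits])  # ['Anna']
```

Intersecting the TID lists before any record is fetched means that only records satisfying all indexed predicates are read; an OR-connection would use set union instead.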


Operators Across Several Tables

SQL allows complex queries across k tables
• One-variable expressions: describe conditions for the selection of elements from a table
• Two-variable expressions: describe conditions for the combination of elements from two tables
• Typically, k-variable expressions are decomposed into one- and two-variable expressions and evaluated by corresponding plan operators

Plan operators across several tables
• General ways of evaluation:
  - Nested iteration: for each element of the outer table To, traversal of the inner table Ti
     • O(No · Ni + No)
     • Important application: nested-loops join
  - Merge method: iterating traversals through T1 and T2
     • O(N1 + N2)
     • Additional sort costs, if necessary
     • Important application: merge join
  - Hashing: partitioning of the inner table Ti and partition-wise loading into a hash table HT in memory; "probing" by the outer table To or its respective partitions using HT: O(p · No + Ni)


Operators Across Several Tables (2)

n-way joins
• Decomposition into n-1 two-way joins²
• The number of possible join sequences depends on the join attributes chosen
• Up to n! different sequences possible (orderings of the n tables in a left-deep tree); bushy trees allow further alternatives
• Use of pipelining techniques

• The optimal evaluation sequence depends on
  - Plan operators
  - "Fitting" sort orders for join attributes
  - Size of operands, etc.

Analogous procedure in case of set operations

Figure: Some join sequences using two-way joins (n = 5) over tables T1, ..., T5: left-deep tree, right-deep tree, bushy tree.

2. Practicality test (Guy Lohman test for join techniques): Does a new technique apply to joining three inputs without interrupting data flow between the join operators?
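The following Python sketch illustrates pipelining in a left-deep tree and, in miniature, the practicality test from footnote 2: three inputs are joined while result records flow from one join operator to the next without an intermediate result being stored. The generator-based operators `scan`, `hash_join`, and `left_deep` are hypothetical names; using hash joins with the right-hand inputs as build sides is one possible design, not the only one.

```python
from collections.abc import Iterable, Iterator

def scan(table: list[dict]) -> Iterator[dict]:
    """Leaf operator: deliver the records of a stored table one at a time."""
    yield from table

def hash_join(probe: Iterable[dict], build: list[dict], key: str) -> Iterator[dict]:
    """Equi-join: hash the build input, then stream the probe input through.

    Probe records are consumed one at a time and matches are yielded
    immediately, so the output can feed the next join without being
    materialized.
    """
    table: dict = {}
    for b in build:                 # blocking: the build side is read completely
        table.setdefault(b[key], []).append(b)
    for p in probe:                 # pipelined: no intermediate result is stored
        for b in table.get(p[key], []):
            yield {**p, **b}

def left_deep(tables: list[list[dict]], keys: list[str]) -> Iterable[dict]:
    """Evaluate ((T1 join T2) join T3) ... as a pipeline of hash joins."""
    stream: Iterable[dict] = scan(tables[0])
    for inner, key in zip(tables[1:], keys):
        stream = hash_join(stream, inner, key)
    return stream

emp = [{"eno": 1, "dno": 47}, {"eno": 2, "dno": 39}]
dept = [{"dno": 47, "mgr": "Hans"}, {"dno": 39, "mgr": "Anna"}]
proj = [{"eno": 1, "pno": 7}, {"eno": 1, "pno": 8}]
for row in left_deep([emp, dept, proj], ["dno", "eno"]):
    print(row)
```

Only the build sides are read completely; each record of the left input passes through both joins before the next one is read.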


Plan Operators for the Join

Join
• Record-type-spanning operation: usually very expensive
• Frequent use: important optimization candidate
• Typical application: equi-join
• General Θ-join: infrequent

Implementation of the join operation
The implementation can process selections (and projections) on the participating tables R and S at the same time:

SELECT *
FROM R, S
WHERE R.JA Θ S.JA
  AND PR
  AND PS

• JA: join attribute
• PR and PS: predicates defined on selection attributes (SA) of R and S

Possible access paths
• Scans over R and S (always)
• Scans over IR(JA), IS(JA) (if present): deliver the records sorted on JA
• Scans over IR(SA), IS(SA) (if present): fast selection for PR and PS, if needed
• Scans over other index structures (if present): possibly faster location of all records


Nested-Loops Join

Assumptions
• Records in R and S are not ordered according to the join attributes
• Index structures IR(JA) and IS(JA) do not exist

Algorithm for Θ-join
Scan over S, for each record s, if PS:
   scan over R, for each record r, if PR AND (r.JA Θ s.JA):
      execute join, i.e., write combined record (r, s) into the result set.

Complexity: O(N · M) for N records in R and M records in S

Nested-loops join using index access

Scan over S, for each record s, if PS:
   determine via access to IR(JA) all TIDs of records satisfying r.JA = s.JA; for each TID:
      fetch record r; if PR: write combined record (r, s) into the result set.

Nested-block join
Scan over S, for each page (or set of contiguous pages) of S:
   scan over R, for each page (or set of contiguous pages) of R:
      for each record s of the S-page, if PS: for each record r of the R-page,
         if PR AND (r.JA Θ s.JA): write combined record (r, s) into the result set.
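A minimal Python sketch of the three nested-loops variants, assuming records are dicts, Θ is passed as a comparison function (e.g., from the `operator` module), and PR/PS are predicates. IR(JA) is modeled as a dict from JA values to TID lists, and pages as lists of records; all function names are hypothetical.

```python
import operator

def nested_loops_join(R, S, ja, theta, pr, ps):
    """Θ-join: for each qualifying s, scan all of R (O(N · M) comparisons)."""
    result = []
    for s in S:
        if ps(s):
            for r in R:
                if pr(r) and theta(r[ja], s[ja]):
                    result.append((r, s))
    return result

def index_nested_loops_join(R, ir_ja, S, ja, pr, ps):
    """Equi-join via IR(JA); R maps TID -> record, ir_ja maps JA value -> TIDs."""
    result = []
    for s in S:
        if ps(s):
            for tid in ir_ja.get(s[ja], []):   # all TIDs with r.JA = s.JA
                r = R[tid]                      # fetch record r
                if pr(r):
                    result.append((r, s))
    return result

def nested_block_join(R_pages, S_pages, ja, theta, pr, ps):
    """Like nested loops, but R is scanned once per S-page, not per S-record."""
    result = []
    for s_page in S_pages:
        for r_page in R_pages:
            for s in s_page:
                if ps(s):
                    for r in r_page:
                        if pr(r) and theta(r[ja], s[ja]):
                            result.append((r, s))
    return result

def everything(rec):
    return True

R = [{"ja": 1}, {"ja": 2}, {"ja": 3}]
S = [{"ja": 2}, {"ja": 3}]
less = nested_loops_join(R, S, "ja", operator.lt, everything, everything)
equi = nested_loops_join(R, S, "ja", operator.eq, everything, everything)
ir_ja = {rec["ja"]: [tid] for tid, rec in enumerate(R)}   # JA values unique here
by_index = index_nested_loops_join(dict(enumerate(R)), ir_ja, S, "ja",
                                   everything, everything)
by_block = nested_block_join([R[:2], R[2:]], [S], "ja", operator.eq,
                             everything, everything)
print(len(less), equi == by_index == by_block)  # 3 True
```

The block variant performs the same comparisons as the plain nested-loops join, but R is read once per S-page rather than once per S-record, which reduces I/O when pages come from disk.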


Sort-Merge Join

The algorithm consists of two phases
• Phase 1: Sorting of R and S on R(JA) and S(JA) (if not already sorted); in doing so, early elimination of records that are not needed (PR, PS)
• Phase 2: Iterating scans over the sorted R and S records, where the join is performed whenever r.JA = s.JA

Complexity: O(N log N)



Special case

If either IR(JA) and IS(JA), or a GAPS over R(JA) and S(JA) (join index), is present: exploitation of the index structures on the join attributes
Iterating scans over IR(JA) and IS(JA):
   for each pair of keys from IR(JA) and IS(JA) with r.JA = s.JA: fetch the records using the related TIDs;
      if PR and PS: write combined record (r, s) into the result set.
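The two phases of the sort-merge join for an equi-join can be sketched in Python as follows (hypothetical function name, records as dicts). Groups of records with the same JA value are joined pairwise, a detail that matters whenever JA is not a key.

```python
from operator import itemgetter

def sort_merge_join(R, S, ja, pr=lambda r: True, ps=lambda s: True):
    """Equi-join R.JA = S.JA in two phases."""
    # Phase 1: drop non-qualifying records early, then sort both inputs on JA.
    r_sorted = sorted((r for r in R if pr(r)), key=itemgetter(ja))
    s_sorted = sorted((s for s in S if ps(s)), key=itemgetter(ja))
    # Phase 2: iterate through both sorted lists in step.
    result, i, j = [], 0, 0
    while i < len(r_sorted) and j < len(s_sorted):
        rv, sv = r_sorted[i][ja], s_sorted[j][ja]
        if rv < sv:
            i += 1
        elif rv > sv:
            j += 1
        else:
            # Find the end of the group of S records with this JA value.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][ja] == rv:
                j_end += 1
            # Join every R record of the group with every S record of the group.
            while i < len(r_sorted) and r_sorted[i][ja] == rv:
                for s in s_sorted[j:j_end]:
                    result.append((r_sorted[i], s))
                i += 1
            j = j_end
    return result

R = [{"ja": 3, "x": "a"}, {"ja": 1, "x": "b"}, {"ja": 3, "x": "c"}]
S = [{"ja": 3, "y": "p"}, {"ja": 2, "y": "q"}, {"ja": 3, "y": "r"}]
pairs = sort_merge_join(R, S, "ja")
print([(r["x"], s["y"]) for r, s in pairs])
# [('a', 'p'), ('a', 'r'), ('c', 'p'), ('c', 'r')]
```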



Hash Join

Simplest case (classic hashing)
• Step 1: Partitioned read of the (smaller) table R and construction of a hash table using hH(r(JA)) on the values of R(JA) for partitions Ri (1 ≤ i ≤ p): each partition fits into the available memory, and each record inserted satisfies PR
• Step 2: Probing with the records of S that satisfy PS; if successful, execution of the join
• Step 3: Repeat steps 1 and 2 until R is exhausted

Construction of hash tables and probing
Figure: Scan over R, building hash tables Hi (1 ≤ i ≤ p) one at a time in memory; for each of H1, ..., Hp, a scan over S with probing of that hash table.

Complexity: O(p · N)

Special case
• R fits into memory: one partition (p = 1); a single scan over S is sufficient!
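A sketch of classic hashing in Python, assuming that a partition Ri is formed simply by reading the next `capacity` qualifying records of R (a hypothetical parameter that counts records rather than pages). The function also returns the number of scans over S, which shows where the O(p · N) cost comes from.

```python
from collections import defaultdict

def classic_hash_join(R, S, ja, capacity, pr=lambda r: True, ps=lambda s: True):
    """Equi-join by classic hashing; returns (result, number of scans over S).

    `capacity` (hypothetical) is the number of R records whose hash table
    fits into memory; it determines the number of partitions p.
    """
    result, s_scans = [], 0
    qualified_r = (r for r in R if pr(r))    # one sequential pass over R
    while True:
        # Step 1: load the next partition Ri of R into a hash table Hi.
        h = defaultdict(list)
        loaded = 0
        for r in qualified_r:
            h[r[ja]].append(r)
            loaded += 1
            if loaded == capacity:
                break
        if loaded == 0:                      # Step 3: R is exhausted
            return result, s_scans
        # Step 2: scan all of S and probe Hi with each qualifying record.
        s_scans += 1
        for s in S:
            if ps(s):
                result.extend((r, s) for r in h.get(s[ja], []))

R = [{"ja": v} for v in (1, 2, 3, 4, 5)]
S = [{"ja": v} for v in (2, 5, 7)]
result, scans = classic_hash_join(R, S, "ja", capacity=2)
print(len(result), scans)                              # 2 3  (p = 3)
print(classic_hash_join(R, S, "ja", capacity=10)[1])   # 1  (p = 1)
```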


Hash Join (2)

Figure: Partitioning of R with hP(r(JA)). Left: number of records per JA value (JA from 0 to 1000). Right: number of records per JA' value, where JA' = hP(r(JA)) lies between 0 and 1; the boundaries 0.33 and 0.66 divide R into partitions R1, R2, R3.
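The figure suggests that hP maps JA values to JA' values so that partitions receive similar numbers of records even if the JA distribution is skewed. One way to obtain such a function, assumed here for illustration, is to derive equi-depth boundaries from the JA values (in practice from a histogram or a sample); `make_partitioner` and `naive_range` are hypothetical names.

```python
from bisect import bisect_right
from collections import Counter

def make_partitioner(ja_values, p):
    """Hypothetical hP: equi-depth boundaries taken from the JA distribution."""
    ordered = sorted(ja_values)
    n = len(ordered)
    # Boundary k (k = 1 .. p-1) is the JA value at the k/p quantile.
    bounds = [ordered[(k * n) // p] for k in range(1, p)]

    def h_p(ja):
        # Partition number 0 .. p-1; equal JA values always map to the same one.
        return bisect_right(bounds, ja)

    return h_p

def naive_range(ja, p, max_ja=100):
    """Equal-width ranges over the value domain, ignoring the distribution."""
    return min(ja * p // max_ja, p - 1)

skewed = [10, 10, 10, 10, 20, 30, 40, 50, 60, 70, 80, 90]
h_p = make_partitioner(skewed, 3)
print(sorted(Counter(h_p(v) for v in skewed).items()))            # [(0, 4), (1, 4), (2, 4)]
print(sorted(Counter(naive_range(v, 3) for v in skewed).items()))  # [(0, 6), (1, 3), (2, 3)]
```

Equal JA values always land in the same partition, which is necessary so that R and S can be partitioned with the same function; a single extremely frequent value, however, cannot be split across partitions.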


Hash Join (3)

Partitioning
• Partitioning of R into subsets R1, R2, ..., Rp: a record r of R is in Ri if h(r) is in Hi
• Table S is partitioned with the same function hP while evaluating PS

Figure: R is partitioned into H1, H2, ..., Hp.

Why is this partitioning a critical operation?

Which auxiliary operations may be required?

Is the use of a hash function needed for partitioning?


Hash Join (4)

Variants of hash join are primarily distinguished by the kind of partitioning.

Partitioning technique in case of simple hash join, shown for the construction and probing of H1
Figure (1st iteration): step 1 scans R, building H1 in memory and writing the remaining records to Rrest; step 2 scans S, probing H1 and writing the remaining records to Srest.

Simple hash join
• Step 1: Execute a scan on R (the smaller table), evaluate PR, and apply the hash function hP to each qualified record r. If hP(r(JA)) is in the chosen range, insert the record into H1; otherwise, write r to an output buffer for a file Rrest of "pretermitted" r-records.
• Step 2: Execute a scan on S, evaluate PS, and apply the hash function hP to each qualified record s. If hP(s(JA)) is in the chosen range, search H1 for join counterparts (probing) and, for each match, form a join record and add it to the result; otherwise, write s to an output buffer for a file Srest of "pretermitted" s-records.
• Step 3: Repeat steps 1 and 2 on the records pretermitted so far (Rrest and Srest), building Hi in iteration i, until Rrest is exhausted. Here, PR and PS need not be evaluated again.
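The three steps of the simple hash join can be sketched in Python as follows. As assumptions for illustration, `h_p` maps a JA value into [0, 1), and iteration i uses the range [i/p, (i+1)/p) as its "chosen range"; Python lists stand in for the files Rrest and Srest, and the function name is hypothetical.

```python
def simple_hash_join(R, S, ja, h_p, p, pr=lambda r: True, ps=lambda s: True):
    """Simple hash join; h_p (hypothetical) maps a JA value into [0, 1).

    Iteration i handles the hP range [i/p, (i+1)/p); all other records are
    "pretermitted" and kept in Rrest / Srest for the next iteration.
    """
    result = []
    r_in = [r for r in R if pr(r)]   # PR and PS are evaluated only once
    s_in = [s for s in S if ps(s)]
    for i in range(p):
        if not r_in:                  # Step 3: stop once Rrest is exhausted
            break
        lo, hi = i / p, (i + 1) / p
        # Step 1: records in range go into Hi, the others into Rrest.
        h, r_rest = {}, []
        for r in r_in:
            if lo <= h_p(r[ja]) < hi:
                h.setdefault(r[ja], []).append(r)
            else:
                r_rest.append(r)
        # Step 2: records in range probe Hi, the others go into Srest.
        s_rest = []
        for s in s_in:
            if lo <= h_p(s[ja]) < hi:
                result.extend((r, s) for r in h.get(s[ja], []))
            else:
                s_rest.append(s)
        r_in, s_in = r_rest, s_rest   # next iteration reads only the rest files
    return result

def h_p(v):
    return (v % 100) / 100           # hypothetical hash function into [0, 1)

R = [{"ja": v} for v in (5, 42, 77, 42)]
S = [{"ja": v, "sid": k} for k, v in enumerate((42, 77, 13))]
pairs = simple_hash_join(R, S, "ja", h_p, p=3)
print(sorted((r["ja"], s["sid"]) for r, s in pairs))  # [(42, 0), (42, 0), (77, 1)]
```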


Hash Join (5)

Grace join
• Partitioning of R and S takes place before the join starts
• Partitions Ri and Si are stored in temporary files on disk
• Construction of Hi (with M pages) in memory from Ri and probing with Si

Figure: R and S are partitioned into R1, ..., RP and S1, ..., SP; for each i, Hi is built from Ri, followed by a scan over Si with probing of Hi.

What is the minimal memory size required?
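A Python sketch of the Grace join, assuming p partitions, Python lists in place of the temporary files, and Python's built-in `hash` as a stand-in for hP; the function name is hypothetical. Each Ri must fit into memory for its hash table Hi to be built.

```python
from collections import defaultdict

def grace_hash_join(R, S, ja, p, pr=lambda r: True, ps=lambda s: True):
    """Grace hash join with p partitions (lists stand in for temporary files)."""
    # Partitioning phase: R and S are split with the same function before joining.
    r_parts = [[] for _ in range(p)]
    s_parts = [[] for _ in range(p)]
    for r in R:
        if pr(r):
            r_parts[hash(r[ja]) % p].append(r)
    for s in S:
        if ps(s):
            s_parts[hash(s[ja]) % p].append(s)
    # Join phase: for each i, build Hi from Ri in memory and probe it with Si.
    result = []
    for r_part, s_part in zip(r_parts, s_parts):
        h = defaultdict(list)
        for r in r_part:
            h[r[ja]].append(r)
        for s in s_part:
            result.extend((r, s) for r in h.get(s[ja], []))
    return result

R = [{"ja": v} for v in range(10)]
S = [{"ja": v} for v in (3, 3, 9, 11)]
print(sorted(s["ja"] for _, s in grace_hash_join(R, S, "ja", p=4)))  # [3, 3, 9]
```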


Hash Join (6)

Hybrid hash join
• Optimization such that the construction and probing of H1 are done in parallel with partitioning

Figure:
1) a) Scan over R: R1 is constructed in H1 in memory; R2, R3, ..., RP are written out (memory area: 1 page each)
   b) Scan over S: immediate probing of the S1 records; S2, S3, ..., SP are written out
2) Scan over R2 to build H2, then scan over S2 with probing of H2
3) ... as in case of the Grace join
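Compared with the Grace join, the hybrid variant keeps the first partition in memory while partitioning. A Python sketch under the same kind of assumptions (lists as partition files, `hash` as hP, hypothetical function name); partition index 0 corresponds to R1/S1 on the slide.

```python
from collections import defaultdict

def hybrid_hash_join(R, S, ja, p, pr=lambda r: True, ps=lambda s: True):
    """Hybrid hash join: partition 0 (R1/S1 on the slide) never leaves memory."""
    h1 = defaultdict(list)              # H1, built while R is partitioned
    r_parts = [[] for _ in range(p)]    # r_parts[0] stays unused
    s_parts = [[] for _ in range(p)]
    result = []
    for r in R:                         # 1a) scan R
        if pr(r):
            i = hash(r[ja]) % p
            if i == 0:
                h1[r[ja]].append(r)
            else:
                r_parts[i].append(r)    # written out via a one-page buffer
    for s in S:                         # 1b) scan S
        if ps(s):
            i = hash(s[ja]) % p
            if i == 0:                  # immediate probing of S1 records
                result.extend((r, s) for r in h1.get(s[ja], []))
            else:
                s_parts[i].append(s)
    for i in range(1, p):               # 2), 3) ...: as in the Grace join
        h = defaultdict(list)
        for r in r_parts[i]:
            h[r[ja]].append(r)
        for s in s_parts[i]:
            result.extend((r, s) for r in h.get(s[ja], []))
    return result

R = [{"ja": v} for v in range(10)]
S = [{"ja": v} for v in (3, 3, 9, 11, 4)]
print(sorted(s["ja"] for _, s in hybrid_hash_join(R, S, "ja", p=3)))  # [3, 3, 4, 9]
```

The larger the in-memory share of R, the fewer records are written out and read back, which is the advantage over the Grace join when memory exceeds the minimum.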


Hash Join - Example

I. Partitioning
a) Partitioning of R with hP(r(JA))
   Figure: number of records per JA value (JA from 0 to 1000) and, after mapping, number of records per JA' value (JA' = hP(r(JA)) between 0 and 1), split at 0.33 and 0.66 into R1, R2, R3
b) Partitioning of S with hP(s(JA))

II. Join
1) R1 ⋈ S1: H1 (JA' 0.0 – 0.33) is built in memory from R1 with hH(r(JA)); S1 (JA' 0.0 – 0.33) is read and probes H1 with hH(s(JA))
2) R2 ⋈ S2: H2 and S2 analogously for JA' 0.34 – 0.66
3) R3 ⋈ S3: H3 and S3 analogously for JA' 0.67 – 1.0


Use of Type-Spanning Access Paths

Join via link structures
• Use of hierarchical access paths for equi-join
Scan over R (owner table), for each record r, if PR:
   scan over the related link structure LR-S(JA), for each record s, if PS:
      write combined record (r, s) into the result set.

Further methods
• Join indexes, which are built for certain Θ-joins

Example join index (pairs of TIDs of matching R and S records):
Logical view:          (TIDr2, TIDs4), (TIDr1, TIDs3), (TIDr2, TIDs2), (TIDr2, TIDs6)
VIR (index for TIDR):  (TIDr1, TIDs3), (TIDr2, TIDs2), (TIDr2, TIDs4), (TIDr2, TIDs6)
VIS (index for TIDS):  (TIDs2, TIDr2), (TIDs3, TIDr1), (TIDs4, TIDr2), (TIDs6, TIDr2)
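Using the TID pairs of the join index example, the following Python sketch builds VIR and VIS and evaluates a join through VIR. Dicts stand in for the index structures, and the record contents and function names are hypothetical; no JA values are compared at join time, since the matching pairs are precomputed.

```python
from collections import defaultdict

def build_index(pairs):
    """Map each TID to its sorted partner TIDs (a stand-in for a B*-tree)."""
    index = defaultdict(list)
    for tid, partner in sorted(pairs):
        index[tid].append(partner)
    return dict(index)

# Join index of the example: pairs (TID of r, TID of s) of matching records.
logical_view = [("r2", "s4"), ("r1", "s3"), ("r2", "s2"), ("r2", "s6")]
VIR = build_index(logical_view)                       # access via TIDR
VIS = build_index((s, r) for r, s in logical_view)    # access via TIDS

def join_via_index(R, S, vir, pr, ps):
    """R, S map TID -> record; pairs come from the index, not from JA values."""
    result = []
    for tid_r, r in R.items():
        if pr(r):
            for tid_s in vir.get(tid_r, []):
                if ps(S[tid_s]):          # fetch s via its TID, check PS
                    result.append((tid_r, tid_s))
    return result

R = {"r1": {"a": 1}, "r2": {"a": 2}}                  # hypothetical records
S = {f"s{i}": {"b": i} for i in (2, 3, 4, 6)}
print(VIR)  # {'r1': ['s3'], 'r2': ['s2', 's4', 's6']}
print(join_via_index(R, S, VIR, lambda r: True, lambda s: s["b"] >= 3))
# [('r1', 's3'), ('r2', 's4'), ('r2', 's6')]
```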


Use of Type-Spanning Access Paths (2)

• Use of generalized access path structures (GAPS)

Figure: GAPS example. An index over keys (K25, K36, K47, K53, K55, K58, K78, K88, ...); the entry for a key (e.g., K55) holds, per record type, a TID count followed by the TIDs: TIDs for Dept, TIDs for Mgr, TIDs for Emp, TIDs for Equipment. The page carries PRIOR and NEXT pointers and an optional reference to an overflow page.


Join Algorithms - Comparison

Figure: Search space of the join algorithms for input stream 1 (e11, e12, e13, ...) and input stream 2 (e21, e22, e23, ...), marking element comparisons and successful element comparisons: (a) nested-loops join, (b) merge join, (c) hash join (hash partitions).

Nested-loops join is always applicable; however, the cost of scanning the complete search space has to be taken into account.

Merge join has the lowest search costs but requires sorted input streams. Index structures on both join attributes satisfy this prerequisite. Otherwise, explicitly sorting both tables on the join attributes substantially reduces its cost advantage. Nevertheless, sort-merge join can have additional advantages if the result is required in sorted order and sorting the large result is more expensive than sorting the two smaller inputs.

Hash join partitions the search space. Figure (c) assumes that the same hash function h is applied to tables R and S. The partition size of the (smaller) table is determined by the available buffer size in memory. Reducing the partition size in order to approximate case (b) causes higher preparation costs and is therefore not advisable.


Join Algorithms in Distributed DBS

Problem statement
• Query at node K, which requires a join between (sub-)table R at node KR and (sub-)table S at node KS
• Determination of the processing node: K, KR, or KS

Determination of the evaluation strategy
• Send the participating tables completely to one node and compute the join locally ("ship whole")
  - Minimal number of messages
  - Very high transfer volumes
• For every join value of the first table, request the related records from the second table ("fetch as needed")
  - Large number of messages
  - Only relevant records are shipped
• Trade-off solution: semi-join or extensions such as the bit-vector join (hash filter join)


Semi-join
• Shipping of a list of JA values of R to the node of S
• Determination of the join counterparts in S and returning them to the node of R
• Then join processing at the node of R

Bit-vector join
• Similar to the semi-join, but only a bit vector (Bloom filter) created using a hash function is shipped
• The node of S returns a superset of the join counterparts in S
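The transfer trade-off between semi-join and bit-vector join can be simulated in a single Python process. The Dno values 47, 39, 64 and the employees Hans and Anna follow the Dept/Emp example on the next slide; the other employees, the vector length m, and the single hash function `hash(v) % m` are assumptions (a Bloom filter may use several hash functions), and the function names are hypothetical.

```python
def semi_join(dept, emp, ja):
    """Returns (result, #JA values shipped, #records returned)."""
    shipped = {d[ja] for d in dept}                       # node of R -> node of S
    returned = [e for e in emp if e[ja] in shipped]       # node of S -> node of R
    result = [(d, e) for d in dept for e in returned if d[ja] == e[ja]]
    return result, len(shipped), len(returned)

def bit_vector_join(dept, emp, ja, m):
    """Ships an m-bit vector instead of the JA values (one hash function)."""
    bits = [0] * m
    for d in dept:
        bits[hash(d[ja]) % m] = 1                         # node of R -> node of S
    candidates = [e for e in emp if bits[hash(e[ja]) % m]]   # a superset
    # Join + check at the node of R removes the false positives.
    result = [(d, e) for d in dept for e in candidates if d[ja] == e[ja]]
    return result, candidates

dept = [{"dno": v} for v in (47, 39, 64)]
emp = [{"name": "Hans", "dno": 47}, {"name": "Anna", "dno": 47},
       {"name": "Otto", "dno": 55}, {"name": "Eva", "dno": 28}]

result, n_shipped, n_returned = semi_join(dept, emp, "dno")
print(len(result), n_shipped, n_returned)             # 2 3 2
result, candidates = bit_vector_join(dept, emp, "dno", m=8)
print(len(result), [e["name"] for e in candidates])   # 2 ['Hans', 'Anna', 'Otto']
```

Here, Otto is a false positive: his Dno hashes to a set bit without having a counterpart, so the final join check removes him.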


Semi-Join and Bit-Vector Join

Figure: Example with Dept (Dno, Mgr, Loc) stored in Frankfurt and Emp (Name, Dno, Phone, Address) stored in Munich; the join attribute is Dno.
• Semi-join: ship the whole JA column of Dept (Dno values 47, 39, 64), find the join counterparts in Emp, return projections of the join counterpart records (Hans and Anna, Dno 47), and join.
• Bit-vector join: create a bit vector (00011001) by hashing the Dno values of Dept, ship the bit vector, hash the Dno values of Emp to find potential join candidates, return the potential join candidates, and perform join + check.


Set Operations³

Which set operations are needed?

Figure: Union-compatible input streams R and S with element sets A (only in R), B (in R and S), and C (only in S).

Result     Matching in all attributes   Matching in one or several attributes
A          difference (R – S)           anti-semi-join (S, R)
B          intersection                 join, semi-join (S, R)
C          difference (S – R)           anti-semi-join (R, S)
A, B       –                            left outer join
A, C       anti-difference              anti-join
B, C       –                            right outer join
A, B, C    union                        symmetric outer join

Which algorithms can be used for these set operations?
• What has to be compared at a time?
• How can a relationship to the join algorithms be found?

3. Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170.


Set Operations (2)

Binary matching operations
• Solve the same task, in principle: "one-to-one matching operations"
• An input element contributes to the output depending on its "match" with another input element
• The operations repeatedly require the same steps and can, therefore, be implemented using the same algorithms
• Set and join operations are closely connected!




Same logical procedure
• Three element sets are formed from R and S: A, B, C
• Elements in B match!
• How can these three element sets be formed?
  - Using nested iteration
  - Using the merge method
  - Using the hash method

Unified realization concept
• Comparison of join vs. primary-key attributes
• Commonality: records are grouped on the basis of attribute values
• Some unary operations are possible with special measures
  - Grouping and sorting enable simple duplicate elimination
  - In case of aggregation, one attribute value per group is determined
  - In case of join, grouping of potential join counterparts (either in partitions or in a sort order) is cost-effective
  - Using set operations, the element sets A, B, C can be found; at the same time, duplicate elimination is possible
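The merge method for forming the element sets A, B, and C can be sketched in Python for matching in all attributes (union-compatible inputs, items modeled as comparable values such as ints or tuples). The name `merge_match` is hypothetical; duplicates are eliminated in passing because each group of equal values is visited only once.

```python
def merge_match(R, S):
    """Merge method: split R and S into A (only R), B (both), C (only S)."""
    r, s = sorted(R), sorted(S)
    A, B, C = [], [], []
    i = j = 0
    while i < len(r) or j < len(s):
        if j == len(s) or (i < len(r) and r[i] < s[j]):
            x, target = r[i], A          # only in R
        elif i == len(r) or s[j] < r[i]:
            x, target = s[j], C          # only in S
        else:
            x, target = r[i], B          # in R and in S
        target.append(x)
        # Skip all copies of x in both inputs (duplicate elimination).
        while i < len(r) and r[i] == x:
            i += 1
        while j < len(s) and s[j] == x:
            j += 1
    return A, B, C

R = [1, 2, 2, 3, 5]
S = [2, 3, 3, 4]
A, B, C = merge_match(R, S)
print(A, B, C)                 # [1, 5] [2, 3] [4]
print(sorted(A + B + C))       # [1, 2, 3, 4, 5]  (union)
```

Each set operation of the table on the previous slide is then a combination of A, B, and C, e.g., intersection = B and anti-difference = A + C.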


Summary

Selection operations
• Existing access path types require tailor-made operations and efficient mapping
• Combination of various access paths possible (TID algorithm)

General classes of evaluation methods for binary operations
• Nested iteration
• Merge method
• Hashing

Many options for processing join operations
• Nested-loops join
• Sort-merge join
• Hash join
• And variations


Set operations
• Use of the same algorithm classes, in principle
• Variation in how the comparisons are executed

Extensibility infrastructure in object-relational DBMS
• Creation of user-defined functions and operators
• Generalization: user-defined table operators with n input tables and m output tables
