Realization of DBS
11. Table Operations – Implementation
Theo Härder, www.haerder.de
Goals
- Systematic development of relational processing concepts for a single table or for several tables
- Realization of plan operators
© 2011 AG DBIS
Realization of Database Systems – SS 2011
Main references:
- Theo Härder, Erhard Rahm: Datenbanksysteme – Konzepte und Techniken der Implementierung, Springer, 2001, Chapter 11.
- Goetz Graefe: Query Evaluation Techniques for Large Databases, ACM Computing Surveys 25:2, June 1993, pp. 73-170.
Table Operations - Implementation
Outline (slide sidebar): Plan operators; Table operations; Joins on type-spanning paths; Nested-loops & sort/merge join; Hash join; Distributed joins; Set operations
Operations of the relational algebra
- Unary operations: selection (σ), projection (π)
- Binary operations: join (⋈), Cartesian product (×), division (÷), union (∪), intersection (∩), difference (–)
SQL queries contain logical expressions that can be mapped to the operations of the relational algebra. These are further transformed into access plans, in which so-called plan operators implement the logical operations.
Plan operators on a single table
- Selection
- Sort
Operators across several tables
Join algorithms
- Nested-loops join, sort-merge join
- Hash join (classic hashing, simple hash join, hybrid hash join)
- Exploitation of type-crossing access paths
- Distributed join algorithms
Further binary operations (set operations)
Plan Operators on a Single Table
Selection – general ways of evaluation
• Direct access via a given TID, via a hash method, or via a one- or multi-dimensional index structure
• Sequential search in a table
• Search via an index structure (index table, bit list)
• Selection using several pointer lists where more than a single index structure can be exploited
• Search via a multi-dimensional index structure
Projection
• Typically performed in combination with sorting, selection, or join
Modification
• Updates are set-oriented in SQL, but restricted to a single table
• INSERT, DELETE and UPDATE are directly mapped to the corresponding operations of the storage structures
• “Automatic” execution of maintenance operations
- to update access paths
- to guarantee clustering, reorganization, etc.
• Provisions for logging and recovery etc.
Plan Operators for the Selection
Use of scan operators
• Definition of start and stop conditions
• Definition of simple search arguments
Plan operators
1. Table scan (relation scan)
- Always possible
- SCAN operator implements the selection operation
2. Index scan
- Selection of the most cost-effective index
- Specification of the search range (start and stop conditions)
3. k-d scan
- Evaluation of multi-dimensional search criteria
- Use of differing evaluation directions by navigation
4. TID algorithm
- Evaluation of all “usable” index structures
- Location of TID lists of variable lengths
- Boolean combination of the lists
- Access to the records according to the hit list (result list)
Further plan operators in combination with selection
• Sorting
• Grouping (see sort operator)
• Special operators, e.g., in data-warehouse applications, for grouping and aggregation (CUBE operator)
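The TID algorithm (plan operator 4) can be illustrated with a minimal Python sketch. The table contents, the two indexes, and all function names are assumptions made for illustration only; a real system would work on page-resident index structures rather than dictionaries.

```python
# Hypothetical in-memory stand-ins (not from the slides):
# the table maps TID -> record, each index maps attribute value -> TID list.
TABLE = {
    1: {"name": "Anna", "dno": 47, "city": "Munich"},
    2: {"name": "Hans", "dno": 47, "city": "Frankfurt"},
    3: {"name": "Eva", "dno": 39, "city": "Munich"},
}
INDEX_DNO = {47: [1, 2], 39: [3]}
INDEX_CITY = {"Munich": [1, 3], "Frankfurt": [2]}


def tid_and(*tid_lists):
    """Boolean AND: keep only TIDs that occur in every list."""
    result = set(tid_lists[0])
    for tids in tid_lists[1:]:
        result &= set(tids)
    return sorted(result)


def tid_or(*tid_lists):
    """Boolean OR: keep TIDs that occur in at least one list."""
    result = set()
    for tids in tid_lists:
        result |= set(tids)
    return sorted(result)


def fetch(hit_list):
    """Access the records named in the hit list (result list)."""
    return [TABLE[tid] for tid in hit_list]


# Selection: dno = 47 AND city = 'Munich'
hits = tid_and(INDEX_DNO.get(47, []), INDEX_CITY.get("Munich", []))
records = fetch(hits)
```

Only the records in the final hit list are fetched, which is the point of combining the TID lists before touching the table.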
Operators Across Several Tables
SQL allows complex queries across k tables
• One-variable expressions: describe conditions for the selection of elements from a table
• Two-variable expressions: describe conditions for the combination of elements from two tables
• Typically, k-variable expressions are decomposed into one- and two-variable expressions and evaluated by corresponding plan operators
Plan operators across several tables
• General ways for the evaluation:
- Nested iteration: for each element of the outer table To, traversal of the inner table Ti
  • O(No · Ni + No)
  • important application: nested-loops join
- Merge method: alternating traversals through T1 and T2
  • O(N1 + N2)
  • additional sort costs, if necessary
  • important application: merge join
- Hashing: partitioning of the inner table Ti and partition-wise loading into a hash table (HT) in memory; “probing” by the outer table To, or by its partitions, using HT: O(p · No + Ni)
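To make the three cost expressions above concrete, the following sketch simply evaluates them for one set of table sizes. The sizes and the partition count p are arbitrary assumptions, and the numbers count element accesses only, not I/O.

```python
def nested_iteration_cost(n_outer, n_inner):
    """O(No * Ni + No): one inner traversal per outer element."""
    return n_outer * n_inner + n_outer


def merge_cost(n1, n2):
    """O(N1 + N2): one pass over each (already sorted) input."""
    return n1 + n2


def hashing_cost(n_outer, n_inner, p):
    """O(p * No + Ni): inner table read once, outer probed per partition."""
    return p * n_outer + n_inner


# Assumed example sizes: outer table 1000 elements, inner table 100 elements.
n_outer, n_inner, p = 1000, 100, 4
costs = {
    "nested iteration": nested_iteration_cost(n_outer, n_inner),
    "merge": merge_cost(n_outer, n_inner),
    "hashing": hashing_cost(n_outer, n_inner, p),
}
```

The merge figure excludes the additional sort costs mentioned above, so it is a lower bound whenever the inputs are not already sorted.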
Operators Across Several Tables (2)
n-way joins
• Decomposition into n-1 two-way joins (footnote 2)
• Number of possible join sequences depends on the join attributes chosen
• At most n! different sequences possible
• Use of pipelining techniques
• Optimal evaluation sequence depends on
- Plan operators
- “Fitting” sort orders for join attributes
- Size of operands etc.
Some join sequences using two-way joins (n=5)
[Figure: three join trees over T1, ..., T5, each delivering “result”: a left-deep tree, a right-deep tree, and a bushy tree]
Analogous procedure in the case of set operations
2. Practicality test (Guy Lohman test for join techniques): Does a new technique apply to joining three inputs without interrupting data flow between the join operators?
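As a rough illustration of the n! bound, the sketch below enumerates all left-deep join sequences for n = 5 by treating every permutation of the tables as one sequence of two-way joins. The nested-tuple plan representation is an assumption for illustration; it ignores whether the join attributes actually allow each sequence.

```python
from itertools import permutations
from math import factorial


def left_deep_plans(tables):
    """Build one left-deep tree of two-way joins per table permutation."""
    plans = []
    for order in permutations(tables):
        plan = order[0]
        for table in order[1:]:
            # The intermediate result is always the left operand.
            plan = (plan, table)
        plans.append(plan)
    return plans


tables = ["T1", "T2", "T3", "T4", "T5"]
plans = left_deep_plans(tables)
first_plan = plans[0]
```

Each plan consists of n-1 nested pairs, matching the decomposition into n-1 two-way joins.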
Plan Operators for the Join
Join
• Record-type-spanning operation: usually very expensive
• Frequent use: important optimization candidate
• Typical application: equi-join
• General Θ-join infrequent
Implementation of the join operation can process, at the same time, selections (and projections) on the participating tables R and S
SELECT *
FROM R, S
WHERE R.JA Θ S.JA
AND PR
AND PS
• JA: join attribute
• PR and PS: predicates defined on selection attributes (SA) of R and S
Possible access paths
• Scans over R and S (always)
• Scans over IR(JA), IS(JA) (if present): deliver sort sequence according to JA
• Scans over IR(SA), IS(SA) (if present): if necessary, fast selection for PR and PS
• Scans over other index structures (if present): if necessary, faster location of all records
Nested-Loops Join
Assumptions
• Records in R and S are not ordered according to the join attributes
• Index structures IR(JA) and IS(JA) do not exist
Algorithm for Θ-join
Scan over S, for each record s, if PS:
    scan over R, for each record r, if PR AND (r.JA Θ s.JA):
        execute join, i.e., write combined record (r, s) into the result set.
Complexity: O(N*M)
Nested-loops join using index access
Scan over S, for each record s, if PS:
    determine via access to IR(JA) all TIDs for records satisfying r.JA = s.JA, for each TID:
        fetch record r, if PR:
            write combined record (r, s) into the result set.
Nested-block join
Scan over S, for each page (or set of contiguous pages) of S:
    scan over R, for each page (or set of contiguous pages) of R:
        for each record s of the S-page, if PS:
            for each record r of the R-page, if PR AND (r.JA Θ s.JA):
                write combined record (r, s) into the result set.
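The first two variants above can be sketched in Python as follows. Records are assumed to be dictionaries with a "JA" field, the predicates PR and PS are passed as functions, and a dictionary stands in for the index IR(JA) with list positions playing the role of TIDs; none of these representation choices come from the slides.

```python
import operator
from collections import defaultdict


def nested_loops_join(R, S, theta=operator.eq,
                      pr=lambda r: True, ps=lambda s: True):
    """Theta-join by nested iteration: S is the outer, R the inner table."""
    result = []
    for s in S:
        if not ps(s):
            continue
        for r in R:
            if pr(r) and theta(r["JA"], s["JA"]):
                result.append((r, s))
    return result


def index_nested_loops_join(R, S, pr=lambda r: True, ps=lambda s: True):
    """Equi-join where the inner scan is replaced by an index lookup."""
    index_r = defaultdict(list)  # stand-in for IR(JA): JA value -> TIDs
    for tid, r in enumerate(R):
        index_r[r["JA"]].append(tid)
    result = []
    for s in S:
        if not ps(s):
            continue
        for tid in index_r.get(s["JA"], []):
            r = R[tid]  # fetch the record via its TID
            if pr(r):
                result.append((r, s))
    return result


R = [{"JA": 1, "a": "r1"}, {"JA": 2, "a": "r2"}, {"JA": 2, "a": "r3"}]
S = [{"JA": 2, "b": "s1"}, {"JA": 3, "b": "s2"}]
equi = nested_loops_join(R, S)
via_index = index_nested_loops_join(R, S)
less_than = nested_loops_join(R, S, theta=operator.lt)
```

The index variant only supports the equality case, which mirrors the condition r.JA = s.JA in the pseudocode, while the plain variant accepts any comparison operator as Θ.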
Sort-Merge Join
Algorithm consists of 2 phases
• Phase 1: Sorting of R and S w.r.t. R(JA) and S(JA) (if not already sorted); in doing so, early elimination of records not needed (via PR, PS)
• Phase 2: Alternating scans over the sorted R- and S-records, where the join is performed in case of r.JA = s.JA
Complexity: O(N log N)
Special case
If either IR(JA) and IS(JA) or GAPS over R(JA) and S(JA) (join index) is present: exploitation of index structures on join attributes
Alternating scans over IR(JA) and IS(JA):
    for each pair of keys from IR(JA) and IS(JA), if r.JA = s.JA:
        fetch the records using the related TIDs, if PR and PS:
            write combined record (r, s) into the result set
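The two phases can be sketched in Python as shown below, assuming records are dictionaries with a "JA" field and that both inputs fit in memory. The handling of groups of equal join values is one possible way to treat duplicates and is not prescribed by the slides.

```python
def sort_merge_join(R, S, pr=lambda r: True, ps=lambda s: True):
    """Equi-join: sort both inputs on JA, then merge them in one pass."""
    def key(rec):
        return rec["JA"]

    # Phase 1: sort, eliminating records that fail PR or PS early.
    r_sorted = sorted((r for r in R if pr(r)), key=key)
    s_sorted = sorted((s for s in S if ps(s)), key=key)

    # Phase 2: alternating scans over both sorted inputs.
    result = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rk, sk = key(r_sorted[i]), key(s_sorted[j])
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            # Find the groups of records sharing this join value.
            i_end = i
            while i_end < len(r_sorted) and key(r_sorted[i_end]) == rk:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and key(s_sorted[j_end]) == rk:
                j_end += 1
            for r in r_sorted[i:i_end]:
                for s in s_sorted[j:j_end]:
                    result.append((r, s))
            i, j = i_end, j_end
    return result


R = [{"JA": 3}, {"JA": 1}, {"JA": 2}, {"JA": 2}]
S = [{"JA": 2}, {"JA": 2}, {"JA": 4}, {"JA": 1}]
joined = sort_merge_join(R, S)
```

The result comes out ordered by JA, which is the additional advantage of sort-merge join when a sorted result is required.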
Hash Join
Simplest case (classic hashing)
• Step 1: Partitioned read of the (smaller) table R and construction of a hash table using hH(r(JA)) w.r.t. the values of R(JA) of partition Ri (1 ≤ i ≤ p): each partition fits into the available memory and each record satisfies PR
• Step 2: Probing for records of S using PS; if successful, execution of the join
• Step 3: Repeat steps 1 and 2 until R is exhausted
Construction of hash tables and probing
[Figure: scan over R, building the hash tables Hi (1 ≤ i ≤ p) one at a time in memory; for each of H1, ..., Hp, a scan over S with probing of Hi]
Complexity: O(p · N)
Special case
R fits into memory: one partition (p = 1), a single scan over S is sufficient!
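A minimal Python sketch of classic hashing follows. The parameter memory_records (how many R-records fit into memory at once) and the returned scan counter are assumptions added to make the p scans over S visible; a Python dictionary plays the role of the hash table built with hH.

```python
from collections import defaultdict


def classic_hash_join(R, S, memory_records,
                      pr=lambda r: True, ps=lambda s: True):
    """Build a hash table per memory-sized portion of R, rescan S for each."""
    qualified_r = [r for r in R if pr(r)]  # simplification: filter up front
    result = []
    scans_of_s = 0
    for start in range(0, len(qualified_r), memory_records):
        # Step 1: load the next partition Ri into an in-memory hash table.
        hash_table = defaultdict(list)
        for r in qualified_r[start:start + memory_records]:
            hash_table[r["JA"]].append(r)
        # Step 2: scan S completely and probe the hash table.
        scans_of_s += 1
        for s in S:
            if ps(s):
                for r in hash_table.get(s["JA"], []):
                    result.append((r, s))
        # Step 3: repeat until R is exhausted.
    return result, scans_of_s


R = [{"JA": k} for k in (1, 2, 3, 4, 5)]
S = [{"JA": k} for k in (2, 4, 4, 6)]
joined, scans = classic_hash_join(R, S, memory_records=2)
joined_in_memory, single_scan = classic_hash_join(R, S, memory_records=10)
```

With memory_records large enough to hold all of R, the loop runs once, which corresponds to the special case p = 1.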
Hash Join (2)
[Figure: Partitioning of R with hP(r(JA)). The distribution of #records per JA-value (JA from 0 to 1000) is mapped by hP(r(JA)) to a distribution of #records per JA’-value (JA’ from 0 to 1), which is split at 0.33 and 0.66 into the partitions R1, R2, R3]
Hash Join (3)
Partitioning
• Partitioning of R into subsets R1, R2, ..., Rp: a record r of R is in Ri, if h(r) is in Hi
[Figure: R is split into the value ranges H1, H2, ..., Hp]
• Table S is partitioned with the same function hP while evaluating PS
Why is this partitioning a critical operation?
Which auxiliary operations may be required?
Is the use of a hash function needed for partitioning?
Hash Join (4)
Variants of hash join are primarily distinguished by the kind of partitioning
Partitioning technique in case of simple hash join
[Figure: 1st iteration, shown for construction and probing of H1. Step 1: R is scanned into H1 and Rrest. Step 2: S is probed against H1, the remainder goes to Srest]
Simple hash join
• Step 1: Execute a scan on R (the smaller table), evaluate PR, and apply hash function hP to each qualified record r. If hP(r(JA)) is in the chosen range, insert the record into H1. Otherwise, write r to an output buffer for a file Rrest of deferred (“pretermitted”) r-records.
• Step 2: Execute a scan on S, evaluate PS, and apply hash function hP to each qualified record s. If hP(s(JA)) is in the chosen range, search for a join counterpart (probing) in H1 and, if successful, form a join record and add it to the result. If it is outside the range, write s to an output buffer for a file Srest of deferred (“pretermitted”) s-records.
• Step 3: Repeat steps 1 and 2 on the records deferred so far, using Hi, until Rrest is exhausted. Here, evaluation of PR and PS is not required anymore.
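The iteration over Rrest and Srest can be sketched as follows. The partitioning function (Python's built-in hash modulo p), the list-based stand-ins for the files Rrest and Srest, and the record layout are assumptions for illustration only.

```python
from collections import defaultdict


def simple_hash_join(R, S, p, pr=lambda r: True, ps=lambda s: True):
    """Simple hash join: one value range per iteration, the rest is deferred."""
    def h_p(value):
        return hash(value) % p  # hypothetical partitioning function hP

    # PR and PS are evaluated once; later iterations see qualified records only.
    r_rest = [r for r in R if pr(r)]
    s_rest = [s for s in S if ps(s)]
    result = []
    for i in range(p):
        # Step 1: records of the chosen range go into Hi, the others to Rrest.
        h_i = defaultdict(list)
        r_next = []
        for r in r_rest:
            if h_p(r["JA"]) == i:
                h_i[r["JA"]].append(r)
            else:
                r_next.append(r)
        # Step 2: probe with S-records of the same range, defer the others.
        s_next = []
        for s in s_rest:
            if h_p(s["JA"]) == i:
                for r in h_i.get(s["JA"], []):
                    result.append((r, s))
            else:
                s_next.append(s)
        # Step 3: continue with the deferred records until Rrest is empty.
        r_rest, s_rest = r_next, s_next
        if not r_rest:
            break
    return result


R = [{"JA": k} for k in range(10)]
S = [{"JA": k} for k in (1, 1, 5, 9, 12)]
joined = simple_hash_join(R, S, p=3)
```

Every deferred record is written and read again in each later iteration, which is the main cost of this variant compared with partitioning both tables up front.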
Hash Join (5)
Grace join
• Partitioning of R and S takes place before the join starts
• Partitions Ri and Si are stored in temporary files on disk
• Construction of Hi (having M pages) in memory with Ri and probing with Si
[Figure: for each i from 1 to P, Hi is built from Ri, followed by a scan over Si with probing of Hi]
What is the minimal memory size required?
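A compact sketch of the Grace join is given below. Python lists stand in for the temporary partition files, and the partitioning function based on the built-in hash is an assumption; the memory question raised above is not modeled.

```python
from collections import defaultdict


def partition(records, p):
    """Split records into p partitions by hashing the join attribute."""
    parts = [[] for _ in range(p)]  # stand-ins for temporary files on disk
    for rec in records:
        parts[hash(rec["JA"]) % p].append(rec)
    return parts


def grace_join(R, S, p):
    """Partition both tables first, then join partition pairs (Ri, Si)."""
    r_parts = partition(R, p)
    s_parts = partition(S, p)
    result = []
    for r_i, s_i in zip(r_parts, s_parts):
        # Build Hi from Ri in memory.
        h_i = defaultdict(list)
        for r in r_i:
            h_i[r["JA"]].append(r)
        # Scan Si and probe Hi.
        for s in s_i:
            for r in h_i.get(s["JA"], []):
                result.append((r, s))
    return result


R = [{"JA": k} for k in range(10)]
S = [{"JA": k} for k in (1, 1, 5, 9, 12)]
joined = grace_join(R, S, p=3)
```

Because the same function partitions both tables, matching records always land in partitions with the same number, so each partition pair can be joined independently.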
Hash Join (6)
Hybrid hash join
• Optimization such that construction and probing of H1 is done in parallel to partitioning
[Figure: 1a) Scan over R: R1 is built directly into H1 in memory, while R2, R3, ..., RP are written out via a memory area of 1 page each. 1b) Scan over S: S1-records probe H1 immediately, while S2, S3, ..., SP are written out. 2), 3), ...: H2 is built from R2 and probed by a scan over S2, and so on, as in the case of Grace join]
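The difference to the Grace join can be sketched as follows: the first partition is joined while the tables are being partitioned, and only the remaining partitions are written out. The list-based spill areas and the hash-based partitioning function are assumptions for illustration.

```python
from collections import defaultdict


def hybrid_hash_join(R, S, p):
    """Join partition 1 during partitioning; handle the rest as in Grace join."""
    def h_p(rec):
        return hash(rec["JA"]) % p  # hypothetical partitioning function hP

    # 1a) Scan R: the first partition goes straight into H1, others spill.
    h_1 = defaultdict(list)
    r_spill = [[] for _ in range(p)]  # index 0 stays unused
    for r in R:
        i = h_p(r)
        if i == 0:
            h_1[r["JA"]].append(r)
        else:
            r_spill[i].append(r)

    # 1b) Scan S: records of the first partition probe H1 immediately.
    result = []
    s_spill = [[] for _ in range(p)]
    for s in S:
        i = h_p(s)
        if i == 0:
            result.extend((r, s) for r in h_1.get(s["JA"], []))
        else:
            s_spill[i].append(s)

    # 2), 3), ...: remaining partition pairs as in the Grace join.
    for i in range(1, p):
        h_i = defaultdict(list)
        for r in r_spill[i]:
            h_i[r["JA"]].append(r)
        for s in s_spill[i]:
            result.extend((r, s) for r in h_i.get(s["JA"], []))
    return result


R = [{"JA": k} for k in range(10)]
S = [{"JA": k} for k in (1, 1, 5, 9, 12)]
joined = hybrid_hash_join(R, S, p=3)
```

The records of the first partition are never written to a spill area, which is where the saving over the Grace join comes from.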
Hash Join - Example
I. Partitioning
a) Partitioning of R with hP(r(JA))
[Figure: #records per JA-value (JA from 0 to 1000) mapped by hP(r(JA)) to #records per JA’-value (JA’ from 0 to 1), split at 0.33 and 0.66 into R1, R2, R3]
b) Partitioning of S with hP(s(JA))
II. Join
1) R1 (JA’: 0.0 – 0.33) is built as H1 in memory with hH(r(JA)); S1 (JA’: 0.0 – 0.33) is read, probing H1 with hH(s(JA))
2) R2 and S2 (JA’: 0.34 – 0.66) are joined in the same way via H2
3) R3 and S3 (JA’: 0.67 – 1.0) are joined in the same way via H3
Use of Type-Spanning Access Paths
Join via link structures
• Use of hierarchical access paths for equi-join
Scan over R (owner table), for each record r, if PR:
    scan over the related link structure LR-S(JA), for each record s, if PS:
        write combined record (r, s) into the result set.
Further methods
• Join indexes which are built for certain Θ-joins
[Figure: join index as pairs of TIDs from R and S]
Logical view (R TID, S TID): (TIDr2, TIDs4), (TIDr1, TIDs3), (TIDr2, TIDs2), (TIDr2, TIDs6)
VIR: index for TIDR (R TID, S TID): (TIDr1, TIDs3), (TIDr2, TIDs2), (TIDr2, TIDs4), (TIDr2, TIDs6)
VIS: index for TIDS (S TID, R TID): (TIDs2, TIDr2), (TIDs3, TIDr1), (TIDs4, TIDr2), (TIDs6, TIDr2)
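The join index from the figure can be mimicked with two sorted lists of TID pairs. The TID strings are taken from the figure; the sorted-list representation and the lookup helper are assumptions, since the slides do not specify how VIR and VIS are stored.

```python
from bisect import bisect_left

# Logical view of the join index: (TID in R, TID in S) pairs.
LOGICAL_VIEW = [
    ("TIDr2", "TIDs4"),
    ("TIDr1", "TIDs3"),
    ("TIDr2", "TIDs2"),
    ("TIDr2", "TIDs6"),
]

# VIR is ordered by the R TID, VIS by the S TID.
VI_R = sorted(LOGICAL_VIEW)
VI_S = sorted((s, r) for r, s in LOGICAL_VIEW)


def partners(index, tid):
    """Return all partner TIDs of tid from a sorted list of (tid, partner)."""
    pos = bisect_left(index, (tid,))  # first pair whose key is >= tid
    result = []
    while pos < len(index) and index[pos][0] == tid:
        result.append(index[pos][1])
        pos += 1
    return result


s_partners_of_r2 = partners(VI_R, "TIDr2")
r_partners_of_s3 = partners(VI_S, "TIDs3")
```

Keeping both orderings allows the join to be driven from either table without scanning the whole index.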
Use of Type-Spanning Access Paths (2)
• Use of generalized access path structures (GAPS)
[Figure: index tree with root key K53 and further keys K25, K36, K47, K58, K78, K88. A leaf entry for key K55 contains PRIOR and NEXT pointers, length fields (1, 3, 1, 4), and the TID lists for Dept, Mgr, Emp, and Equipment, plus an optional reference to an overflow page]
Join Algorithms - Comparison
[Figure: search space spanned by input stream 1 (e11, e12, e13, ...) and input stream 2 (e21, e22, e23, ...) for (a) nested-loops join, (b) merge join, and (c) hash join with hash partitions. Legend: element comparison, successful element comparison]
Nested-loops join is always applicable; however, the complete search space has to be scanned.
Merge join has the lowest search costs but requires sorted input streams. Index structures on both join attributes satisfy this prerequisite. Otherwise, explicit sorting of both tables w.r.t. the join attributes reduces the cost advantage substantially. Nevertheless, sort-merge join can have additional advantages if the result is required in sorted sequence and sorting of the large result is more expensive than sorting of two small result sets.
Hash join partitions the search space. Fig. c assumes that the same hash function h is applied to tables R and S. The partition size of the (smaller) table is determined by the available buffer size in memory. Reducing the partition size in order to approximate case b causes higher preparation costs and is therefore not recommended.
Join Algorithms in Distributed DBS
Problem statement
• Query at node K, which requires a join between (sub-)table R at node KR and (sub-)table S at node KS
• Determination of the processing node: K, KR or KS
Determination of evaluation strategy
• Send the participating tables completely to one node and compute the join locally (“ship whole”)
- Minimal number of messages
- Very high transfer volumes
• For every join value in the first table, request the related records from the second table (“fetch as needed”)
- Large number of messages
- Only relevant records are considered
• Trade-off solution: semi-join or extensions such as bit-vector join (hash filter join)
Semi-join
• Shipping of a list of JA values of R to the node of S
• Determination of join counterparts in S and returning them to the node of R
• Then join processing at the node of R
Bit-vector join
• Similar to semi-join, but only a bit vector (Bloom filter) created using a hash function is shipped
• Returning a superset of the join counterparts in S
Semi-Join and Bit-Vector Join
[Figure: Dept (Dno, Loc, Mgr) is stored in Frankfurt with the Dno values 47, 39, 64; Emp (Dno, Name, Address, Phone) is stored in Munich.
Semi-join: ship the whole JA column (Dno) of Dept to Munich, find the join counterparts in Emp, return projections of the join counterpart records (Hans, 47; Anna, 47) to Frankfurt, and join there.
Bit-vector join: create a bit vector (00011001) by hashing the Dno values of Dept, ship the bit vector to Munich, hash the Dno values of Emp to find potential join candidates, return the potential join candidates to Frankfurt, and perform the join plus a join check there.]
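Both strategies can be simulated locally in a few lines of Python. The Dept Dno values 47, 39, 64 and the employees Hans and Anna follow the figure; the remaining employee names, the 8-bit vector, and the modulo hash function are assumptions, so the bit pattern produced here differs from the one in the figure.

```python
# Dept lives at one node (Frankfurt), Emp at the other (Munich).
DEPT = [{"dno": 47}, {"dno": 39}, {"dno": 64}]
EMP = [
    {"name": "Hans", "dno": 47},
    {"name": "Anna", "dno": 47},
    {"name": "Emp69", "dno": 69},  # hypothetical names from here on
    {"name": "Emp28", "dno": 28},
    {"name": "Emp75", "dno": 75},
    {"name": "Emp55", "dno": 55},
]


def semi_join(dept, emp):
    """Ship the JA column, return exact counterparts, join at the Dept node."""
    shipped_ja = {d["dno"] for d in dept}                     # Frankfurt -> Munich
    counterparts = [e for e in emp if e["dno"] in shipped_ja]  # Munich -> Frankfurt
    joined = [(d, e) for d in dept for e in counterparts if d["dno"] == e["dno"]]
    return joined, counterparts


def make_bit_vector(values, bits=8):
    """Set one bit per hashed value (assumed hash function: value mod bits)."""
    vector = [0] * bits
    for value in values:
        vector[value % bits] = 1
    return vector


def bit_vector_join(dept, emp, bits=8):
    """Ship only the bit vector, return a superset, check the join locally."""
    vector = make_bit_vector((d["dno"] for d in dept), bits)  # Frankfurt -> Munich
    candidates = [e for e in emp if vector[e["dno"] % bits]]  # Munich -> Frankfurt
    joined = [(d, e) for d in dept for e in candidates if d["dno"] == e["dno"]]
    return joined, candidates


semi_result, semi_shipped = semi_join(DEPT, EMP)
bit_result, bit_shipped = bit_vector_join(DEPT, EMP)
```

In this example the employee with Dno 55 hashes to the same bit as 47 and is shipped as a false candidate, which is why the bit-vector variant needs the final join check.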
Set Operations (footnote 3)
Which set operations are needed?
[Figure: two overlapping sets R and S forming the regions A (only in R), B (in both R and S), and C (only in S)]
R, S: union-compatible input streams
A, B, C: element sets
result  | operation when matching in all attributes | operation when matching in one or several attributes
A       | difference (R-S)                          | anti-semi-join (S, R)
B       | intersection                              | join, semi-join (S, R)
C       | difference (S-R)                          | anti-semi-join (R, S)
A, B    |                                           | left-sided outer join
A, C    | anti-difference                           | anti-join
B, C    |                                           | right-sided outer join
A, B, C | union                                     | symmetrical outer join
Which algorithms can be used for these set operations?
• What has to be compared at a time?
• How can a relationship to the join algorithms be found?
3. Graefe, G.: Query evaluation techniques for large databases, ACM Computing Surveys 25:2, 1993, pp. 73-170
Set Operations (2)
Binary matching operations
• Solve the same task, in principle: “one-to-one matching operations”
• An input element contributes to the output depending on its “match” with another input element
• Operations repeatedly require the same steps and, therefore, can be implemented using the same algorithms
• Set and join operations are closely connected!
Same logical procedure
• Three element sets are formed from R and S: A, B, C
• Elements in B match each other!
• How can these three element sets be formed?
- Using nested iteration
- Using merge method
- Using hash method
Unified realization concept
• Comparison of join attributes vs. primary-key attributes
• Commonality: records are grouped on the basis of attribute values
• Some unary operations are possible with special measures
- Grouping and sorting enable simple duplicate elimination
- In case of aggregation, an attribute value per group is determined
- In case of join, grouping of potential join counterparts is cost-effective (either in partitions or a sort order)
- Using set operations, the element sets A, B, C can be found; at the same time, duplicate elimination is possible
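Forming the element sets A, B, C with the hash method can be sketched as follows, assuming elements that match in all attributes and are hashable. The tagging scheme is one possible realization; the merge method or nested iteration would produce the same three sets.

```python
def classify(R, S):
    """Form A (only in R), B (in both), C (only in S) with one hash table."""
    tags = {}
    for r in R:
        tags[r] = "A"          # seen in R only, so far; duplicates collapse
    for s in S:
        if s not in tags:
            tags[s] = "C"      # seen in S only
        elif tags[s] == "A":
            tags[s] = "B"      # match between an R- and an S-element
    a = {e for e, tag in tags.items() if tag == "A"}
    b = {e for e, tag in tags.items() if tag == "B"}
    c = {e for e, tag in tags.items() if tag == "C"}
    return a, b, c


R = [1, 2, 2, 3]
S = [3, 4, 4]
a, b, c = classify(R, S)

# The set operations from the table are combinations of A, B, C.
difference_r_s = a
intersection = b
difference_s_r = c
anti_difference = a | c
union = a | b | c
```

Duplicates disappear as a side effect of grouping equal elements under one hash key, which matches the remark above that duplicate elimination is possible at the same time.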
Summary
Selection operations
• Existing access path types require tailor-made operations and efficient mapping
• Combination of various access paths possible (TID algorithm)
General classes of evaluation methods for binary operations
• Nested iteration
• Merge method
• Hashing
Many options for processing of join operations
• Nested-loops join
• Sort-merge join
• Hash join
• And variations
Set operations
• Use of the same algorithm classes, in principle
• Variation of executing comparisons
Extensibility infrastructure in object-relational DBMS
• Creation of user-defined functions and operators
• Generalization: user-defined table operators with n input tables and m output tables