Download - Chapter 3 Distributed Database Design - yu.ac.krynucc.yu.ac.kr/~hrcho/Courses/DDB/Chap3.pdf3.1 Horizontal Fragmentation z Primary horizontal fragmentation z Derived horizontal fragmentation

�

Chapter 3

Distributed DatabaseDesign

Chapter 3 - 1

Table of Contents

z Alternative Design Strategies

z Distribution Design Issues

z Fragmentation

z Allocation

�

Chapter 3 - 2

1. Alternative Design Strategies

z Two major strategies

✔ Top-down approaches

✔ Bottom-up approaches

Chapter 3 - 3

5HTXLUHPHQW $QDO\VLV

6\VWHP 5HTXLUHPHQWV �REMHFWLYHV�

&RQFHSWXDO 'HVLJQ 9LHZ 'HVLJQ

*OREDO &RQFHSWXDO 6FKHPD $FFHVV ,QIRUPDWLRQ ([WHUQDO 6FKHPD 'HILQLWLRQV

'LVWULEXWLRQ 'HVLJQ

/RFDO &RQFHSWXDO 6FKHPD

3K\VLFDO 'HVLJQ

3K\VLFDO 6FKHPD

2EVHUYDWLRQ DQG 0RQLWRULQJ

XVHULQSXW

9LHZ «_

8VHU

,QSXW

)HHGEDFN )HHGEDFN

1.1 Top-Down Design Process

�

Chapter 3 - 4

Details of Design Process

z Requirement Analysis✔ Defines the environment of the system✔ Elicits both the data and processing needs of all potential

DB users

z System Requirements✔ Where the final system is expected to stand?✔ Performance, Reliability, Availability, Economics,

Flexibility

Chapter 3 - 5

Details of Design Process (Cont’d)

z Conceptual Design✔ Determines entity types and relationships among these

entities✔ Entity analysis:

– determines the entities, attributes, and relationships✔ Functional analysis:

– determines the fundamental functions with which the modeled enterprise is involved

✔ The process is identical to the centralized database design

�

Chapter 3 - 6

Details of Design Process (Cont’d)

z View Design✔ Defines the interfaces for end users✔ The conceptual schema can be interpreted as being an

integration of user views.

z Distribution Design✔ Designs the local conceptual schema by distributing the

entities over the sites of the distributed system✔ Consists of two steps :

fragmentation and allocation

Chapter 3 - 7

1.2 Bottom-Up Design Process

z Top-Down Approach:

Suitable when a system is being designed from scratch

z Bottom-Up Approach :

Suitable when many DBs exist, and the design task involves integrating them into one DB

� The bottom-up design process consists of integratinglocal schemas into the global conceptual schema.

� Schema Translation & Schema Integrating

� In the context of Heterogeneous Database !

�

Chapter 3 - 8

2. Distribution Design Issues

z Why fragment at all?

z How should we fragment?

z How much should we fragment?

z Is there any way to test the correctness of decomposition?

z How should we allocate?

z What is necessary information for fragmentation and allocation?

Chapter 3 - 9

2.1 Reasons for Fragmentation

z A relation is not an appropriate unit of distribution.✔ Application views are usually subsets of relations.✔ Unnecessarily high volume of remote data access or

unnecessary replication✔ Not support intra-query concurrency� decompose a relation into fragments

z Disadvantages of fragmentation✔ applications defined on more than one fragments:

performance degradation by union or join✔ semantic data control :

integrity checking is very difficult

�

Chapter 3 - 10

(12 (1$0( 7,7/(

(�

(�

(�

(�

(�

(�

(�

(�

-� 'RH

0� 6PLWK

$� /HH

-� 0LOOHU

%� &DVH\

/� &KX

5� 'DYLV

-� -RQHV

(OHFW� (QJ

6\VW� $QDO

0HFK� (QJ�

3URJUDPPHU

6\VW� $QDO�

(OHFW� (QJ�

0HFK� (QJ�

6\VW� $QDO�

(12 -12 5(63

(�

(�

(�

(�

(�

(�

(�

(�

(�

(�

-�

-�

-�

-�

-�

-�

-�

-�

-�

-�

0DQJHU

$QDO\VW

$QDO\VW

&RQVXOWDQW

(QJLQHHU

3URJUDPPHU

0DQDJHU

0DQDJHU

(QJLQHHU

0DQDJHU

'85

��

��

�

��

��

��

��

��

��

��

-12

-�

-�

-�

-�

-1$0(

,QVWUXPHQWDWLRQ

'DWDEDVH 'HYHORS�

&$'�&$0

0DLQWHQDQFH

%8'*(7

��

��

��

��

/2&

0RQWUHDO

1HZ <RUN

1HZ <RUN

3DULV

7,7/(

(OHFW� (QJ

6\VW� $QDO

0HFK� (QJ�

3URJUDPPHU

6$/

��

��

��

��

( *

6-

Chapter 3 - 11

2.2 Fragmentation Alternativez Horizontal Fragmentation or Vertical Fragmentation

-12

-�

-�

-1$0(

,QVWUXPHQWDWLRQ

'DWDEDVH 'HYHORS�

%8'*(7

��

��

/2&

0RQWUHDO

1HZ <RUN

-�

-12

-�

-�

-1$0(

&$'�&$0

0DLQWHQDQFH

%8'*(7

��

��

/2&

1HZ <RUN

3DULV

-�

Example of Horizontal Partitioning

�

Chapter 3 - 12

-12

-�

-�

-�

-�

%8'*(7

��

��

��

��

-�

-12

-�

-�

-�

-�

-1$0(

,QVWUXPHQWDWLRQ

'DWDEDVH 'HYHORS�

&$'�&$0

0DLQWHQDQFH

/2&

0RQWUHDO

1HZ <RUN

1HZ <RUN

3DULV

-�

Example of Vertical Partitioning

Chapter 3 - 13

2.3 Degree of Fragmentation

z Not to fragment at all: relation

z Fragment to the level of individual tuples orFragment to the level of individual attributes

z Suitable level of fragmentation?✔ Such a level can only be defined with respect to the

applications that run on the database.

� According to the value of application-specific parameters, individual fragments can be identified.

�

Chapter 3 - 14

2.4 Correctness Rules of Fragmentation

z Completeness✔ If a relation instance R is decomposed into fragments

R1, R2, . . ., Rn, each data item can be found in R can also be found in one or more of Rj’s.

z Reconstruction✔ If a relation instance R is decomposed into fragments

R1, R2, . . ., Rn, it should be possible to define a relational operator ∇ such that

R = ∇ Rj, ∀ Rj ∈ FR

z Disjointness✔ If a relation instance R is decomposed into fragments

R1, R2, . . ., Rn, and data item dj is in Rj, it is not any other fragment Rk ( k ≠ j ).

Chapter 3 - 15

2.5 Allocation Alternatives

z Comparison of Replication Alternatives

Full Replication Partial Replication Partitioning

QueryProcessingDirectoryManagementConcurrencyControl

Reliability

Reality

Easy

Easy ornonexistent

Moderate

Very high

Possible application

Difficult

Difficult

Difficult

High

Realistic

Difficult

Difficult

Easy

Low

Possible application

Chapter 3 - 16

2.6 Information Requirements

z Database Information

z Application Information

z Communication Network Information

z Computer System Information

Chapter 3 - 17

3. Fragmentation

z Design of Horizontal Fragmentation

z Design of Vertical Fragmentation

z Design of Hybrid Fragmentation

�

Chapter 3 - 18

3.1 Horizontal Fragmentationz Primary horizontal fragmentation

z Derived horizontal fragmentation

z Information requirements of horizontal fragmentation✔ Database Information✔ Application Information

Chapter 3 - 19

Database Information

z Concerns the global conceptual schema✔ How the DB relations are connected to one another,

especially with joins?✔ Expression of relationships among relations using links

z Example

7,7/(� 6$/

6

(12� (1$0(� 7,7/(

(

-12� -1$0(� %8'*(7� /2&

-

(12� -12� 5(63� '85

*

-RLQ UHODWLRQVKLSÆ &r zV

�þ� 6ö 6�ò 6�R zV�³¦"

��

Chapter 3 - 20

Application Informationz Predicates used in user queries

✔ The most active 20% of user queries account for 80% of the total data access.

z Simple Predicate✔ pj : Aj θ Value, θ ∈ {=, <, ≠, ≤, >, ≥}✔ Pri : set of all simple predicates defined on relation Ri

z Minterm Predicates✔ mi : the conjunction of simple predicates✔ Mi : the set of minterm predicates for relation Ri

Mi = {mij | mij = ∧ pik*}, 1 ≤ k ≤ m, 1 ≤ j ≤ z

where pik ∈ Pri and (pik* = pik or ¬pik)

Chapter 3 - 21

Example: Consider Relation ‘S’

S�� 7,7/( ³(OHFW�(QJ´

S�� 7,7/( ³6\VW�$QDO´

S�� 7,7/( ³0HFK�(QJ´

S�� 7,7/( ³3URJUDPPHU´

S�� 6$/ ≤ ��

S�� 6$/ > ��

P�� 7,7/( ³(OHFW�(QJ´ ∧ 6$/ ≤ ��

P�� 7,7/( ³(OHFW�(QJ´ ∧ 6$/ > ��

P�� ¬ �7,7/( ³(OHFW�(QJ´� ∧ 6$/ ≤ ��

P�� ¬ �7,7/( ³(OHFW�(QJ´� ∧ 6$/ > ��

P�� 7,7/( ³3URJUDPPHU´ ∧ 6$/ ≤ ��

P�� 7,7/( ³3URJUDPPHU´ ∧ 6$/ > ��

��

Chapter 3 - 22

Application Information (��)

z Quantitative Information✔ Minterm selectivity: sel(mi)

– Number of tuples of relations that would be accessed by a user query specified according to a given minterm

✔ Access frequency: acc(qi)– Frequency with which user application access data

Chapter 3 - 23

Primary Horizontal Fragmentation

z Definition✔ A fragmentation generated by a selection operation on

the owner relation of a database schema✔ Given relation Ri, its horizontal fragments are

Rij = σFj(Ri), 1 ≤ j ≤ w, Fj : the selection formula (mij)

z Example : Sample Relation �� σ/2& ³0RQWUHDO´ �� σ/2& ³1HZ <RUN´ �� σ/2& ³3DULV´ ��

��

Chapter 3 - 24

-12

-�

-1$0(

,QVWUXPHQWDWLRQ

%8'*(7

��

/2&

0RQWUHDO

-�

-12

-�

-1$0(

'DWDEDVH 'HYHORS�

%8'*(7

��

/2&

1HZ <RUN

-�

-� &$'�&$0 �� 1HZ <RUN

-12

-�

-1$0(

0DLQWHQDQFH

%8'*(7

��

/2&

3DULV

-�

Example: Minterm Fragments

Chapter 3 - 25

Simple Predicate� ��

z Completeness✔ Pr� simple predicate�� fragment��

fragment��

��, Pr is complete!✔ Example

± 3U ^ /2& ³0RQWUHDO´� /2& ³1HZ <RUN´� /2& ³3DULV´ `

± 3U¶ 3U ∪ ^ %8'*(7 ≤ �� %8'*(7 > �� `

z Minimality✔ Pr�� fragment F� F1� F2��, F1� F2�

��

✔ Example± 3U¶¶ 3U¶ ∪ ^ -1$0( ³,QVWUXPHQWDWLRQ´ `

��

Chapter 3 - 26

$OJRULWKP &20B0,1LQSXW � 5 � UHODWLRQ� 3U � VHW RI VLPSOH SUHGLFDWHV

RXWSXW � 3U¶ � VHW RI VLPSOH SUHGLFDWHV

GHFODUH � ) � VHW RI PLQWHUP IUDJPHQWV

EHJLQ

ILQG D SL ∈ 3U VXFK WKDW SL SDUWLWLRQV 5 DFFRUGLQJ WR 3B5XOH

3U¶ SL

3U 3U�SL

) IL ^ IL LV WKH PLQWHUP IUDJPHQW DFFRUGLQJ WR SL `

GR

EHJLQ

ILQG D SM ∈ 3U VXFK WKDW SM SDUWLWLRQV VRPH IN RI 3U¶

DFFRUGLQJ WR 3B5XOH

3U¶ 3U¶ ∪ SM

3U 3U ± SM

) ) ∪ IMHQG�EHJLQ

XQWLO 3U¶ LV FRPSOHWH

HQG� ^&20B0,1`

P_Rule : fundamental rule of completeness and minimality, which statesthat a fragment is partitioned “into at least two parts which are accesseddifferently by at least one application.”

Chapter 3 - 27

$OJRULWKP 3+25,=217$/LQSXW � 5L � UHODWLRQ� 3UL � VHW RI VLPSOH SUHGLFDWHV

RXWSXW � 0L � VHW RI PLQWHUP IUDJPHQWV

EHJLQ

3U¶ &20B0,1�5L� 3UL�

GHWHUPLQH WKH VHW 0L RI PLQWHUP SUHGLFDWHV

GHWHUPLQH WKH VHW ,L RI LPSOLFDWLRQV DPRQJ SL ∈ 3UL¶

IRU HDFK PL ∈ 0L GR

LI PL LV FRQWUDGLFWRU\ DFFRUGLQJ WR , WKHQ

0L 0L ± PL

HQG�LI

HQG�IRU

HQG� ^3+25,=217$/`

Example:S�� DWW YDOXH� S

�� DWW YDOXH�

, L�� DWW YDOXH�� ⇒ ¤�DWW YDOXH��

L�� DWW YDOXH�� ⇒ ¤�DWW YDOXH��

0 P�� DWW YDOXH�� ∧ �DWW YDOXH�� contradictory by I

P�� DWW YDOXH�� ∧ ¤�DWW YDOXH��

P�� ¤�DWW YDOXH�� ∧ �DWW YDOXH��

P�� ¤�DWW YDOXH�� ∧ ¤�DWW YDOXH�� contradictory by I

��

Chapter 3 - 28

Examplez �, �: subject of primary horizontal fragmentation

z Assumption for �✔ There is only 1 application that accesses �.✔ That application checks the salary information.✔ Queries for � are issued at two sites.

z Simple PredicatesS�� 6$/ ≤ ��

S�� 6$/ > ��

⇒ 3U ^S

�� S

�` : complete and minimal by COM_MIN

Chapter 3 - 29

Example (Cont’d)

z Minterm PredicatesP

�� 6$/ ≤ �� ∧ �6$/ > ��

P�� 6$/ ≤ �� ∧ ¬ �6$/ > ��

P�� ¬ �6$/ ≤ �� ∧ �6$/ > ��

P�� ¬ �6$/ ≤ �� ∧ ¬ �6$/ > ��

⇒ m1 and m4 is contradictory : � � ��

Therefore, we define two fragments, Fs = {S1, S2} according to M.

7,7/(

0HFK� (QJ�

3URJUDPPHU

6$/

��

��

6�

7,7/(

(OHFW� (QJ

6\VW� $QDO

6$/

��

��

6�

��

Chapter 3 - 30

Example (Cont’d)z Assumption for �

✔ There are 2 applications that access �.✔ The first is issued at three sites and finds the names and

budgets of projects given their number.✔ The second is issued at two sites and has to do with the

management of the projects.

z Simple PredicatesS�� /2& ³0RQWUHDO´

S�� /2& ³1HZ <RUN´

S�� /2& ³3DULV´

S�� %8'*(7 ≤ ��

S�� %8'*(7 > ��

⇒ 3U¶ ^S�� S�� S�� S�� S�` is complete and minimal : COM_MIN

Chapter 3 - 31

Example (Cont’d)

z Minterm PredicatesP� � /2& ³0RQWUHDO´ ∧ �%8'*(7 ≤ ��P� � /2& ³0RQWUHDO´ ∧ �%8'*(7 > ��P� � /2& ³1HZ <RUN´ ∧ �%8'*(7 ≤ ��P� � /2& ³1HZ <RUN´ ∧ �%8'*(7 > ��P� � /2& ³3DULV´ ∧ �%8'*(7 ≤ ��P� � /2& ³3DULV´ ∧ �%8'*(7 > ��

Therefore, we define six fragments,�- � �� according to M.

��

Chapter 3 - 32

Derived Horizontal Fragmentation

z ��

✔ Defined on a member relation of a link according to a selection operation on its owner.

✔ Given a link L, owner(L) = S & member(L) = R– Ri = R semi_join Si, 1 ≤ i ≤ w– w : # of fragments that will be defined on R– Si = a primary horizontal fragment for S

z ExampleL1 : owner(L1) = S and member(L1) = EE1 : E semi_join S1, where S1 = σSAL ≤ 30000(S)E2 : E semi_join S2, where S2 = σSAL > 30000(S)

Chapter 3 - 33

Potential Complicationz ��

✔ When there are more than two links into a relation, there is more than one possible horizontal fragmentation of the relation.

z Two criteria✔ Fragmentation used in more applications✔ Fragmentation with better join characteristics

z Recall the advantages of the fragmentation✔ Performing a query on smaller relations✔ Performing joins in a distributed fashion

z Simple Graph✔ A graph with only one link coming in or going out of a fragment.✔ Effects of storage and join performance!

��

Chapter 3 - 34

Example : Fragmentation of �z Assumption

✔ There are two applications which access *.✔ One finds the names of engineers who work at certain places.✔ The other accesses the project that employees work on and how

long they will work on those projects.

z The first fragmentation according to ��, ��, ��✔ *� * VHPLBMRLQ -�� ZKHUH -� � σ/2& ³0RQWUHDO´�-�✔ *� * VHPLBMRLQ -�� ZKHUH -� � σ/2& ³1HZ <RUN´�-�✔ *� * VHPLBMRLQ -�� ZKHUH -� � σ/2& ³3DULV´�-�

z The second fragmentation according to ��, ��✔ *� * VHPLBMRLQ (�✔ *� * VHPLBMRLQ (�

The final choice of the fragmentation scheme may be a decision problem addressed during allocation.

Chapter 3 - 35

Checking for Correctnessz Completeness

✔ PHF: A set of complete and minimal predicates, ��✔ DHF: Ensures referential integrity

z Reconstruction✔ for a relation R with fragments �5 � �� :�

� � ∪ �L� ∀ �L ∈ �5

z Disjointness✔ PHF: Minterm predicates determining the

fragmentation are mutually exclusive✔ DHF: Disjointness can be guaranteed if the join graph

is simple; otherwise investigate actual tuple values

�

Chapter 3 - 36

3.2 Vertical Fragmentation

z DefinitionPartitions R to fragments R1, R2, …, Rr, each which contains a subset of R’s attribute as well as the primary key of R

z Inherently more complicated than horizontal partitioning✔ Total number of alternatives✔ Obtaining optimal solution is very difficult✔ Resort to heuristics

Chapter 3 - 37

Two Types of Heuristics

z Grouping✔ Starts by assigning each attribute to one fragment✔ Joins some of fragments until some criteria is satisfied.

z Splitting✔ Starts with a relation and decides on beneficial partitioning✔ Top-down design methodology��

z Note– Replication of the key in the fragments– Therefore, splitting is considered only for those attributes that do

not participate in the primary key.

�

Chapter 3 - 38

Information Requirements of Vertical Fragmentationz What needs to be determined about applications?

✔ Affinity of attributes: How closely related the attributes are?✔ Attribute usage value: use(qi, Aj) = 1 or 0

z ExampleT� � 6(/(&7 %8'*(7 )520 - :+(5( -12 9DOXH�

T� � 6(/(&7 -1$0(� %8'*(7 )520 -�

T� � 6(/(&7 -1$0( )520 - :+(5( /2& 9DOXH�

T� � 6(/(&7 680�%8'*(7� )520 - :+(5( /2& 9DOXH�

$� $� $� $�T� � � � �

T� � � � �

T� � � � �

T� � � � �

$IILQLW\ 0DWUL[$� -12

$� -1$0(

$� %8'*(7

$� /2&

Chapter 3 - 39

Attribute Affinityz Attribute Affinity

refl(qk) : # of accesses to attributes (Ai, Aj) for each execution of application qk at site Sl

accl(qk) : the application access frequency measure

∑ ∑=∧= ∀

=1),( 1),(|

ji )()()A ,aff(Ajkik lAquseAqusek S

klkl qaccqref

��

Chapter 3 - 40

Example– Assume that refl(qk) = 1 for all qk and Sl

– Application frequenciesacc1(q1) = 15 acc2(q1) = 20 acc3(q1) = 10acc1(q2) = 5 acc2(q2) = 0 acc3(q2) = 0acc1(q3) = 25 acc2(q3) = 25 acc3(q3) = 25acc1(q4) = 3 acc2(q4) = 0 acc3(q4) = 0

∑ ∑= =

=++==1

1

3

113121131 45)(qacc)(qacc)(qacc )()A ,aff(A

k lkl qacc

attribute affinity matrix (AA)

A1 A2 A3 A4

A1 45 0 45 0

A2 0 80 5 75

A3 45 5 53 3

A4 0 75 3 78

Chapter 3 - 41

Clustering Algorithmz ��

✔ Find some means of grouping the attributes of a relation based on the attribute affinity values in AA.

✔ Net contribution to the global affinity measure of placing attribute Ak

between Ai and Aj ;± FRQW�$L� $N� $M� ERQG�$L� $N� � ERQG�$N� $M� ± ERQG�$L� $M�

z Examplecont(A1, A4, A2) = bond(A1, A4) + bond(A4, A2) – bond(A1, A2)bond(A1, A4) = 45 × 0 + 0 × 75 + 45 × 3 + 0 × 78 = 135bond(A4, A2) = 11865 bond(A1, A2) = 225

Therefore, cont(A1, A4, A2) = 135 + 11865 – 225 = 11775

∑=

=n

zyzxz AAaffAAaff

1yx ),(),( )A ,bond(A where

��

Chapter 3 - 42

$OJRULWKP &/867(5,1*

LQSXW � $$ � DWWULEXWH DIILQLW\ PDWUL[

RXWSXW � &$ � FOXVWHUHG DIILQLW\ PDWUL[

EHJLQ

^ LQLWLDOL]H� UHPHPEHU WKDW $$ LV DQ Q × Q PDWUL[ `

&$�� $$��

&$�� $$��

LQGH[ ��

ZKLOH LQGH[ ≤ Q GR ^FKRRVH WKH ³EHVW´ ORFDWLRQ IRU DWWULEXWH $$LQGH[ `EHJLQ

IRU L IURP � WR LQGH[ ± � E\ � GR

FDOFXODWH FRQW�$L ± �� $LQGH[� $L�

HQG�IRU

FDOFXODWH FRQW�$LQGH[ ± �� $LQGH[� $LQGH[ � �� ^ERXQGDU\ FRQG�`

ORF SODFHPHQW JLYHQ E\ PD[LPXP FRQW YDOXH

IRU M IURP LQGH[ WR ORF E\ ± � GR

&$�� M� &$�� M ± ��

HQG�IRU

&$�� ORF� $$�� LQGH[�

LQGH[ LQGH[ � �

HQG�ZKLOH

RUGHU WKH URZV DFFRUGLQJ WR WKH UHODWLYH RUGHULQJ RI FROXPQV

HQG� ^&/867(5,1*`

Chapter 3 - 43

ExampleOrdering(0-3-1) :

cont(A0, A3, A1) = bond(A0, A3) + bond(A3, A1) – bond(A0, A1)bond(A0, A1) = bond(A0, A3) = 0bond(A3, A1) = 45 × 45 + 5 × 0 + 53 × 45 + 3 × 0 = 4410

cont(A0, A3, A1) = 4410

Ordering(1-3-2) :cont(A1, A3, A2) = bond(A1, A3) + bond(A3, A2) – bond(A1, A2)

bond(A1, A3) = bond(A3, A1) = 4410bond(A3, A2) = 890, bond(A1, A2) = 225

cont(A1, A3, A2) = 5525

��

Chapter 3 - 44

Ordering(2-3-4) :cont(A2, A3, A4) = bond(A2, A3) + bond(A3, A4) – bond(A2, A4)

bond(A2, A3) = 890bond(A3, A4) = bond(A2, A4) = 0

cont(A1, A3, A2) = 890

And so forth … : The resulting Clustered Affinity Matrix (CA)

A1 A3 A2 A4

A1 45 45 0 0

A3 45 53 5 3

A2 0 5 80 75

A4 0 3 75 78

Chapter 3 - 45

Partitioning Algorithm

z The upper left-hand corner of CA : TA

z The lower right-hand corner of CA : BA

AQ(qi) = { Aj | use(qi, Aj) = 1 }TQ = { qi | AQ(qi) ⊆ TA }BQ = { qi | AQ(qi) ⊆ BA }OQ = Q – { TQ ∪ BQ }

��

Chapter 3 - 46

∑ ∑∈ ∀

=Qq S

ijij

i j

qaccqrefCQ )()(

∑ ∑∈ ∀

=TQq S

ijij

i j

qaccqrefCTQ )()(

∑ ∑∈ ∀

=BQq S

ijij

i j

qaccqrefCBQ )()(

∑ ∑∈ ∀

=OQq S

ijij

i j

qaccqrefCOQ )()(

To find the point x such that z is maximized :z = CTQ × CBQ – COQ2

Chapter 3 - 47

$OJRULWKP 3$57,7,21

LQSXW � &$ � FOXVWHUHG DIILQLW\ PDWUL[ � 5 � UHODWLRQ

RXWSXW � ) � VHW RI IUDJPHQWV

EHJLQ

^ GHWHUPLQH WKH ] YDOXH IRU WKH ILUVW FROXPQ `

^ WKH VXEVFULSWV LQ WKH FRVW HTXDWLRQV LQGLFDWH WKH VSOLW SRLQW `

FDOFXODWH &74Q ± � � &%4Q ± � � &24Q ± �

EHVW &74Q ± � × &%4Q ± � ± �&24Q ± ��

GR

IRU L IURP Q ± � WR � E\ ± � GR

FDOFXODWH &74L � &%4L � &24L

] &74L × &%4L ± &24L�

LI �] > EHVW� WKHQDVVLJQ EHVW WR ] DQG UHFRUG WKH VKLIW SRVLWLRQ

HQG�LI

HQG�IRU

FDOO 6+,)7�&$�

XQWLO QR PRUH 6+,)7 LV SRVVLEOH

UHFRQVWUXFW WKH PDWUL[ DFFRUGLQJ WR WKH VKLIW SRVLWLRQ

5� Π7$�5� ∪ . ^ . LV WKH VHW RI SULPDU\ NH\ DWWULEXWHV RI 5 `

5� Π%$�5� ∪ .

) ^5�� 5�`

HQG� ^3$57,7,21`

��

Chapter 3 - 48

Checking for Correctness

z CompletenessA = TA ∪ BA

z ReconstructionR = JOINK Ri, ∀ Ri ∈ FR

z Disjointnessnot important as horizontal fragmentation due to the replication of primary key

Chapter 3 - 49

3.3 Hybrid Fragmentation

z The levels of nesting in most practical applications do not exceed 2.

R

H H

R2R1

V V V V V

R11 R12 R21 R22 R23

��

Chapter 3 - 50

Correctness of Hybrid Fragmentation

z Reconstruction✔ Starts at the leaves of the partitioning tree and moves

upward by performing joins and unions

� Completeness✔ Fragmentation is complete if the intermediate and leaf

fragments are complete.

� Disjointness✔ Fragmentation is disjoint if the intermediate and leaf

fragments are disjoint.

Chapter 3 - 51

4. Allocationz Definition

The allocation problem involves finding the optimaldistribution of relations (fragments) to sites.

z Measures of optimality✔ Minimal cost :

– cost of storing, querying, updating, and data communication

✔ Performance :– to minimize the response time and– to maximize the system throughput at each site

��

Chapter 3 - 52

4.1 Some example of data placementand allocation

(Example 1) Single-relation case

SDJH �

�� S�

�� T�

�� U�

�� V�

�� T�

�� U�

�� V�

�� S�

�� U�

�� V�

�� S�

�� T�

�� V�

�� S�

�� T�

�� U�

SDJH � SDJH � SDJH � SDJH �

�� S�

�� T�

�� S�

�� T�

�� U�

�� V�

�� U�

�� V�

�� S�

�� T�

�� S�

�� T�

�� U�

�� V�

�� U�

�� V�

SDJH � SDJH � SDJH �

Table 1 : Table 2 :

Query : { (1, *)?, (2, *)?, …, (*, p)?, (*, q)?, … }

z Distributed placement– site 1 : page(1, 4), site 2 : pages(2, 3)– site 1 : page(1, 2), site 2 : pages(3, 4)– site 1 : page(1, 3), site 2 : pages(2, 4)

Chapter 3 - 53

(Example 2) Multiple-relation case

5�

�D� S�

�E� T�

�F� U�

�G� V�

�S� ��

�T� ��

�U� ��

�V� ��

�D� T�

�E� U�

�F� V�

�G� S�

�S� ��

�T� ��

�U� ��

�V� ��

�D� U�

�E� V�

�F� S�

�G� T�

�S� ��

�T� ��

�U� ��

�V� ��

�D� V�

�E� S�

�F� T�

�G� U�

�S� ��

�T� ��

�U� ��

�V� ��


5� 5� 5� 5� 5� 5� 5�

5�

�D� S�

�E� S�

�D� T�

�E� T�

�S� ��

�S� ��

�T� ��

�T� ��

�F� S�

�G� S�

�F� T�

�G� T�

�S� ��

�S� ��

�T� ��

�T� ��

�D� U�

�E� U�

�D� V�

�E� V�

�U� ��

�U� ��

�V� ��

�V� ��

�F� U�

�G� U�

�F� V�

�G� V�

�U� ��

�U� ��

�V� ��

�V� ��


5� 5� 5� 5� 5� 5� 5�

σcol1 = ‘a’(R1) JOINcol2 = col1 σcol2 = 1(R2)

z Distributed placement– site 1 : page(1, 2), site 2 : pages(3, 4)– site 1 : page(1, 3), site 2 : pages(2, 4)

��

Chapter 3 - 54

4.2 A practical combinatorial optimizationapproach to the file allocation problem

z Assumption

✔ Most files are not fragmented.✔ It is unlikely that we will try to exploit parallelism in our file

allocation.✔ Each computing facilities have tight limits on their local mass

storage capacity.✔ Storage is considered to be a constant on the optimization, rather

than as a cost.✔ The transaction traffic is known in advance.✔ Reads and updates have equal costs.✔ Remote accesses all have the same unit cost.✔ No redundancy is permitted and fragmentation of file is forbidden.

Chapter 3 - 55

Notation and Constraintsz N nodes, indexed by j, capacity = cj

z M files, indexed by i, size = si

z T transactions, indexed by k, frequency from node j = fkj

z nki accesses from transaction k to file i

z xij : decision variable 1 – file i is allocated to node j, 0 – otherwise

The goal of FAP = maximize(Σi,j Xij Vij)where Vij = Σk fkj (nki) × cost of local retrievals

∑ ≤≤∀=j

ij Miix 1|,1

∑ ≤≤∀≤i

jiij Njjcsx 1|,

�

Chapter 3 - 56

Algorithm FAP

1. Calculate J(i) = { j’ | Vij = max Vij}, 1 ≤ j ≤ N

2. An optimal set of xkj is given by xij = 1 for some j ∈ J(i) and xij = 0 otherwise.

3. If this solution is feasible (i.e. meets the constraints), it is our answer; go to step 7

4. Otherwise, identify all nodes which cause the constraints to be broken.

5. For every such over-subscribed node, solve the corresponding knapsack problem, thereby eliminating a node and the files allocated to that node from further consideration.

6. Consider J(i) for any nodes j which remain. If there are such nodes go to step 2

7. Otherwise, we have finished.

Chapter 3 - 57

� ��

��

��

�

��

�

Access rates of transactions to files (nki)

��

��

��

��

��

��

��

��

��

� ��

Allocate 8 files among five sites, each with 20 MB disk.

��

�

�

�

��

�

�

�

�

�

�

�

�

�

��

� �

�

)LOH � VL]H�0E\WHV �7UDQVDFWLRQV

�

�

�

�

�

�

�

�

�

��

Example

�

Chapter 3 - 58

�

��

��

�

��

�

��

�

�

The frequency of transactions in sites (fkj)

�

�

� � � �

��

��

��

��

�

��

��

��

��

�

�

��

��

�

�

��

��

�

�

� ��

6LWHV7UDQVDFWLRQV

�

�

�

�

�

�

�

�

�

��

Chapter 3 - 59

� ��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

��

�

��

��

�

�

��

��

��

��

��

��

� ��

��

)LOH � VL]H�0E\WHV �6LWH M

�

�

�

�

�

1. J(i) are the yellow elements for each i.

2. If we assign xij = 1 for these and 0 for the other entries we have our first solution.

3. Site 1 has been allocated 55Mbytes of files. This is not a feasible solution.

4. Site 1 has been allocated too much.

5. The maximum value(Vij) we can get from storing any files on site 1 is obtained

by storing files 1, 2, and 8 there.

6. Our new Vij table is obtained by eliminating row 1 above and column 1, 2, and 8.

The new J(i) are the underlined entries ( all allocated to site 3 )

Vij table

��

Chapter 3 - 60

2’. Assign xij = 1 to these, xij = 0 to the remainder of the entries.

3’. Site 3 has been allocated 47 Mbytes.

4’. Site 3 has been overloaded.

5’. The maximum value we can get from storing files on site 3 is obtained

by storing files 4 and 5 there.

6’. Our new Vij table is obtained by eliminating row 3 and column 4 and

5 from the reduced table.

� ��

��

��

��

�

��

��

�

��

��

New Vij

� ��

)LOH � VL]H�0E\WHV �6LWH

�

�

�

The new j(i) are underlined ( all allocated to site 4 )

Chapter 3 - 61

2’’. Assign 1 to xij for these entries. 0 for the rest.

3’’. Site 4 has been allocated 29 Mbytes.

4’’. Site 4 has been overloaded.

5’’. Store file 3 at site 4.

6’’. Our new Vij table is obtained by eliminating row 2, column 1 from the table above.

Without spelling out the details, it is clear that the remaining 2 files, 6 and 7, are allocated to site 5.

So our solution is

6LWH

�

)LOH7RWDO VSDFH XVHG

�0E\WHV�

��

� �

� ��

� � ��

� ��

��

Chapter 3 - 62

4.3 Database Allocation Problem

z DAP is different from FAP✔ The relationship between fragments should be taken into

account.✔ The relationship between the allocation and query

processing should be properly modeled.✔ FAP do not take into consideration the cost of integrity

enforcement.✔ The cost of enforcing concurrency control mechanisms

should be considered.

Chapter 3 - 63

z There are no general heuristic models that take as input a set of fragments and produce a near-optimal allocation subject to the types of constraints discussed here.

z We present a relatively general model and thendiscuss a number of possible heuristics that might be employed to solve it.

��

Chapter 3 - 64

Information Requirements

z Database information– the selectivity of a fragment Fj with respect to query qi : seli(Fj)– the size of a fragment Fj : size(Fj) = card(Fj) × length(Fj)

z Application information– # of read (write) accesses from qi to Fj : RRij (URij)– UM with uij (1 or 0), RM with rij (1 or 0), and O with o(i)– for each query, a maximum allowable response time is defined

z Site information– for each site, its storage and processing capacity is defined– unit cost of storing data at site Sk : USCk

– the cost of processing one unit of work at site Sk : LPCk

z Network information– the communication cost per frame between Si and Sj : gij

– the size (in bytes) of one frame : fsize

Chapter 3 - 65

Allocation Modelz Objectives

Minimize(Total Cost) subject to response-time/storage/processing constraint

z xij = 1 if Fi is stored at Sj, and xij = 0 otherwise

z Total Cost

STCjk : the cost of fragment Fj at site Sk

STCjk = USCk × size(Fj) × xjk

QPCi : query processing cost of application qi

QPCi = processing cost (PCi) + transmission cost(TCi)

∑ ∑∑∈∀ ∈∀∈∀

+=SS FF

jkQq

i

k ji

STCQPCTOC

��

Chapter 3 - 66

)(CCcost CC )(IEcost t enforcemenintegrity )(ACcost access PC iiii ++=

∑ ∑∈∀ ∈∀

×××+×=SS FF

kjkijijijiji

k j

LPCxRRrURuAC )(

iii TCR TCU TC +=

∑ ∑ ∑ ∑∈∀ ∈∀ ∈∀ ∈∀

××+××=SS FF SS FF

iokjkijkiojkiji

k j k j

gxugxuTCU )(,),(

∑∈∀

∈ ×××=FF

iokji

jkijSSi

j

kg

fsize

FselxrminTCR )

)(( )(,

Chapter 3 - 67

Solution Methods

z The formulation of DAP is NP-complete.

z Thus, one has to look for heuristic methods that yieldsuboptimal solutions.

z Heuristic methods– knapsack problem solution– branch-and-bound– network flow algorithm

⇒ There is not enough data to determine how close the

results are to the optimal.