Chapter 3
Distributed Database Design
Chapter 3 - 1
Table of Contents
• Alternative Design Strategies
• Distribution Design Issues
• Fragmentation
• Allocation
1. Alternative Design Strategies
• Two major strategies
  ✔ Top-down approaches
  ✔ Bottom-up approaches
1.1 Top-Down Design Process
[Figure: top-down design process]
Requirement Analysis → System Requirements (objectives)
→ Conceptual Design and View Design (with user input)
→ Global Conceptual Schema, Access Information, External Schema Definitions
→ Distribution Design (with user input)
→ Local Conceptual Schema
→ Physical Design
→ Physical Schema
→ Observation and Monitoring (feedback to the earlier steps)
Details of Design Process
• Requirement Analysis
  ✔ Defines the environment of the system
  ✔ Elicits both the data and processing needs of all potential DB users
• System Requirements
  ✔ Where is the final system expected to stand?
  ✔ Performance, Reliability, Availability, Economics, Flexibility
Details of Design Process (Cont’d)
• Conceptual Design
  ✔ Determines entity types and the relationships among these entities
  ✔ Entity analysis:
    – determines the entities, attributes, and relationships
  ✔ Functional analysis:
    – determines the fundamental functions in which the modeled enterprise is involved
  ✔ The process is identical to centralized database design
Details of Design Process (Cont’d)
• View Design
  ✔ Defines the interfaces for end users
  ✔ The conceptual schema can be interpreted as an integration of user views.
• Distribution Design
  ✔ Designs the local conceptual schemas by distributing the entities over the sites of the distributed system
  ✔ Consists of two steps: fragmentation and allocation
1.2 Bottom-Up Design Process
• Top-Down Approach:
  Suitable when a system is being designed from scratch
• Bottom-Up Approach:
  Suitable when many DBs already exist and the design task involves integrating them into one DB
  → The bottom-up design process consists of integrating local schemas into the global conceptual schema.
  → Schema translation and schema integration
  → Arises in the context of heterogeneous databases
2. Distribution Design Issues
• Why fragment at all?
• How should we fragment?
• How much should we fragment?
• Is there any way to test the correctness of the decomposition?
• How should we allocate?
• What information is necessary for fragmentation and allocation?
2.1 Reasons for Fragmentation
• A relation is not an appropriate unit of distribution.
  ✔ Application views are usually subsets of relations.
  ✔ Distributing whole relations causes either an unnecessarily high volume of remote data access or unnecessary replication.
  ✔ Whole relations do not support intra-query concurrency.
  → Decompose a relation into fragments
• Disadvantages of fragmentation
  ✔ Applications defined on more than one fragment: performance degradation from the required unions or joins
  ✔ Semantic data control: integrity checking becomes much more difficult
[Figure: sample database with relations E, G, J, and S. Keys are numbered in row order; the numeric columns are not reproduced.]

E (ENO, ENAME, TITLE)
ENO  ENAME      TITLE
E1   J. Doe     Elect. Eng.
E2   M. Smith   Syst. Anal.
E3   A. Lee     Mech. Eng.
E4   J. Miller  Programmer
E5   B. Casey   Syst. Anal.
E6   L. Chu     Elect. Eng.
E7   R. Davis   Mech. Eng.
E8   J. Jones   Syst. Anal.

G (ENO, JNO, RESP, DUR)
Ten tuples that assign employees to projects; RESP takes the values Manager, Analyst, Consultant, Engineer, and Programmer.

J (JNO, JNAME, BUDGET, LOC)
JNO  JNAME              LOC
J1   Instrumentation    Montreal
J2   Database Develop.  New York
J3   CAD/CAM            New York
J4   Maintenance        Paris

S (TITLE, SAL)
TITLE: Elect. Eng., Syst. Anal., Mech. Eng., Programmer
2.2 Fragmentation Alternatives
• Horizontal Fragmentation or Vertical Fragmentation

Example of Horizontal Partitioning
Each fragment keeps all four attributes (JNO, JNAME, BUDGET, LOC); the BUDGET values are not reproduced.

J1
JNO  JNAME              LOC
J1   Instrumentation    Montreal
J2   Database Develop.  New York

J2
JNO  JNAME              LOC
J3   CAD/CAM            New York
J4   Maintenance        Paris
Example of Vertical Partitioning

J1 (JNO, BUDGET)
The key and budget of projects J1 to J4; the BUDGET values are not reproduced.

J2 (JNO, JNAME, LOC)
JNO  JNAME              LOC
J1   Instrumentation    Montreal
J2   Database Develop.  New York
J3   CAD/CAM            New York
J4   Maintenance        Paris
2.3 Degree of Fragmentation
• One extreme: do not fragment at all, so the unit of distribution is the whole relation
• Other extreme: fragment to the level of individual tuples (horizontal) or individual attributes (vertical)
• What is a suitable level of fragmentation?
  ✔ Such a level can only be defined with respect to the applications that run on the database.
  → Individual fragments are identified according to the values of application-specific parameters.
2.4 Correctness Rules of Fragmentation
• Completeness
  ✔ If a relation instance R is decomposed into fragments R1, R2, ..., Rn, each data item that can be found in R can also be found in one or more of the Rj.
• Reconstruction
  ✔ If a relation instance R is decomposed into fragments FR = {R1, R2, ..., Rn}, it should be possible to define a relational operator ∇ such that
    R = ∇ Rj, ∀ Rj ∈ FR
• Disjointness
  ✔ If a relation instance R is decomposed into fragments R1, R2, ..., Rn and data item di is in Rj, then di is not in any other fragment Rk (k ≠ j).
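The three rules can be checked mechanically for a horizontal fragmentation, where ∇ is set union. The following is a minimal sketch, assuming tuples are modelled as hashable Python tuples; the function name `check_horizontal_fragmentation` and the two-column sample rows are illustrative and not part of the slides.

```python
def check_horizontal_fragmentation(relation, fragments):
    """Return (complete, reconstructible, disjoint) for horizontal fragments.

    relation  : set of tuples
    fragments : list of sets of tuples
    """
    union = set().union(*fragments)
    # Completeness: every tuple of the relation appears in some fragment.
    complete = relation <= union
    # Reconstruction: the union of the fragments gives back the relation.
    reconstructible = union == relation
    # Disjointness: no tuple is stored in more than one fragment.
    disjoint = sum(len(f) for f in fragments) == len(union)
    return complete, reconstructible, disjoint


# Hypothetical two-column projection of relation J: (JNO, LOC).
J = {("J1", "Montreal"), ("J2", "New York"), ("J3", "New York"), ("J4", "Paris")}
frags = [{t for t in J if t[1] == loc} for loc in ("Montreal", "New York", "Paris")]
result = check_horizontal_fragmentation(J, frags)
```

Adding a fragment that repeats a tuple keeps completeness and reconstruction intact but makes the disjointness flag false.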
2.5 Allocation Alternatives
• Comparison of Replication Alternatives

                      Full Replication      Partial Replication  Partitioning
Query Processing      Easy                  Difficult            Difficult
Directory Management  Easy or nonexistent   Difficult            Difficult
Concurrency Control   Moderate              Difficult            Easy
Reliability           Very high             High                 Low
Reality               Possible application  Realistic            Possible application
2.6 Information Requirements
• Database Information
• Application Information
• Communication Network Information
• Computer System Information
3. Fragmentation
• Design of Horizontal Fragmentation
• Design of Vertical Fragmentation
• Design of Hybrid Fragmentation
3.1 Horizontal Fragmentation
• Primary horizontal fragmentation
• Derived horizontal fragmentation
• Information requirements of horizontal fragmentation
  ✔ Database Information
  ✔ Application Information
Database Information
• Concerns the global conceptual schema
  ✔ How are the DB relations connected to one another, especially through joins?
  ✔ Relationships among relations are expressed using links
• Example
  S (TITLE, SAL)
  E (ENO, ENAME, TITLE)
  J (JNO, JNAME, BUDGET, LOC)
  G (ENO, JNO, RESP, DUR)
  Join relationships are drawn as links directed from the owner relation to the member relation: S → E, E → G, J → G
Application Information
• Predicates used in user queries
  ✔ The most active 20% of user queries account for 80% of the total data access.
• Simple Predicate
  ✔ pj : Ai θ Value, θ ∈ {=, <, ≠, ≤, >, ≥}
  ✔ Pri : the set of all simple predicates defined on relation Ri
• Minterm Predicates
  ✔ mij : a conjunction of simple predicates, each taken in its natural or negated form
  ✔ Mi : the set of minterm predicates for relation Ri
    Mi = {mij | mij = ∧ pik*}, 1 ≤ k ≤ m, 1 ≤ j ≤ z
    where pik ∈ Pri and (pik* = pik or pik* = ¬pik)
Example: Consider Relation S
  p1: TITLE = "Elect. Eng."
  p2: TITLE = "Syst. Anal."
  p3: TITLE = "Mech. Eng."
  p4: TITLE = "Programmer"
  p5: SAL ≤ 30000
  p6: SAL > 30000

  m1: TITLE = "Elect. Eng." ∧ SAL ≤ 30000
  m2: TITLE = "Elect. Eng." ∧ SAL > 30000
  m3: ¬(TITLE = "Elect. Eng.") ∧ SAL ≤ 30000
  m4: ¬(TITLE = "Elect. Eng.") ∧ SAL > 30000
  m5: TITLE = "Programmer" ∧ SAL ≤ 30000
  m6: TITLE = "Programmer" ∧ SAL > 30000
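Minterm predicates can be enumerated by taking every simple predicate either in its natural or its negated form. The sketch below assumes predicates are modelled as Python callables over a row dictionary; the helper names `minterms` and `holds`, the choice of the two predicates p1 and p5, and the sample row are illustrative assumptions.

```python
from itertools import product


def minterms(simple_predicates):
    """Enumerate all 2^n minterms as tuples of (name, positive) pairs.

    positive=True means the predicate is used as is, False means negated.
    """
    names = list(simple_predicates)
    return [tuple(zip(names, signs))
            for signs in product([True, False], repeat=len(names))]


def holds(minterm, simple_predicates, row):
    """A minterm holds for a row if every literal has the required truth value."""
    return all(simple_predicates[name](row) == positive
               for name, positive in minterm)


# Two of the simple predicates on relation S.
predicates = {
    "p1": lambda r: r["TITLE"] == "Elect. Eng.",
    "p5": lambda r: r["SAL"] <= 30000,
}
all_minterms = minterms(predicates)

# Hypothetical row, used only to show that exactly one minterm matches.
row = {"TITLE": "Programmer", "SAL": 24000}
matching = [m for m in all_minterms if holds(m, predicates, row)]
```

Because the minterms cover every combination of truth values, each row satisfies exactly one of them, which is what makes minterm fragments disjoint.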
Application Information (Cont’d)
• Quantitative Information
  ✔ Minterm selectivity: sel(mi)
    – Number of tuples of the relation that would be accessed by a user query specified according to a given minterm predicate
  ✔ Access frequency: acc(qi)
    – Frequency with which a user application accesses data
Primary Horizontal Fragmentation
• Definition
  ✔ A fragmentation generated by a selection operation on the owner relation of a database schema
  ✔ Given relation Ri, its horizontal fragments are
    Rij = σFj(Ri), 1 ≤ j ≤ w, where Fj is the selection formula (a minterm predicate mij)
• Example: sample relation J
  J1 = σLOC = "Montreal"(J)
  J2 = σLOC = "New York"(J)
  J3 = σLOC = "Paris"(J)
Example: Minterm Fragments
Each fragment keeps all four attributes (JNO, JNAME, BUDGET, LOC); the BUDGET values are not reproduced.

J1
JNO  JNAME              LOC
J1   Instrumentation    Montreal

J2
JNO  JNAME              LOC
J2   Database Develop.  New York
J3   CAD/CAM            New York

J3
JNO  JNAME              LOC
J4   Maintenance        Paris
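Primary horizontal fragmentation is a set of selections, one per selection formula. As a rough sketch, assuming rows are dictionaries and formulas are Python callables (the function name `primary_horizontal_fragments` is illustrative, and BUDGET is left out of the rows):

```python
def primary_horizontal_fragments(relation, formulas):
    """Apply one selection per formula: fragment = sigma_F(relation)."""
    return {name: [row for row in relation if formula(row)]
            for name, formula in formulas.items()}


J = [
    {"JNO": "J1", "JNAME": "Instrumentation", "LOC": "Montreal"},
    {"JNO": "J2", "JNAME": "Database Develop.", "LOC": "New York"},
    {"JNO": "J3", "JNAME": "CAD/CAM", "LOC": "New York"},
    {"JNO": "J4", "JNAME": "Maintenance", "LOC": "Paris"},
]

formulas = {
    "J1": lambda r: r["LOC"] == "Montreal",
    "J2": lambda r: r["LOC"] == "New York",
    "J3": lambda r: r["LOC"] == "Paris",
}

fragments = primary_horizontal_fragments(J, formulas)
```

With these three formulas every project lands in exactly one fragment, so the result is complete and disjoint.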
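Properties of Simple Predicates
• Completeness
  ✔ A set of simple predicates Pr is complete if every application accesses all tuples of any minterm fragment defined on Pr with the same probability.
  ✔ Example
    – Pr = { LOC = "Montreal", LOC = "New York", LOC = "Paris" }
    – Pr’ = Pr ∪ { BUDGET ≤ c, BUDGET > c }, where c is a budget threshold
• Minimality
  ✔ If a predicate in Pr splits a fragment F into F1 and F2, then F1 and F2 must be accessed differently by at least one application.
  ✔ Example
    – Pr’’ = Pr’ ∪ { JNAME = "Instrumentation" } is not minimal if no application accesses the fragments by JNAME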
Algorithm COM_MIN
input:   R : relation; Pr : set of simple predicates
output:  Pr’ : set of simple predicates
declare: F : set of minterm fragments
begin
  find a pi ∈ Pr such that pi partitions R according to P_Rule
  Pr’ ← pi
  Pr ← Pr – pi
  F ← fi   { fi is the minterm fragment according to pi }
  do
  begin
    find a pj ∈ Pr such that pj partitions some fk of Pr’ according to P_Rule
    Pr’ ← Pr’ ∪ pj
    Pr ← Pr – pj
    F ← F ∪ fj
  end-begin
  until Pr’ is complete
end. {COM_MIN}
P_Rule: the fundamental rule of completeness and minimality, which states that a relation or fragment is partitioned “into at least two parts which are accessed differently by at least one application.”
Algorithm PHORIZONTAL
input:  Ri : relation; Pri : set of simple predicates
output: Mi : set of minterm fragments
begin
  Pr’ ← COM_MIN(Ri, Pri)
  determine the set Mi of minterm predicates
  determine the set Ii of implications among pi ∈ Pri’
  for each mi ∈ Mi do
    if mi is contradictory according to I then
      Mi ← Mi – mi
    end-if
  end-for
end. {PHORIZONTAL}

Example:
  p1: att = value1, p2: att = value2, where att can take only these two values
  I:  i1: (att = value1) ⇒ ¬(att = value2)
      i2: ¬(att = value1) ⇒ (att = value2)
  M:  m1: (att = value1) ∧ (att = value2)       contradictory by I
      m2: (att = value1) ∧ ¬(att = value2)
      m3: ¬(att = value1) ∧ (att = value2)
      m4: ¬(att = value1) ∧ ¬(att = value2)     contradictory by I
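The pruning loop of PHORIZONTAL can be sketched by treating each minterm as a truth assignment to the simple predicates and each implication as a check on that assignment. This is an illustrative sketch only: the encoding of implications as Python callables and the function name `consistent_minterms` are assumptions, and p1, p2 stand for att = value1 and att = value2.

```python
from itertools import product

# Each implication takes a truth assignment {name: bool} and returns
# False when the assignment violates it.
implications = [
    lambda a: (not a["p1"]) or (not a["p2"]),  # i1: p1 => not p2
    lambda a: a["p1"] or a["p2"],              # i2: not p1 => p2
]


def consistent_minterms(names, implications):
    """Keep only the minterms that violate no implication."""
    kept = []
    for values in product([True, False], repeat=len(names)):
        assignment = dict(zip(names, values))
        if all(rule(assignment) for rule in implications):
            kept.append(assignment)
    return kept


kept = consistent_minterms(["p1", "p2"], implications)
```

Of the four candidate minterms only p1 ∧ ¬p2 and ¬p1 ∧ p2 survive, matching m2 and m3.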
Example
• S, J: subjects of primary horizontal fragmentation
• Assumptions for S
  ✔ There is only one application that accesses S.
  ✔ That application checks the salary information.
  ✔ Queries for S are issued at two sites.
• Simple Predicates
  p1: SAL ≤ 30000
  p2: SAL > 30000
  ⇒ Pr = {p1, p2} : complete and minimal by COM_MIN
Example (Cont’d)
• Minterm Predicates
  m1: (SAL ≤ 30000) ∧ (SAL > 30000)
  m2: (SAL ≤ 30000) ∧ ¬(SAL > 30000)
  m3: ¬(SAL ≤ 30000) ∧ (SAL > 30000)
  m4: ¬(SAL ≤ 30000) ∧ ¬(SAL > 30000)
  ⇒ m1 and m4 are contradictory, because p1 ⇒ ¬p2 and ¬p1 ⇒ p2
  Therefore, we define two fragments, Fs = {S1, S2}, according to M.

S1 (SAL ≤ 30000)
TITLE: Mech. Eng., Programmer

S2 (SAL > 30000)
TITLE: Elect. Eng., Syst. Anal.
Example (Cont’d)
• Assumptions for J
  ✔ There are two applications that access J.
  ✔ The first is issued at three sites and finds the names and budgets of projects given their number.
  ✔ The second is issued at two sites and has to do with the management of the projects.
• Simple Predicates
  p1: LOC = "Montreal"
  p2: LOC = "New York"
  p3: LOC = "Paris"
  p4: BUDGET ≤ c
  p5: BUDGET > c   (c is a budget threshold)
  ⇒ Pr’ = {p1, p2, p3, p4, p5} is complete and minimal by COM_MIN
Example (Cont’d)
• Minterm Predicates (c is a budget threshold)
  m1: (LOC = "Montreal") ∧ (BUDGET ≤ c)
  m2: (LOC = "Montreal") ∧ (BUDGET > c)
  m3: (LOC = "New York") ∧ (BUDGET ≤ c)
  m4: (LOC = "New York") ∧ (BUDGET > c)
  m5: (LOC = "Paris") ∧ (BUDGET ≤ c)
  m6: (LOC = "Paris") ∧ (BUDGET > c)
  Therefore, we define six fragments, FJ = {J1, J2, J3, J4, J5, J6}, according to M.
Derived Horizontal Fragmentation
• Definition
  ✔ Defined on a member relation of a link according to a selection operation specified on its owner.
  ✔ Given a link L with owner(L) = S and member(L) = R:
    – Ri = R semi_join Si, 1 ≤ i ≤ w
    – w : the number of fragments that will be defined on R
    – Si : a primary horizontal fragment of S
• Example
  L1 : owner(L1) = S and member(L1) = E
  E1 = E semi_join S1, where S1 = σSAL ≤ 30000(S)
  E2 = E semi_join S2, where S2 = σSAL > 30000(S)
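A derived fragment is the semijoin of the member relation with one primary fragment of its owner. The sketch below assumes rows are dictionaries and that the link S → E joins on TITLE; the helper name `semi_join` is illustrative, and the owner fragments list only the TITLE attribute.

```python
def semi_join(member, owner_fragment, attr):
    """Rows of member whose join attribute value appears in owner_fragment."""
    keys = {row[attr] for row in owner_fragment}
    return [row for row in member if row[attr] in keys]


# Primary fragments of S (SAL values omitted, only the join attribute is needed).
S1 = [{"TITLE": "Mech. Eng."}, {"TITLE": "Programmer"}]    # SAL <= 30000
S2 = [{"TITLE": "Elect. Eng."}, {"TITLE": "Syst. Anal."}]  # SAL > 30000

E = [
    {"ENO": "E1", "ENAME": "J. Doe", "TITLE": "Elect. Eng."},
    {"ENO": "E2", "ENAME": "M. Smith", "TITLE": "Syst. Anal."},
    {"ENO": "E3", "ENAME": "A. Lee", "TITLE": "Mech. Eng."},
    {"ENO": "E4", "ENAME": "J. Miller", "TITLE": "Programmer"},
    {"ENO": "E5", "ENAME": "B. Casey", "TITLE": "Syst. Anal."},
    {"ENO": "E6", "ENAME": "L. Chu", "TITLE": "Elect. Eng."},
    {"ENO": "E7", "ENAME": "R. Davis", "TITLE": "Mech. Eng."},
    {"ENO": "E8", "ENAME": "J. Jones", "TITLE": "Syst. Anal."},
]

E1 = semi_join(E, S1, "TITLE")
E2 = semi_join(E, S2, "TITLE")
```

Every employee has a title that appears in S, so E1 and E2 together cover E, which is the referential-integrity argument for the completeness of a derived fragmentation.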
Potential Complication
• Problem
  ✔ When two or more links lead into a relation, there is more than one possible derived horizontal fragmentation of that relation.
• Two criteria for choosing among them
  ✔ The fragmentation used in more applications
  ✔ The fragmentation with better join characteristics
• Recall the advantages of fragmentation
  ✔ Performing a query on smaller relations
  ✔ Performing joins in a distributed fashion
• Simple Graph
  ✔ A join graph in which only one link comes into or goes out of each fragment
  ✔ A simple join graph benefits both storage and join performance
Example: Fragmentation of G
• Assumptions
  ✔ There are two applications that access G.
  ✔ One finds the names of engineers who work at certain places.
  ✔ The other accesses the projects that employees work on and how long they will work on those projects.
• The first fragmentation, according to J1, J2, J3
  ✔ G1 = G semi_join J1, where J1 = σLOC = "Montreal"(J)
  ✔ G2 = G semi_join J2, where J2 = σLOC = "New York"(J)
  ✔ G3 = G semi_join J3, where J3 = σLOC = "Paris"(J)
• The second fragmentation, according to E1, E2
  ✔ G1 = G semi_join E1
  ✔ G2 = G semi_join E2
The final choice of the fragmentation scheme may be a decision problem addressed during allocation.
Checking for Correctness
• Completeness
  ✔ PHF: guaranteed when the set of simple predicates Pr’ is complete and minimal
  ✔ DHF: guaranteed by referential integrity (every member tuple has an owner tuple)
• Reconstruction
  ✔ For a relation R with fragments FR = {R1, R2, ..., Rw}:
    R = ∪ Ri, ∀ Ri ∈ FR
• Disjointness
  ✔ PHF: the minterm predicates determining the fragmentation are mutually exclusive
  ✔ DHF: disjointness can be guaranteed if the join graph is simple; otherwise the actual tuple values must be investigated
3.2 Vertical Fragmentation
• Definition
  Partitions R into fragments R1, R2, …, Rr, each of which contains a subset of R’s attributes as well as the primary key of R
• Inherently more complicated than horizontal partitioning
  ✔ The total number of alternatives is very large
  ✔ Obtaining an optimal solution is very difficult
  ✔ Resort to heuristics
Two Types of Heuristics
• Grouping
  ✔ Starts by assigning each attribute to its own fragment
  ✔ Joins fragments step by step until some criterion is satisfied
• Splitting
  ✔ Starts with a relation and decides on beneficial partitionings
  ✔ Fits naturally within the top-down design methodology
• Note
  – The key is replicated in the fragments
  – Therefore, splitting is considered only for those attributes that do not participate in the primary key.
Information Requirements of Vertical Fragmentation
• What needs to be determined about applications?
  ✔ Affinity of attributes: how closely related are the attributes?
  ✔ Attribute usage value: use(qi, Aj) = 1 if query qi references attribute Aj, and 0 otherwise
• Example
  q1 : SELECT BUDGET FROM J WHERE JNO = Value
  q2 : SELECT JNAME, BUDGET FROM J
  q3 : SELECT JNAME FROM J WHERE LOC = Value
  q4 : SELECT SUM(BUDGET) FROM J WHERE LOC = Value

  A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC

  Attribute usage matrix
       A1  A2  A3  A4
  q1    1   0   1   0
  q2    0   1   1   0
  q3    0   1   0   1
  q4    0   0   1   1
Attribute Affinity
• Attribute affinity measure
  refl(qk) : the number of accesses to attributes (Ai, Aj) for each execution of application qk at site Sl
  accl(qk) : the access frequency of application qk at site Sl

  $aff(A_i, A_j) = \sum_{k \,\mid\, use(q_k, A_i) = 1 \,\wedge\, use(q_k, A_j) = 1} \; \sum_{\forall S_l} ref_l(q_k)\, acc_l(q_k)$
Example
– Assume that refl(qk) = 1 for all qk and Sl
– Application frequencies
  acc1(q1) = 15   acc2(q1) = 20   acc3(q1) = 10
  acc1(q2) = 5    acc2(q2) = 0    acc3(q2) = 0
  acc1(q3) = 25   acc2(q3) = 25   acc3(q3) = 25
  acc1(q4) = 3    acc2(q4) = 0    acc3(q4) = 0

  $aff(A_1, A_3) = \sum_{k=1}^{1} \sum_{l=1}^{3} acc_l(q_k) = acc_1(q_1) + acc_2(q_1) + acc_3(q_1) = 45$

Attribute affinity matrix (AA)
      A1  A2  A3  A4
A1    45   0  45   0
A2     0  80   5  75
A3    45   5  53   3
A4     0  75   3  78
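The affinity matrix follows directly from the attribute usage values and the access frequencies. A minimal sketch, assuming ref = 1 for every query and site as stated in the example; the `use` rows are read off the four queries q1 to q4 (columns A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC), and the function name `affinity_matrix` is illustrative.

```python
# use[k][i] = 1 if query q(k+1) references attribute A(i+1).
use = [
    [1, 0, 1, 0],  # q1: BUDGET, JNO
    [0, 1, 1, 0],  # q2: JNAME, BUDGET
    [0, 1, 0, 1],  # q3: JNAME, LOC
    [0, 0, 1, 1],  # q4: BUDGET, LOC
]
# acc[k][l] = access frequency of query q(k+1) at site S(l+1).
acc = [
    [15, 20, 10],
    [5, 0, 0],
    [25, 25, 25],
    [3, 0, 0],
]


def affinity_matrix(use, acc, ref=1):
    """aff(Ai, Aj) = sum over queries using both Ai and Aj of sum_l ref * acc_l."""
    n = len(use[0])
    aa = [[0] * n for _ in range(n)]
    for k, row in enumerate(use):
        total = ref * sum(acc[k])
        for i in range(n):
            for j in range(n):
                if row[i] == 1 and row[j] == 1:
                    aa[i][j] += total
    return aa


AA = affinity_matrix(use, acc)
```

The diagonal entry aff(Ai, Ai) is simply the total access frequency of all queries that touch Ai.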
Clustering Algorithm
• Basic idea
  ✔ Find some means of grouping the attributes of a relation based on the attribute affinity values in AA.
  ✔ Net contribution to the global affinity measure of placing attribute Ak between Ai and Aj:
    cont(Ai, Ak, Aj) = bond(Ai, Ak) + bond(Ak, Aj) – bond(Ai, Aj)
    where $bond(A_x, A_y) = \sum_{z=1}^{n} aff(A_z, A_x)\, aff(A_z, A_y)$
• Example
  cont(A1, A4, A2) = bond(A1, A4) + bond(A4, A2) – bond(A1, A2)
  bond(A1, A4) = 45 × 0 + 0 × 75 + 45 × 3 + 0 × 78 = 135
  bond(A4, A2) = 11865
  bond(A1, A2) = 225
  Therefore, cont(A1, A4, A2) = 135 + 11865 – 225 = 11775
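The bond and cont measures are short enough to compute directly from AA. The sketch below uses 1-based attribute indices to match the slides and treats any index outside 1..n as an empty boundary column with bond 0; that boundary convention and the function names are assumptions of this sketch.

```python
AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]


def bond(aa, x, y):
    """bond(Ax, Ay) = sum_z aff(Az, Ax) * aff(Az, Ay), with 1-based x and y."""
    n = len(aa)
    if not (1 <= x <= n and 1 <= y <= n):
        return 0  # empty boundary column
    return sum(aa[z][x - 1] * aa[z][y - 1] for z in range(n))


def cont(aa, i, k, j):
    """Net contribution of placing Ak between Ai and Aj."""
    return bond(aa, i, k) + bond(aa, k, j) - bond(aa, i, j)


example = cont(AA, 1, 4, 2)
```

Evaluating `cont(AA, 0, 3, 1)` in the same way covers the boundary case in which A3 is placed to the left of A1.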
Algorithm CLUSTERING
input:  AA : attribute affinity matrix
output: CA : clustered affinity matrix
begin
  { initialize; remember that AA is an n × n matrix }
  CA(•, 1) ← AA(•, 1)
  CA(•, 2) ← AA(•, 2)
  index ← 3
  while index ≤ n do   { choose the "best" location for attribute AAindex }
  begin
    for i from 1 to index – 1 by 1 do
      calculate cont(Ai–1, Aindex, Ai)
    end-for
    calculate cont(Aindex–1, Aindex, Aindex+1)   { boundary condition }
    loc ← placement given by maximum cont value
    for j from index to loc by –1 do
      CA(•, j) ← CA(•, j – 1)
    end-for
    CA(•, loc) ← AA(•, index)
    index ← index + 1
  end-while
  order the rows according to the relative ordering of columns
end. {CLUSTERING}
Example
Ordering (0-3-1):
  cont(A0, A3, A1) = bond(A0, A3) + bond(A3, A1) – bond(A0, A1)
  bond(A0, A1) = bond(A0, A3) = 0
  bond(A3, A1) = 45 × 45 + 5 × 0 + 53 × 45 + 3 × 0 = 4410
  cont(A0, A3, A1) = 4410
Ordering (1-3-2):
  cont(A1, A3, A2) = bond(A1, A3) + bond(A3, A2) – bond(A1, A2)
  bond(A1, A3) = bond(A3, A1) = 4410
  bond(A3, A2) = 890, bond(A1, A2) = 225
  cont(A1, A3, A2) = 4410 + 890 – 225 = 5075
Ordering (2-3-4):
  cont(A2, A3, A4) = bond(A2, A3) + bond(A3, A4) – bond(A2, A4)
  bond(A2, A3) = 890
  bond(A3, A4) = bond(A2, A4) = 0   (here A4 denotes the still-empty fourth column position, not attribute LOC)
  cont(A2, A3, A4) = 890
Ordering (1-3-2) gives the largest contribution, so A3 is placed between A1 and A2.

And so forth. The resulting Clustered Affinity Matrix (CA):
      A1  A3  A2  A4
A1    45  45   0   0
A3    45  53   5   3
A2     0   5  80  75
A4     0   3  75  78
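The whole clustering pass can be sketched as a greedy insertion: the first two attributes are fixed, and each further attribute goes to the position with the largest cont value. This is an illustrative sketch, not the original pseudocode: it uses 0-based attribute ids, represents the empty boundary as `None`, resolves ties in favour of the leftmost position, and the function names are assumptions.

```python
def bond(aa, x, y):
    """bond of two attribute columns; None stands for an empty boundary column."""
    if x is None or y is None:
        return 0
    return sum(row[x] * row[y] for row in aa)


def cont(aa, left, mid, right):
    return bond(aa, left, mid) + bond(aa, mid, right) - bond(aa, left, right)


def cluster_order(aa):
    """Return the column order chosen by greedy placement of each attribute."""
    order = [0, 1]
    for index in range(2, len(aa)):
        best_pos, best_val = 0, None
        for pos in range(len(order) + 1):
            left = order[pos - 1] if pos > 0 else None
            right = order[pos] if pos < len(order) else None
            val = cont(aa, left, index, right)
            if best_val is None or val > best_val:
                best_pos, best_val = pos, val
        order.insert(best_pos, index)
    return order


def clustered_matrix(aa, order):
    """Reorder both rows and columns of AA according to order."""
    return [[aa[r][c] for c in order] for r in order]


AA = [
    [45, 0, 45, 0],
    [0, 80, 5, 75],
    [45, 5, 53, 3],
    [0, 75, 3, 78],
]
order = cluster_order(AA)
CA = clustered_matrix(AA, order)
```

For the example matrix the order comes out as A1, A3, A2, A4, which groups the two high-affinity pairs (A1, A3) and (A2, A4) along the diagonal.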
Partitioning Algorithm
• The upper left-hand corner block of CA defines the attribute set TA
• The lower right-hand corner block of CA defines the attribute set BA
  AQ(qi) = { Aj | use(qi, Aj) = 1 }
  TQ = { qi | AQ(qi) ⊆ TA }
  BQ = { qi | AQ(qi) ⊆ BA }
  OQ = Q – { TQ ∪ BQ }
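$CQ = \sum_{q_i \in Q} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$
$CTQ = \sum_{q_i \in TQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$
$CBQ = \sum_{q_i \in BQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$
$COQ = \sum_{q_i \in OQ} \sum_{\forall S_j} ref_j(q_i)\, acc_j(q_i)$
Find the split point x along the diagonal of CA such that z is maximized:
$z = CTQ \times CBQ - COQ^2$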
Algorithm PARTITION
input:  CA : clustered affinity matrix; R : relation
output: F : set of fragments
begin
  { determine the z value for the first column }
  { the subscripts in the cost equations indicate the split point }
  calculate CTQn–1, CBQn–1, COQn–1
  best ← CTQn–1 × CBQn–1 – (COQn–1)²
  do
    for i from n – 2 to 1 by –1 do
      calculate CTQi, CBQi, COQi
      z ← CTQi × CBQi – (COQi)²
      if z > best then
        best ← z and record the shift position
      end-if
    end-for
    call SHIFT(CA)
  until no more SHIFT is possible
  reconstruct the matrix according to the shift position
  R1 ← ΠTA(R) ∪ K   { K is the set of primary key attributes of R }
  R2 ← ΠBA(R) ∪ K
  F ← {R1, R2}
end. {PARTITION}
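The core of the partitioning step is the evaluation of z at every split point of the clustered order. The sketch below covers only that inner loop and deliberately omits the SHIFT step; the attribute sets per query are read off q1 to q4, the per-query costs are the totals of ref × acc over all sites from the affinity example, and the function name `best_split` is illustrative.

```python
# Attributes referenced by each query (A1 = JNO, A2 = JNAME, A3 = BUDGET, A4 = LOC).
use = {
    "q1": {"A1", "A3"},
    "q2": {"A2", "A3"},
    "q3": {"A2", "A4"},
    "q4": {"A3", "A4"},
}
# Sum over all sites of ref * acc for each query.
cost = {"q1": 45, "q2": 5, "q3": 75, "q4": 3}
# Column order of the clustered affinity matrix.
order = ["A1", "A3", "A2", "A4"]


def best_split(order, use, cost):
    """Return (z, TA, BA) for the split point that maximizes z = CTQ*CBQ - COQ^2."""
    best = None
    for x in range(1, len(order)):
        ta, ba = set(order[:x]), set(order[x:])
        ctq = sum(cost[q] for q, attrs in use.items() if attrs <= ta)
        cbq = sum(cost[q] for q, attrs in use.items() if attrs <= ba)
        coq = sum(cost.values()) - ctq - cbq
        z = ctq * cbq - coq ** 2
        if best is None or z > best[0]:
            best = (z, order[:x], order[x:])
    return best


z, TA, BA = best_split(order, use, cost)
```

After the split, the primary key (here A1) is added to the bottom fragment as well, so the two vertical fragments are {A1, A3} and {A1, A2, A4}.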
Checking for Correctness
• Completeness
  A = TA ∪ BA, where A is the attribute set of R
• Reconstruction
  R = JOINK Ri, ∀ Ri ∈ FR   (join on the key K)
• Disjointness
  Not as important as in horizontal fragmentation, because the primary key is replicated in every fragment
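3.3 Hybrid Fragmentation
• The levels of nesting in most practical applications do not exceed 2.

[Figure: partitioning tree]
R is horizontally fragmented (H) into R1 and R2.
R1 is vertically fragmented (V) into R11 and R12; R2 is vertically fragmented (V) into R21, R22, and R23.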
Correctness of Hybrid Fragmentation
• Reconstruction
  ✔ Starts at the leaves of the partitioning tree and moves upward by performing joins and unions
• Completeness
  ✔ The fragmentation is complete if the intermediate and leaf fragments are complete.
• Disjointness
  ✔ The fragmentation is disjoint if the intermediate and leaf fragments are disjoint.
4. Allocation
• Definition
  The allocation problem involves finding the optimal distribution of relations (fragments) to sites.
• Measures of optimality
  ✔ Minimal cost:
    – cost of storing, querying, updating, and data communication
  ✔ Performance:
    – to minimize the response time and
    – to maximize the system throughput at each site
4.1 Some Examples of Data Placement and Allocation
(Example 1) Single-relation case

[Figure: Table 1 and Table 2 show two ways of storing 16 tuples of a two-attribute relation on four pages. In Table 1 every page holds one tuple for each second-attribute value p, q, r, s; in Table 2 each page holds only two of those values (p and q, or r and s).]

Query: { (1, *)?, (2, *)?, …, (*, p)?, (*, q)?, … }

• Distributed placement
  – site 1 : pages (1, 4), site 2 : pages (2, 3)
  – site 1 : pages (1, 2), site 2 : pages (3, 4)
  – site 1 : pages (1, 3), site 2 : pages (2, 4)
(Example 2) Multiple-relation case

[Figure: two alternative four-page layouts for relations R1 and R2. R1 holds tuples pairing a, b, c, d with p, q, r, s; R2 holds tuples pairing p, q, r, s with numeric values. Each page stores four R1 tuples and four R2 tuples. In the first layout every page mixes all of a to d and p to s; in the second layout each page holds R1 tuples for only two first-attribute values and two second-attribute values, together with the R2 tuples for those same two second-attribute values.]

Query: σcol1 = ‘a’(R1) JOINcol2 = col1 σcol2 = 1(R2)

• Distributed placement
  – site 1 : pages (1, 2), site 2 : pages (3, 4)
  – site 1 : pages (1, 3), site 2 : pages (2, 4)
4.2 A Practical Combinatorial Optimization Approach to the File Allocation Problem
• Assumptions
  ✔ Most files are not fragmented.
  ✔ It is unlikely that we will try to exploit parallelism in our file allocation.
  ✔ Each computing facility has tight limits on its local mass storage capacity.
  ✔ Storage is treated as a constraint on the optimization rather than as a cost.
  ✔ The transaction traffic is known in advance.
  ✔ Reads and updates have equal costs.
  ✔ Remote accesses all have the same unit cost.
  ✔ No redundancy is permitted, and fragmentation of files is forbidden.
Notation and Constraints
• N nodes, indexed by j, with capacity cj
• M files, indexed by i, with size si
• T transactions, indexed by k, with frequency fkj from node j
• nki : the number of accesses from transaction k to file i
• xij : decision variable; xij = 1 if file i is allocated to node j, and 0 otherwise

The goal of FAP is to maximize Σi,j xij Vij, where Vij = Σk fkj nki measures the accesses to file i that become local retrievals when file i is stored at node j, subject to

  $\sum_{j} x_{ij} = 1, \quad \forall i,\ 1 \le i \le M$
  $\sum_{i} x_{ij}\, s_i \le c_j, \quad \forall j,\ 1 \le j \le N$
Algorithm FAP
1. Calculate J(i) = { j’ | Vij’ = max Vij over 1 ≤ j ≤ N } for every file i.
2. An optimal set of xij is given by xij = 1 for some j ∈ J(i) and xij = 0 otherwise.
3. If this solution is feasible (i.e., meets the constraints), it is our answer; go to step 7.
4. Otherwise, identify all nodes that cause the constraints to be broken.
5. For every such over-subscribed node, solve the corresponding knapsack problem, thereby eliminating that node and the files allocated to it from further consideration.
6. Recompute J(i) for the files and nodes that remain. If any remain, go to step 2.
7. Otherwise, we have finished.
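The steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the slides' implementation: the knapsack in step 5 is solved by brute force over subsets (adequate only for a handful of files), ties in step 1 go to the lowest node index, all function names are made up for the sketch, and the small input data set is hypothetical.

```python
from itertools import combinations


def value_matrix(f, n):
    """V[i][j] = sum_k f[k][j] * n[k][i] (f: T x N frequencies, n: T x M accesses)."""
    T, N, M = len(f), len(f[0]), len(n[0])
    return [[sum(f[k][j] * n[k][i] for k in range(T)) for j in range(N)]
            for i in range(M)]


def best_knapsack(files, node, V, size, capacity):
    """Brute force: the subset of files with the largest total V that fits on node."""
    best, best_val = (), -1
    for r in range(len(files) + 1):
        for subset in combinations(files, r):
            if sum(size[i] for i in subset) <= capacity:
                val = sum(V[i][node] for i in subset)
                if val > best_val:
                    best, best_val = subset, val
    return set(best)


def allocate(V, size, cap):
    """Return {file: node}; may be partial if the nodes run out of space."""
    files, nodes = set(range(len(V))), set(range(len(cap)))
    placement = {}
    while files and nodes:
        # Steps 1-2: send every remaining file to its most valuable remaining node.
        choice = {i: max(sorted(nodes), key=lambda j: V[i][j]) for i in files}
        # Steps 3-4: find the over-subscribed nodes.
        overloaded = [j for j in sorted(nodes)
                      if sum(size[i] for i in files if choice[i] == j) > cap[j]]
        if not overloaded:
            placement.update(choice)
            return placement
        # Step 5: fix the best feasible subset on each overloaded node,
        # then drop that node and those files from further consideration.
        for j in overloaded:
            wanted = sorted(i for i in files if choice[i] == j)
            kept = best_knapsack(wanted, j, V, size, cap[j])
            for i in kept:
                placement[i] = j
            files -= kept
            nodes.discard(j)
    return placement


# Hypothetical data: 2 transactions, 2 nodes, 3 files.
f = [[10, 0], [0, 5]]        # f[k][j]
n = [[1, 2, 0], [0, 1, 3]]   # n[k][i]
size = [10, 15, 5]
cap = [20, 20]
V = value_matrix(f, n)
placement = allocate(V, size, cap)
```

In this data set files 0 and 1 both prefer node 0 but do not fit together, so the knapsack keeps file 1 there and file 0 moves to node 1 in the next round.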
Example
Allocate 8 files among five sites, each with a 20 MB disk.

[Table: access rates of transactions to files (nki); rows are transactions, columns are files labeled with their sizes in Mbytes]
[Table: the frequency of transactions at sites (fkj); rows are transactions, columns are sites]
[Table: Vij; rows are sites j, columns are files i labeled with their sizes in Mbytes]

1. The J(i) entries are highlighted for each i.
2. If we assign xij = 1 for these entries and 0 for all others, we have our first solution.
3. Site 1 has been allocated 55 Mbytes of files, so this is not a feasible solution.
4. Site 1 is the over-subscribed node.
5. The maximum value (Vij) we can get from storing files on site 1 is obtained by storing files 1, 2, and 8 there.
6. Our new Vij table is obtained by eliminating row 1 and columns 1, 2, and 8 from the table above. The new J(i) entries are all at site 3.
2’. Assign xij = 1 to these entries and xij = 0 to the remainder.
3’. Site 3 has been allocated 47 Mbytes.
4’. Site 3 is over-subscribed.
5’. The maximum value we can get from storing files on site 3 is obtained by storing files 4 and 5 there.
6’. Our new Vij table is obtained by eliminating row 3 and columns 4 and 5 from the reduced table.

[Table: new Vij for the remaining sites and files]

The new J(i) entries are all at site 4.
2’’. Assign xij = 1 to these entries and 0 to the rest.
3’’. Site 4 has been allocated 29 Mbytes.
4’’. Site 4 is over-subscribed.
5’’. Store file 3 at site 4.
6’’. Our new Vij table is obtained by eliminating row 2 and column 1 from the table above.
Without spelling out the details, it is clear that the remaining 2 files, 6 and 7, are allocated to site 5.
So our solution is

Site  Files
1     1, 2, 8
2     (none)
3     4, 5
4     3
5     6, 7
4.3 Database Allocation Problem
• DAP differs from FAP
  ✔ The relationships between fragments should be taken into account.
  ✔ The relationship between allocation and query processing should be properly modeled.
  ✔ FAP models do not take into consideration the cost of integrity enforcement.
  ✔ The cost of enforcing concurrency control mechanisms should be considered.
• There are no general heuristic models that take as input a set of fragments and produce a near-optimal allocation subject to the types of constraints discussed here.
• We present a relatively general model and then discuss a number of possible heuristics that might be employed to solve it.
Information Requirements
• Database information
  – the selectivity of a fragment Fj with respect to query qi : seli(Fj)
  – the size of a fragment Fj : size(Fj) = card(Fj) × length(Fj)
• Application information
  – the number of read (update) accesses from qi to Fj : RRij (URij)
  – matrices UM with elements uij (1 or 0) and RM with elements rij (1 or 0), and a vector O where o(i) is the originating site of query qi
  – for each query, a maximum allowable response time is defined
• Site information
  – for each site, its storage and processing capacity is defined
  – unit cost of storing data at site Sk : USCk
  – the cost of processing one unit of work at site Sk : LPCk
• Network information
  – the communication cost per frame between Si and Sj : gij
  – the size (in bytes) of one frame : fsize
Allocation Model
• Objective
  Minimize the total cost subject to response-time, storage, and processing constraints
• Decision variable: xij = 1 if Fi is stored at Sj, and xij = 0 otherwise
• Total Cost
  STCjk : the cost of storing fragment Fj at site Sk
    STCjk = USCk × size(Fj) × xjk
  QPCi : the query processing cost of application qi
    QPCi = processing cost (PCi) + transmission cost (TCi)

  $TOC = \sum_{\forall q_i \in Q} QPC_i + \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} STC_{jk}$
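$PC_i = AC_i + IE_i + CC_i$   (access cost + integrity enforcement cost + concurrency control cost)

$AC_i = \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} (u_{ij} \times UR_{ij} + r_{ij} \times RR_{ij}) \times x_{jk} \times LPC_k$

$TC_i = TCU_i + TCR_i$

$TCU_i = \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} u_{ij} \times x_{jk} \times g_{o(i),k} + \sum_{\forall S_k \in S} \sum_{\forall F_j \in F} u_{ij} \times x_{jk} \times g_{k,o(i)}$

$TCR_i = \sum_{\forall F_j \in F} \min_{S_k \in S} \left( r_{ij} \times x_{jk} \times \frac{sel_i(F_j)}{fsize} \times g_{k,o(i)} \right)$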
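The storage and access terms of the model are plain sums and can be evaluated directly once an allocation x is fixed. The following is a minimal sketch with hypothetical numbers for two fragments, two sites, and one query; the function names and all input values are assumptions made for illustration.

```python
def storage_cost(usc, size, x):
    """Sum of STC_jk = USC_k * size(F_j) * x_jk over all fragments j and sites k."""
    return sum(usc[k] * size[j] * x[j][k]
               for j in range(len(size)) for k in range(len(usc)))


def access_cost(i, u, r, ur, rr, x, lpc):
    """AC_i = sum_k sum_j (u_ij * UR_ij + r_ij * RR_ij) * x_jk * LPC_k."""
    return sum((u[i][j] * ur[i][j] + r[i][j] * rr[i][j]) * x[j][k] * lpc[k]
               for k in range(len(lpc)) for j in range(len(x)))


# Hypothetical inputs.
usc = [2, 1]             # unit storage cost per site
size = [100, 50]         # size(F_j)
x = [[1, 0], [0, 1]]     # x[j][k] = 1 if F_j is stored at S_k
u = [[1, 0]]             # query 0 updates F_0 only
r = [[1, 1]]             # query 0 reads both fragments
ur = [[4, 0]]            # update accesses UR_ij
rr = [[10, 6]]           # read accesses RR_ij
lpc = [3, 5]             # processing cost per unit of work at each site

total_storage = storage_cost(usc, size, x)
ac0 = access_cost(0, u, r, ur, rr, x, lpc)
```

Changing x and re-evaluating these sums is the basic operation that any heuristic search over allocations has to repeat.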
Solution Methods
• The DAP formulation is NP-complete.
• Thus, one has to look for heuristic methods that yield suboptimal solutions.
• Heuristic methods
  – knapsack problem solution
  – branch-and-bound
  – network flow algorithm
  ⇒ There is not enough data to determine how close the results are to the optimal.