Query Optimisation in Distributed Object-Oriented Database Systems*



W. SUN†, W. MENG AND C. YU
Department of Electrical Engineering and Computer Science, University of Illinois at Chicago, Chicago, Illinois 60680, USA

†School of Computer Science, Florida International University, University Park, Miami, Florida 33199, USA

In this paper, query processing and optimisation in distributed object-oriented database systems are discussed. The processing and optimisation of typical queries, called chain queries, in distributed object-oriented database systems are investigated in detail. An algorithm with complexity $O(n^3 h_1)$ to minimise the total cost is provided using dynamic programming, where $n$ is the number of classes referenced in the query, $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites in the network. A wide range of issues is addressed and uniformly integrated into our basic solution to the problem. These issues include sorted states of classes; local processing of selections and projections; allowing multiple intermediate results; arbitrary target class at an arbitrary answer site; replicated data; class hierarchies (which capture the IS-A relationship among objects); different sites with different processing speeds; and communication lines between different sites with different transfer speeds. The uniformity of this algorithm across so many situations demonstrates its usefulness and flexibility.

Received November 1990, revised February 1991

1. INTRODUCTION

In this section, background information is provided. First, the problem is presented. In Section 1.2, the basic cost model (of class traversals) is introduced. The differences between the proposed approach and previous research are pointed out in Section 1.3.

1.1 The problem

Object-oriented programming has recently been introduced into database systems. Several prototypes of object-oriented database systems (OODBs) have been or are being developed, such as Iris [8, 28], GemStone [10, 29], Postgres [37, 38], ORION [7, 22, 24], O2 [6, 16], Starburst [18] and Exodus [11]. Many research results indicate that object-oriented databases can be applied in various large-scale and complicated application domains [2-5, 19, 27, 31, 34, 40, 41].

Objects are grouped into classes [23, 25]. Objects that belong to a class are also called instances of the class. Class hierarchies can be defined to capture the IS-A relationship among objects in different classes. A higher/lower level class is called a superclass/subclass. Attributes specified for a class are inherited (shared) by all its subclasses, and more specific attributes can be defined for subclasses. The domain of an attribute is a class. If the domain of attribute B is class D, then attribute B can take an instance of class D, or of one of the subclasses of D, as its value. It is assumed that each object is assigned a system-wide unique identifier (UID). A class can be a primitive class (such as the string and integer classes) or a non-primitive one with a set of attributes defined. The value of an attribute is a domain value if its domain is a primitive class; otherwise it is a reference to (the object identifier of) an instance of the domain class. As pointed out by Ullman [39], unique identification of objects is one of the most important characteristics that differentiate an OODB system from a relational database system. This characteristic is fully exploited by our approach. Refs 25 and 21 are a good survey paper and reference book, respectively, for object-oriented concepts and databases.

* This research is supported in part by MCC and in part by NSF under IRI-8901789.

In our previous research [38] we discussed query optimisation in a centralised OODB system. This paper discusses query optimisation in distributed OODBs. Examples from ORION are used to illustrate our approach [7]. Since ORION, which has exploited many object-oriented characteristics such as class hierarchy, composition hierarchy and unique object identification, is a representative example of OODBs, we believe our result is applicable to many other OODBs [21, 25]. The following is an ORION database schema (class hierarchies are ignored), and an example query Q1 against this database schema [7]:

Auto:    Manufacturer, weight, colour, owner -> Person
Person:  age, hometown -> City
City:    name, state -> State
State:   name, population

select Auto where owner.hometown.state.population > 10000

Figure 1. A sample database schema and a query.

The query, using notations from Ref. 46, is to find all automobiles owned by persons who live in cities within states with populations over 10000. Class Person is the domain of attribute owner of class Auto; the domain of attribute population in class State is the primitive class integer, and so on. Auto-ID, Person-ID, City-ID and State-ID are unique object identifiers for classes Auto, Person, City and State, respectively. They are automatically generated for objects and maintained by OODBs. If we neglect the output and the selection of the query, and properly rename some attributes, its query graph becomes [7]:

98 THE COMPUTER JOURNAL, VOL. 35, NO. 2, 1992


Auto [Person-ID] -> Person [Person-ID, City-ID] -> City [City-ID, State-ID] -> State [State-ID]

Figure 2. Query graph of a sample query.

Attributes owner, hometown and state are complex attributes which take UIDs of class Person (Person-ID), class City (City-ID) and class State (State-ID) as their values, respectively. The attributes Person-ID, City-ID and State-ID may be invisible to the users. If class Y is the domain class for attribute X, we draw an arrow from X to Y. This is an example of a typical query called a chain query. In a chain query there is a common 'chaining' attribute between adjacent classes, but there is no common 'chaining' attribute between non-adjacent classes. We use $C_i$ to denote an original class, and $C_{i,i}$ to represent the simplified class (of $C_i$) containing only the ID attributes which are chained to its adjacent classes, as shown in Figure 3. Thus the general form of a chain query is:

$C_{1,1}[ID_2] \to C_{2,2}[ID_2, ID_3] \to \cdots \to C_{n,n}[ID_n]$

Figure 3. Query graph of generalised chain query.

where $ID_i$ is the attribute that uniquely identifies objects in class $C_i$, $1 \le i \le n$. In OODBs, the chaining from one class to another comes about naturally through the use of complex attributes as shown above. In this paper we restrict ourselves mostly to chain-query optimisation. It can be observed that chaining of classes involved in a query graph corresponds to joining in relational databases (referred to as a functional join [11, 46]; in ORION it is called 'class traversal'; we use traversals and (functional) joins interchangeably). If class $C_{i,i}$ is considered as a relation, then attribute $ID_i$ of $C_{i,i}$ becomes the 'join' attribute between $C_{i,i}$ and $C_{i-1,i-1}$, and attribute $ID_{i+1}$ of $C_{i,i}$ becomes the 'join' attribute between $C_{i,i}$ and $C_{i+1,i+1}$, $2 \le i \le n - 1$. Let the above 'join' be denoted by $\otimes$. Further, let

$C_{1,j}[ID_{j+1}] = C_1 \otimes C_2 \otimes \cdots \otimes C_j$, $j \le n - 1$
$C_{i,j}[ID_i, ID_{j+1}] = C_i \otimes C_{i+1} \otimes \cdots \otimes C_j$, $2 \le i \le j \le n - 1$
$C_{i,n}[ID_i] = C_i \otimes C_{i+1} \otimes \cdots \otimes C_n$, $i \ge 2$

where the two attributes within the brackets are the attributes of the intermediate results $C_{i,j}$. $C_{1,j}$ and $C_{i,n}$ are special intermediate results in that they have only a single attribute, since we only need to keep the join attributes needed in subsequent join operations. (Attribute $ID_1$ in $C_{1,j}$ and attribute $ID_{n+1}$ in $C_{i,n}$ will never be used for subsequent joins, therefore they are abandoned.) Furthermore, whenever a join has been performed, the attribute participating in the join is abandoned, so that the intermediate results $C_{i,j}$ consist only of UID pairs $[ID_i, ID_{j+1}]$. Obviously, $C_{1,n}$ is the answer (if we ignore the target class), which can be obtained from $C_{1,k}[ID_{k+1}] \otimes C_{k+1,n}[ID_{k+1}]$ for some $k$, $1 \le k < n$. This implies $C_{1,n} = C_1 \otimes C_2 \otimes \cdots \otimes C_n$. We are interested in obtaining $C_{1,n}$ with the least cost using dynamic programming.
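To make the notation concrete, the chain of functional joins can be traced on a toy instance of the Figure 1 schema. The sketch below is only an illustration: the UIDs, the population figures and the helper names (`functional_join`, `answer`) are invented here and do not come from ORION or from the paper. Each class is reduced to UID pairs, and the local selection on State is applied before the joins purely for brevity.

```python
# Hypothetical toy instances of the Figure 1 schema; all UIDs are made up.
auto = {"a1": "p1", "a2": "p2", "a3": "p1"}   # Auto-ID -> owner (Person-ID)
person = {"p1": "c1", "p2": "c2"}             # Person-ID -> hometown (City-ID)
city = {"c1": "s1", "c2": "s2"}               # City-ID -> state (State-ID)
state = {"s1": 5_000, "s2": 20_000}           # State-ID -> population


def functional_join(left, right):
    """Join UID pairs (x, y) with a UID-keyed class: keep x, follow y.

    The join attribute y is dropped from the result, as in the text.
    """
    return [(x, right[y]) for x, y in left if y in right]


def answer(threshold=10_000):
    """Return the Auto-IDs that satisfy the sample query."""
    # Simplified State class after the selection: only qualifying State-IDs.
    big_states = {sid: sid for sid, pop in state.items() if pop > threshold}
    pairs = sorted(auto.items())               # simplified Auto class, sorted on Auto-ID
    for cls in (person, city, big_states):     # follow the chain left to right
        pairs = functional_join(pairs, cls)
    return [auto_id for auto_id, _ in pairs]
```

Joining strictly left to right is only one of the possible orders; the point of the dynamic programming formulation is to choose the cheapest order and method.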

It should be noted that, in addition to functional joins, OODB queries may also be processed by using navigational methods [39]. However, in a distributed environment, a navigational method is likely to incur very high communication costs. Therefore, navigational methods will not be considered in this paper.

1.2 Basic class traversal methods and class hierarchies

In this section, we discuss basic class traversal methods. We first ignore class hierarchies. It is assumed in ORION that each class is a physical file and objects of classes are stored in ascending order of their IDs [7, 22, 24]. There are three basic traversal methods: Forward Nested Loop Traversal (FNL), Reversed Nested Loop Traversal (RNL) and Sort Merge Traversal (SM) [24]. They have different cost formulas.

FNL (Forward Nested Loop Traversal), namely,

$C_{i,k}[ID_i, ID_{k+1}] = C_{i,j}[ID_i, ID_{j+1}] \otimes_{FNL} C_{j+1,k}[ID_{j+1}, ID_{k+1}]$, $1 \le i \le j < k \le n$.

$C_{i,j}$ is the outer class (corresponding to the outer relation in a nested loop join) and $C_{j+1,k}$ is the inner class. For each object $t$ of $C_{i,j}$, a value $t.ID_{j+1}$ is obtained. Then $C_{j+1,k}$ is searched for $t.ID_{j+1}$. If $t.ID_{j+1} = s.ID_{j+1}$ for some object $s$ in $C_{j+1,k}$ (note that at most one such $s$ can be found because of the uniqueness of the class objects), an object with the value pair $(t.ID_i, s.ID_{k+1})$ is constructed for $C_{i,k}$, whose schema is $[ID_i, ID_{k+1}]$ (the join attribute $ID_{j+1}$ is abandoned). This process is repeated for all objects of $C_{i,j}$. Clearly, if $C_{i,j}$ is sorted on its first attribute $ID_i$, then the intermediate result $C_{i,k}$ is also sorted on its first attribute $ID_i$. Thus we have

$C^s_{i,j} \otimes_{FNL} C^*_{j+1,k} \to C^s_{i,k}$

where '$\to$' denotes 'produces', $C^s_{i,j}$ denotes $C_{i,j}$ sorted in ascending order of $ID_i$, and $C^*_{j+1,k}$ denotes $C_{j+1,k}$ in any state (sorted or not). Furthermore, we also have

$C_{i,j} \otimes_{FNL} C^*_{j+1,k} \to C_{i,k}$

that is, the produced intermediate result $C_{i,k}$ is not sorted if its first operand $C_{i,j}$ is not sorted. Let FNL also denote the cost of applying FNL. Then $FNL(C^*_{i,j}, C^*_{j+1,k}) = F_1(|C_{i,j}|, |C_{j+1,k}|)$, where $F_1$ is a cost-estimation function (which can be derived using standard techniques [44]). $F_1$ is independent of whether these two arguments are sorted or not. $|X|$ is the number of objects in class $X$.
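The FNL procedure described above can be sketched in a few lines. This is a minimal illustration, not ORION's implementation: classes are modelled as lists of UID pairs, and a Python dictionary stands in for whatever search structure the system would use to look up the inner class by its unique first attribute (an assumption of this sketch).

```python
def fnl(outer, inner):
    """Forward nested loop traversal over UID pairs.

    outer: list of (ID_i, ID_{j+1}) pairs, i.e. C_{i,j}
    inner: list of (ID_{j+1}, ID_{k+1}) pairs, i.e. C_{j+1,k};
           its first attribute is unique, so at most one match exists
    """
    lookup = dict(inner)  # stand-in for searching the inner class by UID
    result = []
    for t_first, t_join in outer:          # one probe per outer object
        if t_join in lookup:
            # Keep (ID_i, ID_{k+1}); the join attribute is dropped.
            result.append((t_first, lookup[t_join]))
    return result
```

Because the outer class is scanned in its stored order, the result inherits that order: a sorted outer class yields a result sorted on its first attribute, and an unsorted one does not.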

RNL (Reversed Nested Loop Traversal), namely,

$C_{i,k}[ID_i, ID_{k+1}] = C_{i,j}[ID_i, ID_{j+1}] \otimes_{RNL} C_{j+1,k}[ID_{j+1}, ID_{k+1}]$.

$C_{i,j}$ is the inner class and $C_{j+1,k}$ is the outer one. The processing is for each object of $C_{j+1,k}$ to be matched against the objects of $C_{i,j}$ (say, by an index on attribute $ID_{j+1}$ of class $C_{i,j}$). Since objects of $C_{i,j}$ are not uniquely identified by $ID_{j+1}$, accessing an object in $C_{i,j}$ with $ID_{j+1}$ is more costly than accessing one in $C_{j+1,k}$ when applying FNL. Also, the resulting objects of $C_{i,k}$ are not sorted on $ID_i$, irrespective of whether $C_{i,j}$ and/or $C_{j+1,k}$ are sorted or not. Thus,

$C^*_{i,j} \otimes_{RNL} C^*_{j+1,k} \to C_{i,k}$

The cost to apply RNL is $RNL(C^*_{i,j}, C^*_{j+1,k}) = F_2(|C_{i,j}|, |C_{j+1,k}|)$, where $F_2$ is independent of whether these two arguments are sorted or not. RNL is likely to be better than FNL if the number of objects in $C_{j+1,k}$ is much smaller than the number of objects in $C_{i,j}$.
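A comparable sketch of RNL follows, again over lists of UID pairs. The in-memory index built with `defaultdict` is an assumption made for illustration; the text only says that an index on the second attribute of the inner class may be used.

```python
from collections import defaultdict


def rnl(c_ij, c_jk):
    """Reversed nested loop traversal over UID pairs.

    c_ij: list of (ID_i, ID_{j+1}) pairs, used as the inner class
    c_jk: list of (ID_{j+1}, ID_{k+1}) pairs, used as the outer class
    """
    # Index the inner class on its second attribute. Several objects may
    # share the same ID_{j+1}, so each key maps to a list.
    index = defaultdict(list)
    for first, join in c_ij:
        index[join].append(first)
    result = []
    for join, last in c_jk:                # one probe per outer object
        for first in index.get(join, []):
            result.append((first, last))
    return result
```

The output follows the order of the outer class `c_jk`, which is why the result is in general not sorted on its first attribute even when both inputs are.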

SM (Sort Merge), namely,

$C_{i,k}[ID_i, ID_{k+1}] = C_{i,j}[ID_i, ID_{j+1}] \otimes_{SM} C_{j+1,k}[ID_{j+1}, ID_{k+1}]$.

This is the standard sort-merge method. If $C_{j+1,k}$ is not sorted on $ID_{j+1}$, sort it on $ID_{j+1}$ (note that $C_{j+1,k}$ could be sorted on $ID_{j+1}$ without explicitly performing a sorting operation; this happens when $C_{j+1,k}$ is an original class or an intermediate result obtained by using the FNL join method while the first argument of FNL is sorted on its first attribute). Sort $C_{i,j}$ on $ID_{j+1}$. (This is necessary since, no matter whether $C_{i,j}$ is an original or an intermediate class, $C_{i,j}$ is not sorted on its second attribute.) Then merge these two sorted classes. Therefore, we have:

$C^*_{i,j} \otimes_{SM} C^*_{j+1,k} \to C_{i,k}$

so that, in any case, the obtained intermediate result $C_{i,k}$ will not be sorted, irrespective of whether $C_{i,j}$ and/or $C_{j+1,k}$ is sorted or not. The cost to apply the SM method is:

$SM(C^*_{i,j}, C_{j+1,k}) = Sort(C_{i,j}) + Sort(C_{j+1,k}) + Merge(C_{i,j}, C_{j+1,k})$
$SM(C^*_{i,j}, C^s_{j+1,k}) = Sort(C_{i,j}) + Merge(C_{i,j}, C_{j+1,k})$,

where $Sort(X)$ is the cost to sort class $X$ and $Merge(X, Y)$ is the cost to merge two sorted classes $X$ and $Y$. Deriving the estimated costs for Sort and Merge is straightforward and standard. For the SM method, if the second class is sorted the cost will be smaller because sorting the second class is not needed.

For each of the three methods, whether the first class is sorted on $ID_i$ or not makes no difference in cost estimation. However, if the first class is sorted on $ID_i$, FNL will produce an intermediate result in sorted form, which may reduce the cost of some subsequent SM join operations; the other two methods produce unsorted intermediate results. Note that none of these methods can produce intermediate results sorted on their second attributes.

Some OODBs such as O2 [16] support clustering of objects in their physical storage. When clustering is used and original classes are involved in the join, the above cost estimations for FNL, RNL and SM may need to be modified. Also, objects of classes may no longer be stored in ascending order of their IDs initially, and the algorithm to be presented in this paper subsumes this situation, i.e. the proposed algorithm is applicable regardless of whether the original classes are sorted or unsorted. Since it can be observed from the above formalism that detailed cost estimations for FNL, RNL and SM have no significant effect on subsequent discussions, clustering will not be discussed any further.

Now we proceed to incorporate class hierarchies into the above formulation. We note that operands involved in the above FNL, RNL and SM are either original classes or intermediate results. However, these operands in class traversals can actually involve class hierarchies. For example, if classes Vehicle and Bike are subclasses of Auto, and the class Auto in query Q1 (see Figure 1) is replaced by Auto* [7, 22], then the query is to find all automobile owners (including vehicle and bike owners) who live in cities within states with a population over 10000. That is, if A is a class, then A* is used to represent the class hierarchy rooted at A. In this example, the class hierarchy rooted at Auto, rather than the class Auto only, needs to be traversed. When class hierarchies are involved in FNL, RNL and SM, cost-estimation formulas need to be modified by taking the following into consideration: ordering of objects in a hierarchy, and class hierarchy index (an index established on the class hierarchy instead of on a single class [22]). Details about how these factors can affect the above cost-estimation formulas can be found in our previous research result [38]. Clearly, whether class hierarchies are involved in a query or not, only the basic cost-estimation formulas for FNL, RNL and SM can be affected. Thus, after these cost-estimation formulas are obtained, we no longer need to distinguish classes with class hierarchies; therefore, in later discussions we assume, without loss of generality, that classes are simple classes.
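Returning to the sort-merge traversal (SM) described earlier in this section, the following sketch shows the two cases of its cost formula in executable form. It is an illustration under simplifying assumptions: classes are lists of UID pairs held in memory, Python's built-in `sorted` stands in for the Sort step, and the flag `second_sorted` is a hypothetical parameter that tells the function the second class already arrives sorted on its first attribute.

```python
def sort_merge(c_ij, c_jk, second_sorted=False):
    """Sort-merge traversal over UID pairs.

    c_ij: list of (ID_i, ID_{j+1}) pairs
    c_jk: list of (ID_{j+1}, ID_{k+1}) pairs; first attribute unique
    """
    # The first class always has to be sorted on its second attribute.
    left = sorted(c_ij, key=lambda pair: pair[1])
    # The second class is sorted only when it is not already in order.
    right = c_jk if second_sorted else sorted(c_jk, key=lambda pair: pair[0])
    result, r = [], 0
    for first, join in left:
        # Advance the right cursor to the first key not below the join value.
        while r < len(right) and right[r][0] < join:
            r += 1
        # Do not advance on a match: later left objects may share the key.
        if r < len(right) and right[r][0] == join:
            result.append((first, right[r][1]))
    return result
```

The result comes out in the order of the dropped join attribute, so it is in general not sorted on its first attribute, in line with the statement that SM produces unsorted intermediate results.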

1.3 Literature review

Dynamic programming techniques have been employed by various workers in relational query optimisation [12, 14, 26, 32, 33, 43, 45]. However, there are significant differences between the solution we propose and earlier ones, in addition to the difference in the underlying data models.

(1) The cost model in our formalism is more realistic than those given by other researchers in the following respects.

Local reductions (selections/projections). Local reductions are either not addressed [26] or always performed before joins [14, 32, 33]. However, Ref. 1 demonstrates that performing local reductions before joins may not yield optimal results. Our solution allows local reductions and joins to be performed in any order, that is, local reductions may be performed before joins, during joins, and/or after joins. Moreover, in this paper reductions can be carried out by using indices or sequential scanning.

Sorted state. A base relation or an intermediate result is said to be in a sorted state if the tuples of the relation are sorted in ascending order of an attribute to be joined with some other base/intermediate relations in subsequent joins. A sort-merge join will be cheaper if relations are already sorted on the joining attributes. It is important to note that although an FNL join may have a higher cost than other join methods in some cases, the sorted intermediate result it yields can reduce the cost of subsequent (sort-merge) joins. This means that local optimisation may not guarantee overall optimality. That is, without taking the sorted state of classes and intermediate results into consideration, overall optimality may not be achieved. In most previous papers, the issue of sorted state was not addressed. Selinger and Adiba [33] only briefly mentioned a distinction between a sorted relation and an unsorted one. Our solution takes sorted states of original classes as well as intermediate results into full consideration.

Bushy plan. In Refs 14, 32 and 33 it is required that at least one of the relations involved in a join be a base relation. In our formalism (also known as the 'bushy plan'), joins between two intermediate results are allowed. It is known that a strategy which allows at most one intermediate result during query processing may not necessarily yield an optimal result.

(2) In Refs 12, 43 and 45, only data communication costs involving semi-joins are considered. However, various researchers have demonstrated that in a distributed database system communication cost may not necessarily be a dominating factor. We take both local processing costs and communication costs into consideration. Furthermore, our approach allows different sites to have different processing speeds, and communication lines between different sites to have different communication speeds.

(3) Our algorithm is able to uniformly handle many different situations, including arbitrary target class at an arbitrary answer site, different sites with different processing speeds, replicated copies of classes at different sites, different communication speeds between different sites and class hierarchies. The uniformity of this algorithm under so many diversified situations strongly demonstrates the usefulness and the flexibility of the algorithm. A single, yet uniform, algorithm capable of handling such a wide range of situations efficiently has not been reported before.

(4) The algorithm is shown to require $O(n^3 h_1)$ time to minimise the total cost, where $n$ is the number of classes specified in the query, $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites in the distributed environment. Finally, with slight modification, this algorithm is applicable to query optimisation in relational database systems.

The rest of the paper is organised as follows. Our previous solution for centralised environments is briefly reviewed in Section 2. Minimising the total cost in a distributed environment is discussed in detail in Section 3, where an optimal algorithm with complexity $O(n^3 h_1)$ is provided, where $h_1 \le \min(n + 2, h)$ and $h$ is the number of sites; extensions to various situations in a distributed environment are also discussed in that section.

2. REVIEW OF A SOLUTION FOR CENTRALISED ENVIRONMENTS

We first briefly review a solution we obtained for a centralised environment [38].

Let $jc(C_{i,j}, C^t_{j+1,k})$ be the minimum cost to functionally join (join, for short) $C_{i,j}$ and $C_{j+1,k}$, where if $t = s$ the second class is sorted (on its first attribute). As discussed above, a superscript for the first class $C_{i,j}$ is not needed, since whether or not $C_{i,j}$ is sorted on $ID_i$ makes no difference to the join cost. Thus,

$jc(C_{i,j}, C^t_{j+1,k}) = \min\{FNL(C_{i,j}, C^t_{j+1,k}), RNL(C_{i,j}, C^t_{j+1,k}), SM(C_{i,j}, C^t_{j+1,k})\}$
$\quad = \min\{FNL(C_{i,j}, C_{j+1,k}), RNL(C_{i,j}, C_{j+1,k}), SM(C_{i,j}, C^t_{j+1,k})\}$. (1)

Similarly, let $jcs(C^{t_1}_{i,j}, C^{t_2}_{j+1,k})$ be the minimum cost to join $C^{t_1}_{i,j}$ and $C^{t_2}_{j+1,k}$ such that the intermediate result $C_{i,k}$ is sorted. Then,

$jcs(C^{t_1}_{i,j}, C^{t_2}_{j+1,k}) = \min\{jc(C_{i,j}, C^{t_2}_{j+1,k}) + Sort(C_{i,k}),\ Sort(C_{i,j}) + FNL(C^s_{i,j}, C^{t_2}_{j+1,k})\}$, if $t_1 \ne s$ (2)
$\quad = \min\{jc(C_{i,j}, C^{t_2}_{j+1,k}) + Sort(C_{i,k}),\ FNL(C^s_{i,j}, C^{t_2}_{j+1,k})\}$, if $t_1 = s$. (3)

In Formula (2), the first expression is for the situation where the intermediate result $C_{i,k}$ is first obtained in unsorted form and then sorted explicitly; the second expression is for explicitly sorting $C_{i,j}$ first and then applying FNL to obtain $C^s_{i,k}$. Formula (3) can be understood similarly.
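The case analysis in Formulas (1) to (3) can be written as two small functions. This is only a sketch: the numeric arguments are assumed to come from some cost estimator that is not specified here, and the parameter names are chosen for this illustration.

```python
def jc(fnl, rnl, sm_unsorted, sm_sorted, second_sorted):
    """Formula (1): cheapest join, given the state of the second class.

    fnl and rnl do not depend on sortedness; only the SM cost does.
    """
    sm = sm_sorted if second_sorted else sm_unsorted
    return min(fnl, rnl, sm)


def jcs(join_cost, fnl, sort_result, sort_first, first_sorted):
    """Formulas (2) and (3): cheapest join that yields a sorted result.

    join_cost:   value of jc for the same pair of operands
    sort_result: cost of sorting the joined result explicitly
    sort_first:  cost of sorting the first operand before applying FNL
    """
    # Option 1: join by the cheapest method, then sort the result.
    join_then_sort = join_cost + sort_result
    # Option 2: apply FNL to a sorted first operand, sorting it if needed.
    sorted_fnl = fnl if first_sorted else sort_first + fnl
    return min(join_then_sort, sorted_fnl)
```

For example, with hypothetical costs FNL = 50, RNL = 70, SM = 60 (second class unsorted) or 40 (sorted), the join cost drops from 50 to 40 once the second class is sorted.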

Before we present an algorithm for a centralised environment, the following are assumed: (1) no local reduction; (2) no target class is considered. The relaxation of these two assumptions in a centralised environment can be found in Ref. 38.

Our solution will find all minimum costs for computing $C_{i,j}$ and $C^s_{i,j}$ for all pairs of $i$ and $j$ such that $(j - i) = k$, $1 \le i \le j \le n$, $0 \le k \le n - 1$. Initially, $k = 0$. In each iteration, $k$ is incremented by 1 until $k = n - 1$. Let $Cost_{i,j}$ and $Cost^s_{i,j}$ be the minimum costs for computing $C_{i,j}$ and $C^s_{i,j}$, respectively. This is a bottom-up strategy, as shown in Figure 4. The $k$th row indicates the costs for computing $C_{i,j}$ for $j - i = k$, and computations are done in the order of $k = 0, \ldots, n - 1$. There is also a $Cost^s_{i,j}$ matrix, which is similar to the $Cost_{i,j}$ matrix except that the latter holds the costs of computing $C_{i,j}$ rather than $C^s_{i,j}$. The $Cost^s_{i,j}$ matrix will be constructed synchronously with the $Cost_{i,j}$ matrix, i.e. as soon as a row of $Cost_{i,j}$ is computed, the corresponding row of $Cost^s_{i,j}$ is computed. This is repeated until $C_{1,n}$ is computed. $C_{i,j}$ is formed by computing $C_{i,m} \otimes C^t_{m+1,j}$ for some $m$, $i \le m < j$ (note that at the time $Cost_{i,j}$ is to be computed, the minimum costs for constructing $C_{i,m}$ and $C_{m+1,j}$, $i \le m < j$, which are $Cost_{i,m}$ and $Cost_{m+1,j}$, respectively, have been computed by the bottom-up algorithm). $Cost_{i,j}$ is obtained by choosing the $m$ that yields the minimum cost. These are all possible ways in which $C_{i,j}$ may be formed (if cross product is avoided). Note that both $C_{i,m}$ and $C_{m+1,j}$ can be in sorted form, but only the sorted state of $C_{m+1,j}$ may affect the cost of computing the join $C_{i,m} \otimes C_{m+1,j}$. Thus,

$Cost_{i,j} = \min_{i \le m < j} \{Cost_{i,m} + Cost_{m+1,j} + jc(C_{i,m}, C_{m+1,j}),\ Cost_{i,m} + Cost^s_{m+1,j} + jc(C_{i,m}, C^s_{m+1,j})\}$. (4)

In order to compute $C^s_{i,j}$, we may first obtain $C_{i,j}$ by using Equation (4) and then explicitly sort $C_{i,j}$; or we may compute $C^s_{i,m}$ first and then apply FNL to join $C^s_{i,m}$ with $C^*_{m+1,j}$ (which can be sorted or not) to obtain $C^s_{i,j}$. Thus,

$Cost^s_{i,j} = \min_{i \le m < j} \{Cost^s_{i,m} + Cost_{m+1,j} + FNL(C^s_{i,m}, C_{m+1,j}),\ Cost_{i,j} + Sort(C_{i,j})\}$. (5)

The time complexity of the above algorithm can be shown to be $O(n^3)$ [38]. By definition, $Cost_{i,i} = 0$, $1 \le i \le n$. If class $C_i$ is initially sorted, then $Cost^s_{i,i} = 0$, otherwise $Cost^s_{i,i} = Sort(C_{i,i})$, $1 \le i \le n$. From the initial condition, Equations (4) and (5), and the definitions of jc and jcs, it is easy to see that $Cost_{i,j} \le Cost^s_{i,j}$ for all $1 \le i \le j \le n$. Equations (4) and (5) are the foundation for our following discussions and extensions.

$Cost_{1,1}$   $Cost_{2,2}$   ...   $Cost_{n,n}$
$Cost_{1,2}$   $Cost_{2,3}$   ...
...
$Cost_{1,n}$

Figure 4. The computation of all $Cost_{i,j}$s.
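The bottom-up computation of Equations (4) and (5) can be sketched as follows. The cost callables (`fnl`, `rnl`, `sm`, `sort`, `initially_sorted`) are hypothetical placeholders for a cost estimator and are not part of the paper; the sketch returns only the cost tables, not the chosen plan.

```python
def optimise_chain(n, fnl, rnl, sm, sort, initially_sorted):
    """Fill the Cost and Cost^s tables row by row (k = j - i).

    fnl(i, m, j), rnl(i, m, j): cost of joining C_{i,m} with C_{m+1,j}
    sm(i, m, j, second_sorted): sort-merge cost for the same join
    sort(i, j):                 cost of sorting C_{i,j} on its first attribute
    initially_sorted(i):        True if original class C_i is stored sorted
    Returns two dicts keyed by (i, j), with 1-based indices.
    """
    cost, cost_s = {}, {}
    for i in range(1, n + 1):                      # initial condition, k = 0
        cost[i, i] = 0
        cost_s[i, i] = 0 if initially_sorted(i) else sort(i, i)
    for k in range(1, n):
        for i in range(1, n - k + 1):
            j = i + k
            best = best_s = float("inf")
            for m in range(i, j):
                jc_u = min(fnl(i, m, j), rnl(i, m, j), sm(i, m, j, False))
                jc_s = min(fnl(i, m, j), rnl(i, m, j), sm(i, m, j, True))
                # Equation (4): second operand unsorted or sorted.
                best = min(best,
                           cost[i, m] + cost[m + 1, j] + jc_u,
                           cost[i, m] + cost_s[m + 1, j] + jc_s)
                # Equation (5), first alternative: FNL on a sorted first operand.
                best_s = min(best_s,
                             cost_s[i, m] + cost[m + 1, j] + fnl(i, m, j))
            cost[i, j] = best
            # Equation (5), second alternative: sort the unsorted result.
            cost_s[i, j] = min(best_s, best + sort(i, j))
    return cost, cost_s
```

There are O(n^2) pairs (i, j) and at most n - 1 split points m for each, which gives the O(n^3) bound quoted above when every cost callable runs in constant time.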

Example 1. Let $C_{1,1}$, $C_{2,2}$, $C_{3,3}$ and $C_{4,4}$ be four classes specified in a query. To find $C_{1,4}$ with the minimum cost, the following steps are followed.

Step 1. Construct $C_{1,2}$ from $C_{1,1}$ and $C_{2,2}$/$C^s_{2,2}$ by Equation (4) with $m \in \{1\}$, and then $C^s_{1,2}$ from $C_{1,2}$ (by sorting $C_{1,2}$) or from $C^s_{1,1}$ and $C_{2,2}$ by Equation (5) with $m \in \{1\}$. $C_{2,3}$, $C_{3,4}$, $C^s_{2,3}$ and $C^s_{3,4}$ can be similarly constructed.

Step 2. Construct $C_{1,3}$ from either $C_{1,2}$ joined with $C_{3,3}$/$C^s_{3,3}$ or $C_{1,1}$ joined with $C_{2,3}$/$C^s_{2,3}$ by Equation (4) with $m \in \{1, 2\}$. Suppose $Cost_{2,3} = 100$ and $Cost^s_{2,3} = 200$, that is, the minimum cost for having $C_{2,3}$ in sorted form is more expensive than the minimum cost for having $C_{2,3}$ in unsorted form. Suppose that in constructing $C_{1,3}$, sort merge turns out to be the cheapest and $Sort(C_{2,3}) = 150$. Starting from the unsorted $C_{2,3}$ then costs $100 + 150 = 250$ before the merge, against 200 for the sorted form. Therefore, we can see that obtaining $C_{2,3}$ using a more expensive method but with $C_{2,3}$ in sorted form could yield the lowest cost for forming $C_{1,3}$. Construct $C^s_{1,3}$ by Equation (5) with $m \in \{1, 2\}$. Similarly, $C_{2,4}$ and $C^s_{2,4}$ can be constructed.

Step 3. Finally, $C_{1,4}$ can be constructed with $m \in \{1, 2, 3\}$ using Equation (4).
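The comparison made in Step 2 can be checked numerically. In the sketch below, the values 100, 200 and 150 come from the example, while the cost of sorting the first class (20) and the merge cost (30) are invented here purely to complete the sum; they are the same in both alternatives and so do not affect which one wins.

```python
def sm_total(cost_second, second_sorted, sort_first, sort_second, merge):
    """Total cost of building the second operand and sort-merging with it."""
    extra_sort = 0 if second_sorted else sort_second
    return cost_second + sort_first + extra_sort + merge


# Cost_{2,3} = 100 (unsorted), Cost^s_{2,3} = 200 (sorted), Sort(C_{2,3}) = 150.
via_unsorted = sm_total(100, False, sort_first=20, sort_second=150, merge=30)
via_sorted = sm_total(200, True, sort_first=20, sort_second=150, merge=30)
```

Under these assumed figures the sorted alternative is cheaper by 50, which is exactly the gap between 100 + 150 and 200.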

3. A SOLUTION FOR DISTRIBUTED ENVIRONMENTS

In this section we investigate chain query optimisation in a distributed environment. In Section 3.1 we present an algorithm that computes $C_{1,n}$ with the minimum total cost under assumptions similar to those used in Section 2, that is, no local reduction and no target class are considered. In Section 3.2, processing of queries involving selections and projections, in addition to joins, is studied. Section 3.3 extends the algorithm by allowing arbitrary target class and arbitrary answer site.

3.1 Minimum total cost without considering reductions and target class

Different sites may have different processing speeds, and each class can have duplicate copies at different sites. Let $P_j$ be the relative processing speed of site $j$. (If $P_1 = 1$ and $P_2 = 2$, the CPU at site 2 is twice as fast as that at site 1.) We now seek the minimum total cost for having a copy of $C_{1,n}$ at some site, where total cost is the sum of the local processing costs and communication costs involved. Let $T(a, X, b)$ be the communication cost for transferring $X$ amount of data from site $a$ to site $b$ in the local network. If $a = b$, then $T = 0$; otherwise, $T > 0$.

Let $Cost_{i,j,k}$ be the minimum cost for having a copy of $C_{i,j}$ at site $k$, where the $C_{i,j}$s are defined in Section 1. There are two ways to have a copy of $C_{i,j}$ at site $k$:

• directly construct $C_{i,j}$ at site $k$;
• first construct $C_{i,j}$ at a site other than $k$ and then transfer a copy of $C_{i,j}$ to site $k$.

Let $Cost^s_{i,j,k}$ be the minimum cost for having a copy of $C^s_{i,j}$ at site $k$. Let $d_{i,j,k}$ be the minimum total cost for constructing $C_{i,j}$ at site $k$. $d^s_{i,j,k}$ is similarly defined. In order to construct $C_{i,j}$ at site $k$, we need one copy of $C_{i,m}$ and one copy of $C_{m+1,j}$ for some $m$ at site $k$, $i \le m < j$, and then perform the join $C_{i,m} \otimes C_{m+1,j}$. Thus, similar to Equation (4), we have

$d_{i,j,k} = \min_{i \le m < j} \{Cost_{i,m,k} + Cost_{m+1,j,k} + \frac{1}{P_k} jc(C_{i,m}, C_{m+1,j}),\ Cost_{i,m,k} + Cost^s_{m+1,j,k} + \frac{1}{P_k} jc(C_{i,m}, C^s_{m+1,j})\}$. (6)

The intuition behind dividing the join cost by the processor power is that the faster the CPU is, the less the join costs. By exhausting all the possible sites where $C_{i,j}$ may be constructed, we have

$Cost_{i,j,k} = \min_{x \in SITES} \{d_{i,j,x} + T(x, |C_{i,j}|, k)\}$, (7)

where SITES is the set of all possible sites in the network. (Direct construction at site $k$ is the case $x = k$, for which $T = 0$.) Let the site with the fastest processing speed in the network be $v_0$, i.e. $P_{v_0} = \max_{1 \le v \le h}\{P_v\}$, where $h$ is the number of sites. Let SITESM be the set of sites containing the $n$ original classes, the fastest site $v_0$ and the target site $w$. It is easy to verify that to achieve the minimum cost, it is sufficient for $k$ and $x$ in Equation (7) to vary over SITESM. Since sites $w$ and $v_0$ may be among the sites containing the original classes specified in the query, the number of sites that need to be effectively searched is $h_1 \le \min(n + 2, h)$, where $n$ is the number of classes referenced by the query and $h$ is the number of sites in the network. Similar equations can be obtained for $d^s_{i,j,k}$ and $Cost^s_{i,j,k}$ based on Equation (5). We compute $Cost_{i,j,k}$, $Cost^s_{i,j,k}$, $d_{i,j,k}$ and $d^s_{i,j,k}$, with $(j - i) = t$, $1 \le i \le j \le n$, by using the previously computed $Cost$s, $Cost^s$s, $d$s and $d^s$s, where the difference of the corresponding indices is less than $t$. In each iteration, all $d$s will be computed before any $Cost$s are computed.
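The construction of SITESM can be illustrated with a short sketch. The function name and the dictionary layout are assumptions made here; the sample data (four classes stored at sites 1 to 4, site 1 twice as fast as the others, answer site 2) is a hypothetical configuration without replicated copies.

```python
def sites_to_search(class_sites, speeds, answer_site):
    """Return SITESM: sites holding a referenced class, plus v0 and w.

    class_sites: dict mapping each referenced class to the set of sites
                 that hold a copy of it
    speeds:      dict mapping every site in the network to its speed P
    answer_site: the target site w
    """
    fastest = max(speeds, key=speeds.get)      # v0; ties broken arbitrarily
    sites = {answer_site, fastest}
    for copies in class_sites.values():
        sites.update(copies)
    return sites


class_sites = {"C1": {1}, "C2": {2}, "C3": {3}, "C4": {4}}
speeds = {1: 2, 2: 1, 3: 1, 4: 1, 5: 1}
sitesm = sites_to_search(class_sites, speeds, answer_site=2)
```

Here site 5 holds no referenced class and is neither the fastest site nor the answer site, so it is excluded from the search.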

By Equation (6), for a given $k$, $d_{i,j,k}$ can be computed from previously computed quantities in time proportional to $(j - i)$. For a given $t$, $0 \le t \le n - 1$, there are $n - t$ pairs of $i$ and $j$ satisfying $(j - i) = t$. Thus, computing all $d_{i,j,k}$s for a given $k$ needs time $O(n^3)$. Hence, computing all $d_{i,j,k}$, $k \in$ SITESM, takes time $O(n^3 * h_1)$. Similarly, all $d^s_{i,j,k}$, $k \in$ SITESM, can be computed in the same time complexity. For a given pair $i$ and $j$, after all $d_{i,j,x}$, $x \in$ SITESM, are computed, $Cost_{i,j,k}$ for each $k \in$ SITESM can be computed in time $O(h_1)$ by using Equation (7) with SITES replaced by SITESM. This implies that computing all $Cost_{i,j,k}$ takes no more than time $O(n^2 * h_1^2)$ after all the $d$s and $d^s$s are computed. Clearly, all $Cost^s_{i,j,k}$ can also be computed in time complexity no more than $O(n^2 * h_1^2)$. Since $h_1$ is bounded by $O(n)$, the total time complexity of the algorithm is no more than $O(n^3 * h_1)$.

The initial condition of the algorithm can be easily decided. For example, consider the situation where all classes are initially sorted. For a site $k$, $Cost_{i,i,k} = 0$ and $Cost^s_{i,i,k} = 0$ if a copy of $C_i$ is at site $k$ (duplicated copies of a class are allowed). Otherwise, $Cost_{i,i,k}$ and $Cost^s_{i,i,k}$ will be the cost of transferring a copy of $C_{i,i}$ from some other site $x$, $T(x, |C_i|, k)$. The following example involves five classes and five sites.

Example 2. Assume there are 5 sites $\{S_1, S_2, S_3, S_4, S_5\}$ and 5 classes $\{C_1, C_2, C_3, C_4, C_5\}$ in a distributed OODB as shown below:

SITE    1  2  3  4  5
CLASS   1  2  3  4  5
SPEED   2  1  1  1  1

Figure 5. An example.

Suppose the answer site is $S_2$ (i.e. $w = 2$), the fastest site is $S_1$, and the query involves only $C_1, C_2, C_3, C_4$. In this example, the set of sites that need to be effectively searched is SITESM = $\{S_1, S_2, S_3, S_4\}$. In order to compute the minimum cost of having a copy of $C_{1,2}$ at each site $i$, $i \in$ SITESM, we first compute the minimum cost of constructing $C_{1,2}$ at site $i$, that is, $d_{1,2,i}$ for each site $i$, $i \in$ SITESM, by applying Equation (6) (this will incur communication cost to transfer $C_1$ and/or $C_2$ to site $i$ if $C_1$ and/or $C_2$ is not at site $i$). After all $d_{1,2,i}$, $i \in$ SITESM, have been computed, then for each site $i$, $i \in$ SITESM, we apply Equation (7) to obtain the minimum cost of having a copy of $C_{1,2}$ at site $i$. $C^s_{1,2}$ can be computed in a similar fashion. Therefore, in the first iteration ($j - i = 1$), all $d_{1,2,i}$, $d^s_{1,2,i}$, $d_{2,3,i}$, $d^s_{2,3,i}$, $d_{3,4,i}$ and $d^s_{3,4,i}$ and the corresponding $Cost$s for all $i \in$ SITESM are computed. In the second iteration, all $d_{1,3,i}$, $d^s_{1,3,i}$, $d_{2,4,i}$ and $d^s_{2,4,i}$ and the corresponding $Cost$s can be similarly computed. In the final iteration, $Cost_{1,4,2}$ as the result (if we ignore target class) is computed, where site $S_2$ is the answer site. •
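The iteration order of Example 2 can be sketched in a few lines of Python. This is a minimal illustration rather than the full algorithm: it ignores the sorted variants $Cost^s$ and $d^s$, so only the first term of Equation (6) is used, and every number in it (class sizes, join costs, the transfer model) is a hypothetical value that does not come from the paper. Only the site speeds and class placement follow Figure 5.

```python
def chain_dp(n, sites, speed, home, size, jc, transfer):
    """Unsorted-only sketch of Equations (6) and (7).

    cost[(i, j, k)]: minimum cost of having a copy of C_{i,j} at site k.
    d[(i, j, k)]:    minimum cost of constructing C_{i,j} at site k.
    """
    cost, d = {}, {}
    # Initial condition: C_{i,i} is free at its home site, else one transfer.
    for i in range(1, n + 1):
        for k in sites:
            cost[(i, i, k)] = transfer(home[i], size(i, i), k)
    for t in range(1, n):                       # t = j - i
        for i in range(1, n - t + 1):
            j = i + t
            for k in sites:                     # Equation (6), first term only
                d[(i, j, k)] = min(
                    cost[(i, m, k)] + cost[(m + 1, j, k)] + jc(i, m, j) / speed[k]
                    for m in range(i, j)
                )
            for k in sites:                     # Equation (7): all d's come first
                cost[(i, j, k)] = min(
                    d[(i, j, x)] + transfer(x, size(i, j), k) for x in sites
                )
    return cost, d


# Setting of Example 2 restricted to SITESM = {1, 2, 3, 4}.
sites = [1, 2, 3, 4]
speed = {1: 2, 2: 1, 3: 1, 4: 1}                # from Figure 5
home = {1: 1, 2: 2, 3: 3, 4: 4}                 # class C_i stored at site i

# Hypothetical cost model: every class or intermediate result has size 10,
# every join costs 20 on a unit-speed CPU, and shipping X units costs X.
size = lambda i, j: 10
jc = lambda i, m, j: 20
transfer = lambda a, x, b: 0 if a == b else x

cost, d = chain_dp(4, sites, speed, home, size, jc, transfer)
print(cost[(1, 4, 2)])                          # minimum cost of C_{1,4} at S2
```

The three nested loops over $t$, $i$ and $m$ give the $O(n^3)$ factor per site, and the loop over $k$ contributes the factor $h_1$.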

3.2 Local reductions

In this subsection we discuss how to incorporate local reductions (projections and/or selections) into the basic algorithm to achieve the minimum total cost in a distributed environment. We first consider how reductions at one site can be incorporated into our basic algorithm.

Clearly, if there are a number of selections/projections to be performed on a class at a given site $i$, it is cheaper to perform all such reductions at the same time, instead of performing one at a time. Thus, we can assume that there is at most one such reduction per class. Again, the sorted status makes the problem more difficult.

Consider an intermediate result $C_{i,m}$, $m > i$, created during the processing of a query. We may assume that all reductions on the original classes $C_i, C_{i+1}, \ldots, C_m$ have been performed by the time $C_{i,m}$ has been created, because joins are performed to form the intermediate result, and at the time joins are performed it is relatively cheap to perform local reductions on them, if any. Thus, the basic algorithm given in Section 3.1 is still applicable to all intermediate results. Therefore, it is sufficient to consider reductions and joins involving original classes. Specifically, if there is a join between two original classes at the same site, one or both of them may have reductions to be performed; there are four different ways to execute the join and the reductions:

(1) perform reductions on both classes before the join;
(2) perform reduction on one of the classes before the join, and while the join is being performed execute reduction on the other;
(3) same as (2) except that the order is reversed;
(4) perform the join first, and during the join perform the reductions.

One reduction method may be more expensive than another but the intermediate result may be in different states (sorted or not). For example, scanning a class in ascending order of UIDs and selecting the objects based on certain criteria is expensive, but the intermediate result obtained is sorted; while performing the selection (in particular when involving inequality) using indices is less expensive but may leave the intermediate result unordered. Having the class ordered could be beneficial in a subsequent join if the class is the second argument in the join operation. Let o_red($C_i$) and n_red($C_i$) be the minimum costs to perform reduction on $C_i$ with the resulting class ordered and not necessarily ordered, respectively. Let $rjc(C_{i,m}, C_{m+1,j})$ be the minimum cost to perform the reductions on the classes and the join between the classes. As discussed earlier, if $C_{i,m}$ and $C_{m+1,j}$ are intermediate results, all reductions must already have been performed and therefore the following is true for all $m > i$ and $j > m + 1$:

$$ rjc(C_{i,m}, C_{m+1,j}) = jc(C_{i,m}, C_{m+1,j}), \quad m > i \text{ and } j > m + 1. \tag{8} $$

Now we proceed to consider the following three cases: (1) both $C_{i,m}$ and $C_{m+1,j}$ are original classes; (2) only $C_{i,m}$ is an original class; and (3) only $C_{m+1,j}$ is an original class.

• Suppose $C_{i,m}$ is $C_i$ and $C_{m+1,j}$ is $C_{i+1}$. Then

$$ \begin{aligned} rjc(C_i, C_{i+1}) = \min \{\, & jc(C_i, C_{i+1}), \\ & \mathit{n\_red}(C_i) + jc(C_{i,i}, C_{i+1}), \\ & \mathit{o\_red}(C_{i+1}) + jc(C_i, C^s_{i+1,i+1}), \\ & \mathit{n\_red}(C_{i+1}) + jc(C_i, C_{i+1,i+1}), \\ & \mathit{n\_red}(C_i) + \mathit{o\_red}(C_{i+1}) + jc(C_{i,i}, C^s_{i+1,i+1}), \\ & \mathit{n\_red}(C_i) + \mathit{n\_red}(C_{i+1}) + jc(C_{i,i}, C_{i+1,i+1}) \,\}, \end{aligned} \tag{9} $$

where the first expression denotes the situation that the reductions are taken at the time of joining the original classes $C_i$ and $C_{i+1}$; the second is for reducing the original class $C_i$ to $C_{i,i}$, which is not necessarily sorted (and whose schema becomes $[ID_i, ID_{i+1}]$), before the join; the third is for reducing $C_{i+1}$ to $C_{i+1,i+1}$ in ordered form before the join; etc. Note that having the first argument in ordered form does not help to reduce the cost of the join and therefore no such expression is in $rjc$.

• Suppose $C_{i,m}$ is $C_i$ and $C_{m+1,j}$ is $C_{i+1,j}$. In this case, neither reduction in the intermediate result nor having the first argument ordered is needed. Thus,

$$ rjc(C_i, C_{i+1,j}) = \min \{\, jc(C_i, C_{i+1,j}),\; \mathit{n\_red}(C_i) + jc(C_{i,i}, C_{i+1,j}) \,\}, \quad j > i + 1. \tag{10} $$

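• Suppose $C_{m+1,j}$ is $C_j$ and $C_{i,m}$ is $C_{i,j-1}$. In this case, for all $(j - 1) > i$, we have

$$ rjc(C_{i,j-1}, C_j) = \min \{\, jc(C_{i,j-1}, C_j),\; \mathit{o\_red}(C_j) + jc(C_{i,j-1}, C^s_{j,j}),\; \mathit{n\_red}(C_j) + jc(C_{i,j-1}, C_{j,j}) \,\}. \tag{11} $$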

Finally, let $rjc'$ be the minimum join cost with the resulting class sorted while taking reduction into consideration. The equations for $rjc'$ are similar to those for $rjc$.
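The choice among the alternatives for two original classes can be sketched as a minimum over six candidate costs, one per expression in Equation (9). The function name, the state labels and all numeric costs below are hypothetical; a real optimiser would obtain the join and reduction costs from its cost model.

```python
def rjc_original_pair(jc, n_red_left, n_red_right, o_red_right):
    """Sketch of Equation (9) for two original classes C_i and C_{i+1}.

    jc maps (left_state, right_state) to a join cost, where left_state is
    'raw' or 'reduced' and right_state is 'raw', 'reduced' or 'reduced_sorted'.
    Returns the label and cost of the cheapest alternative.
    """
    options = {
        "reduce during join": jc[("raw", "raw")],
        "reduce left first": n_red_left + jc[("reduced", "raw")],
        "reduce right first, sorted": o_red_right + jc[("raw", "reduced_sorted")],
        "reduce right first, unsorted": n_red_right + jc[("raw", "reduced")],
        "reduce both, right sorted":
            n_red_left + o_red_right + jc[("reduced", "reduced_sorted")],
        "reduce both, right unsorted":
            n_red_left + n_red_right + jc[("reduced", "reduced")],
    }
    best = min(options, key=options.get)
    return best, options[best]


# Hypothetical costs: a sorted second argument makes the join much cheaper.
jc = {
    ("raw", "raw"): 100,
    ("reduced", "raw"): 70,
    ("raw", "reduced_sorted"): 40,
    ("raw", "reduced"): 60,
    ("reduced", "reduced_sorted"): 25,
    ("reduced", "reduced"): 45,
}
plan, plan_cost = rjc_original_pair(jc, n_red_left=10, n_red_right=8, o_red_right=20)
print(plan, plan_cost)
```

With these numbers the ordered reduction of the second class costs more than the unordered one, yet the cheaper join more than pays for it, which is the trade-off discussed above.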

We now consider the situation in which reductions and a join are to be performed on two classes at different sites. In this case, we perform the reduction on any class before it is transferred to a different site for the join operation. The reason is that when a class is to be transferred, all its objects are usually sent from secondary memory to main memory to be packaged for the transmission, and it is relatively cheap to perform the reduction while they are in main memory. Furthermore, the reduction lowers the communication cost and the processing cost at the destination site.


3.3 Arbitrary target class and arbitrary answer site

In this subsection we show how to extend the above approach to allow an arbitrary target class at an arbitrary answer site. We first review, in a centralised environment, how to obtain the qualified IDs for the target class from the IDs in $C_{1,n}$.

Usually, a user is interested in retrieving the IDs of a given class. The class whose objects are required by the user is called the target class, denoted by $C_t$. In query $Q_1$ (see Figure 1), the target class Auto is the leftmost class in the chain. But in general the target class can be any class in the chain. The following query $Q_2$ is used to find all persons who own at least one automobile and live in cities within states with populations over 10000. The target class in this query, Person, is not the leftmost class in the chain.

select Person
where owner.hometown.state.population > 10000

In the earlier discussion, $C_{1,n}$ is computed without designating the target class. Assume $C_{1,n}$ is obtained from $C_{1,m} \bowtie C_{m+1,n}$ for some $m$, $1 \le m < n$. After $C_{1,n}[ID_{m+1}]$ is obtained, we can obtain the IDs for the target class $C_t$ by propagating the result from $C_{1,n}[ID_{m+1}]$ to $C_{t-1}[ID_{t-1}, ID_t]$, since the second attribute of $C_{t-1}$ is $ID_t$, i.e. the desired result. Propagation is carried out by a series of joins. For example, when $t - 1 \ge m + 1$, the propagation from $C_{1,n}[ID_{m+1}]$, denoted by $C^f_{m+1}[ID_{m+1}]$, to $C_{t-1}[ID_{t-1}, ID_t]$ consists of joining $C^f_{m+1}[ID_{m+1}]$ with $C_{m+1}[ID_{m+1}, ID_{m+2}]$ to yield an intermediate result on attribute $ID_{m+2}$. This is then joined with $C_{m+2}[ID_{m+2}, ID_{m+3}]$ to yield an intermediate result on $ID_{m+3}$. This process is repeated until a join with $C_{t-1}[ID_{t-1}, ID_t]$ is performed to yield the answer. The correctness of such propagations is discussed at the end of this subsection.
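The series of joins described above can be sketched as repeated join-and-project steps over sets of ID pairs. The function name and the small database state below are hypothetical and serve only to illustrate the mechanics; each class is represented by its $[ID_p, ID_{p+1}]$ pairs.

```python
def propagate(ids, chain):
    """Propagate a set of qualified IDs along a chain of reference pairs.

    Each element of `chain` is the set of (ID_p, ID_{p+1}) pairs of one class.
    Joining on ID_p and projecting on ID_{p+1} yields the next ID set.
    """
    current = set(ids)
    for pairs in chain:
        current = {right for left, right in pairs if left in current}
    return current


# Hypothetical state: Auto[ID_auto, ID_person] and Person[ID_person, ID_city].
auto_person = {("a1", "p1"), ("a2", "p1"), ("a3", "p2")}
person_city = {("p1", "c1"), ("p2", "c2")}

cities = propagate({"a1"}, [auto_person, person_city])
print(cities)
```

Each pass of the loop corresponds to one join in the propagation, so a target class that is $r$ positions away from the starting class requires $r$ joins.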

Let the cost of the propagation from $C_{1,n}[ID_{m+1}]$ to $C_{t-1}[ID_{t-1}, ID_t]$ be $P(C^f_{m+1}, C_{t-1})$. An optimal algorithm that computes the answer while taking into consideration the propagation cost is as follows. Compute $Cost_{i,j}$ and $Cost^s_{i,j}$ as in Section 2 for all $j - i \le n - 2$. Then at the last stage $Cost_{1,n}$ is redefined as follows. (We can see that if $t = m + 1$, the answer is $C_{1,n}[ID_{m+1}]$ and $P(C^f_{m+1}, C_{t-1}) = 0$, that is, no propagation is needed in this case.)*

$$ Cost_{1,n} = \min_{1 \le m < n} \big\{ Cost_{1,m} + Cost_{m+1,n} + jc(C_{1,m}, C_{m+1,n}) + P(C^f_{m+1}, C_{t-1}),\;\; Cost_{1,m} + Cost^s_{m+1,n} + jc(C_{1,m}, C^s_{m+1,n}) + P(C^f_{m+1}, C_{t-1}) \big\}. \tag{12} $$

Now we proceed to discuss, in a distributed environment, how an arbitrary target class at an arbitrary answer site can be incorporated into the above formulation. Clearly, we have to include the cost of propagating the result to the target class at the answer site.† The computation of $C_{1,n}[ID_{m+1}]$ from $C_{1,m}$ and $C_{m+1,n}$ results in $C_{m+1,n}$ fully reduced, i.e. $C^f_{m+1}$ is obtained. In order to compute the answer, $C^f_{m+1}$ is propagated to the target class $C_t$ by computing $C^f_{m+1} \bowtie C_{m+1} \bowtie \cdots \bowtie C_{t-1}[ID_t]$, assuming $t - 1 \ge m + 1$ without loss of generality.

* We note that an alternative approach is to associate the UIDs of the target class $C_t$ with all the intermediate results $C_{i,j}$, $i \le t \le j$, that is, the scheme for all such $C_{i,j}$, $i \le t \le j$, consists of $[ID_i, ID_{j+1}, ID_t]$; all others remain unchanged. In this case, we do not need propagation, i.e. as soon as $C_{1,n}$ is obtained, the result can be obtained directly. With slight modification to the basic cost-estimation formulas by using new original and intermediate class sizes, our dynamic programming approach is still applicable.

† And again, it is also possible to avoid this propagation.

Let $G^k_{j-1}$ be the minimum total cost to perform $C^f_{j-1} \bowtie C_{j-1} \bowtie C_j \bowtie \cdots \bowtie C_{t-1}[ID_t]$ with a copy of $C^f_{j-1}$ at site $k$ and with the answer at destination site $w$. After $C^f_{j-1} \bowtie C_{j-1}[ID_j]$, a set of object IDs for $C_j$, i.e. a set of $ID_j$s, denoted by $C^f_j$, can be obtained. We seek the minimum propagation cost $G^q_{m+1}$, where $q$ is the site at which $C_{1,n}$ is obtained from $C_{1,m} \bowtie C_{m+1,n}$, i.e. $G^q_{m+1}$ is the minimum cost to compute $C^f_{m+1} \bowtie C_{m+1} \bowtie C_{m+2} \bowtie \cdots \bowtie C_{t-1}[ID_t]$ and yield the answer $C^f_t[ID_t]$ at answer site $w$, where $C^f_{m+1}$ is obtained at site $q$. The way we compute all such $G^x_l$, $m + 1 \le l \le t - 1$ and $x \in$ SITES, is to compute all $G^x_{t-1}$ for all $x \in$ SITES first, then all $G^x_{t-2}$ for all $x \in$ SITES, and so on. $G^k_{j-1}$, for site $k$, $m + 1 \le j - 1 \le t - 1$, can be obtained from $G^x_j$, $x \in$ SITES, as follows.

Suppose that the original class $C_{j-1}$ is at site $u$. In general, in order to compute $G^k_{j-1}$, the following steps are taken.

(1) Transfer $C^f_{j-1}$ from site $k$ to a site $v$, $v \in$ SITES.
(2) Transfer $C_{j-1}$ from site $u$ to the site $v$, $v \in$ SITES.
(3) Join $C^f_{j-1}$ and $C_{j-1}$ at site $v$ to produce $C^f_j$ at site $v$.
(4) Transfer $C^f_j$ from site $v$ to some site $x$, $x \in$ SITES.
(5) At site $x$, $C^f_j$ is obtained. The minimum propagation cost from $C^f_j$ to the target class at the answer site is, by definition, $G^x_j$.

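To ensure that Steps (1)-(4) yield the minimum cost, site $v$ and site $x$ should be varied over all sites in SITES. Thus the following recursive equation is obtained:

$$ G^k_{j-1} = \min_{v, x \in SITES} \Big\{ T(k, |C^f_{j-1}|, v) + T(u, |C_{j-1}|, v) + \frac{1}{P_v}\, jc(C^f_{j-1}, C_{j-1}) + T(v, |C^f_j|, x) + G^x_j \Big\}, \tag{13} $$

where $T(s, |R|, d)$ is the cost for transferring $|R|$ amount of data from site $s$ to site $d$. Clearly, $G^w_t = 0$ where $C_t$ is the target class and $w$ is the answer site.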

We start by computing $G^k_{t-1}$ for all sites $k$, $k \in$ SITES, which involves computing $C^f_{t-1} \bowtie C_{t-1}[ID_t]$ where $C^f_{t-1}$ is at site $k$. Equation (13) can be applied with $G^w_t = 0$, and $x$ is restricted to $w$ only for the first iteration. Although $v$ ranges over all sites in SITES in Equation (13), it is important to observe that it is sufficient, in achieving the minimum, to vary $v$ over the sites $k$, $x$, $u$ and the site $v_0$ that has the fastest speed only. Thus, $G^k_{j-1}$ can be computed in $O(h_1)$. Therefore, $G^k_{j-1}$ for different values of $k$ can be computed in $O(h_1^2)$. We start this computation with $j = t$ and progress until $j = m + 2$. This takes no more than $O(n * h_1^2)$ time. Recall that $h_1$ is bounded by $O(n)$. Thus the computation of all $G^k_{j-1}$ takes $O(n^2 * h_1)$ time.
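The backward computation of the propagation costs can be sketched as follows. The sketch follows Steps (1)-(5) literally and, for simplicity, varies $v$ and $x$ over all sites rather than over the reduced candidate set mentioned above, so it does not achieve the stated time bound. The function name, the size and cost functions and all numbers are hypothetical; only the site speeds and class placement follow Figure 5.

```python
def propagation_costs(m, t, w, sites, speed, home, size_f, size, jc, transfer):
    """Backward pass sketched from Steps (1)-(5).

    G[(j, k)]: minimum cost to propagate from the fully reduced C^f_j held at
    site k to the target class C_t, with the answer delivered to site w.
    """
    inf = float("inf")
    # G^w_t = 0; other sites are excluded, which restricts x to w at first.
    G = {(t, x): (0.0 if x == w else inf) for x in sites}
    for j in range(t, m + 1, -1):                 # j = t, t-1, ..., m+2
        u = home[j - 1]                           # site of original class C_{j-1}
        for k in sites:
            G[(j - 1, k)] = min(
                transfer(k, size_f(j - 1), v)     # step 1: move C^f_{j-1} to v
                + transfer(u, size(j - 1), v)     # step 2: move C_{j-1} to v
                + jc(j - 1) / speed[v]            # step 3: join at site v
                + transfer(v, size_f(j), x)       # step 4: move C^f_j to x
                + G[(j, x)]                       # step 5: remaining propagation
                for v in sites
                for x in sites
            )
    return G


sites = [1, 2, 3, 4]
speed = {1: 2, 2: 1, 3: 1, 4: 1}                  # from Figure 5
home = {1: 1, 2: 2, 3: 3, 4: 4}                   # class C_i stored at site i

# Hypothetical cost model: reduced ID sets have size 5, original classes size 10,
# each join costs 20 on a unit-speed CPU, and shipping X units costs X.
size_f = lambda j: 5
size = lambda j: 10
jc = lambda j: 20
transfer = lambda a, x, b: 0 if a == b else x

G = propagation_costs(m=1, t=4, w=2, sites=sites, speed=speed, home=home,
                      size_f=size_f, size=size, jc=jc, transfer=transfer)
print(G[(3, 1)], G[(2, 1)])
```

Setting $G^x_t$ to infinity for $x \ne w$ is one way to express that the answer must end up at the answer site.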

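Having computed $G^y_{m+1}$, $2 \le m + 1 \le n$, $y \in$ SITES, the minimum cost MIN_COST to obtain the answer at site $w$ is

$$ MIN\_COST = \min_{1 \le m < n,\; y \in SITES} \Big\{ Cost_{1,m,y} + Cost_{m+1,n,y} + \frac{1}{P_y}\, jc(C_{1,m}, C_{m+1,n}) + G^y_{m+1},\;\; Cost_{1,m,y} + Cost^s_{m+1,n,y} + \frac{1}{P_y}\, jc(C_{1,m}, C^s_{m+1,n}) + G^y_{m+1} \Big\}. \tag{14} $$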


That is, the final answer is obtained by joining $C_{1,m} \bowtie C_{m+1,n}$ for some $m$ at some site $y$ (this yields $C^f_{m+1}$ at site $y$) plus the minimum propagation cost from site $y$ to the answer site. Therefore the total time complexity is $O(n^3 * h_1) + O(n^2 * h_1)$, which remains $O(n^3 * h_1)$.

Example 3. Using Example 2, suppose the target class is $C_4$ (i.e. $t = 4$), the answer site is $S_2$ (i.e. $w = 2$), and the fastest site is $S_1$. In this case, again the set of sites that need to be effectively searched is SITESM = $\{S_1, S_2, S_3, S_4\}$. We apply Equation (13) to compute all minimum propagation costs $G^y_{m+1}$, $1 \le y \le 4$, $1 \le m \le 2$, from the fully reduced $C^f_2$ or $C^f_3$ at every effectively searched site, namely, we will compute $G^1_2, G^2_2, G^3_2, G^4_2, G^1_3, G^2_3, G^3_3$ and $G^4_3$. By definition, initially $G^2_4 = 0$. We start by computing $G^y_3$, $1 \le y \le 4$, then compute $G^y_2$, $1 \le y \le 4$. For instance, $G^1_3$, that is, given $C^f_3$ at site 1, the minimum cost to compute $C^f_3 \bowtie C_3$ (at some common site $v$) so as to yield the answer $C^f_4$ at site $S_2$, can be obtained by applying Equation (13) as follows:

$$ G^1_3 = \min_{v \in SITESM} \Big\{ T(1, |C^f_3|, v) + T(3, |C_3|, v) + \frac{1}{P_v}\, jc(C^f_3, C_3) + T(v, |C^f_4|, 2) \Big\}. $$

$G^2_3$, $G^3_3$ and $G^4_3$ can be computed similarly. Note that in the first iteration of computing $G^y_3$, $y \in$ SITESM, $x$ in Equation (13) is restricted to the answer site $S_2$ only. Next, we will compute $G^1_2$, again using Equation (13):

$$ G^1_2 = \min_{v, x \in SITESM} \Big\{ T(1, |C^f_2|, v) + T(2, |C_2|, v) + \frac{1}{P_v}\, jc(C^f_2, C_2) + T(v, |C^f_3|, x) + G^x_3 \Big\}. $$

Finally, $G^2_2$, $G^3_2$ and $G^4_2$ can be computed similarly, where $x$ varies over all sites in SITESM. •

Before we end this subsection, we point out that the propagation from an arbitrary result $C_{1,n}[ID_{m+1}]$ to the target class may not guarantee that exactly the qualified IDs of the target class will be obtained. For example, suppose in query $Q_1$ (see Figure 1), there is a selection 'Auto.color = red' on class Auto and the IDs in the result $C_{1,n}$ are IDs of objects in Person. Clearly, if a person, say $p_1$, owns a red car and lives in a city within a state with a population of over 10000, then the ID of $p_1$ is among the IDs in the result $C_{1,n}$. However, when the propagation from the qualified Person-IDs to the Auto-IDs is done, IDs of all cars owned by person $p_1$ will be in the set of Auto-IDs just obtained. Since person $p_1$ may also own cars that are not red, the above propagation may result in a wrong set of Auto-IDs. We now investigate in what situation propagation as defined above can guarantee that exactly the qualified IDs of the target class will be obtained.
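The problem can be reproduced with a tiny, hypothetical database state in which person $p_1$ owns one red and one blue car. The dictionary layout and identifiers below are assumptions made only for this illustration.

```python
# Hypothetical state for query Q1 with the selection Auto.color = red.
autos = {
    "a1": {"color": "red", "owner": "p1"},
    "a2": {"color": "blue", "owner": "p1"},
}
qualified_persons = {"p1"}   # p1 owns a red car and passes the population test

# Propagating Person IDs back to Auto IDs joins on the owner reference only,
# so the selection on Auto is no longer enforced.
propagated = {a for a, rec in autos.items() if rec["owner"] in qualified_persons}

# The correct answer must also satisfy the selection on Auto.
correct = {a for a in propagated if autos[a]["color"] == "red"}

print(sorted(propagated), sorted(correct))
```

The propagated set contains the blue car although it does not satisfy the selection, which is exactly the discrepancy described above.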

Let $C_1, C_2, \ldots, C_n$ be $n$ classes in a chain query, with $C_i$ a domain class of an attribute of class $C_{i-1}$, $i = 2, \ldots, n$. Let LCR be the Leftmost Class having Reduction in the chain and RC be the class such that the IDs in the result $C_{1,n}$ are IDs of objects in RC. Then we have the following proposition.

Proposition 1. (1) If the target class $C_t$ is not on the left of LCR in the chain, we can always obtain the qualified ID set for $C_t$ by propagation if RC is not on the right of $C_t$. (2) If $C_t$ is on the left of LCR in the chain, then we can always obtain the qualified ID set for $C_t$ by propagation if RC is not on the right of LCR.

Proof. Below is a sketch of the proof for part (1) of the proposition. A more rigorous proof can be found in Ref. 30.* Part (2) can be proved similarly.

Sufficiency. By assumption, either $C_t$ and RC are the same class or $C_t$ is on the right of RC in the chain. In both cases, each object in RC corresponds to exactly one object in $C_t$. Clearly, an object $o_1$ in $C_t$ is qualified iff there is an object in RC that is qualified and corresponds to $o_1$. Hence, from the qualified ID set for RC we can obtain the qualified ID set for $C_t$ by propagation.

Necessity. Assume $C_t$ is on the left of RC. We use an example to illustrate why the propagation from RC to $C_t$ may not be able to obtain the qualified ID set for $C_t$. Suppose in query $Q_2$, there is a selection 'Auto.color = red' on class Auto and RC is City. That is, LCR is Auto, RC is City and $C_t$ is Person. Assume, in a database state, city $c_1$ is within a state with a population over 10000 and is the hometown of two persons $p_1$ and $p_2$ such that $p_1$ owns a red car and $p_2$ does not own any red car. Clearly, the ID of $c_1$ will be among the IDs for $C_{1,n}$. This implies that after the propagation, both the IDs of $p_1$ and $p_2$ will be obtained. This is incorrect since $p_2$ is not qualified, as $p_2$ does not own a red car. •

Let $C_R$ be the rightmost class of the two classes LCR and $C_t$ in the chain. That is, if LCR = $C_l$, $R = \max\{l, t\}$. Essentially, Proposition 1 says that we can guarantee that the qualified ID set for the target class will be obtained by propagating the IDs of RC iff RC is not on the right of $C_R$. Thus we need to modify Equations (12) and (14) by requiring that the IDs in the result $C_{1,n}$ are IDs of some class not on the right of $C_R$. That is, $1 \le m < n$ in both Equations (12) and (14) should be replaced by $1 \le m < R$.
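The condition can be expressed as a simple check on positions in the chain. In the sketch below, positions are counted from the left starting at 1, and the function names are hypothetical. The class order Auto, Person, City, State (positions 1 to 4) is assumed from the chain of query $Q_1$.

```python
def propagation_is_safe(t, l, r):
    """True if propagating the IDs of RC = C_r gives exactly the qualified IDs
    of the target class C_t, where LCR = C_l (positions count from the left)."""
    big_r = max(l, t)          # C_R is the rightmost of LCR and C_t
    return r <= big_r          # RC must not lie to the right of C_R


def allowed_split_points(n, t, l):
    """Values of m permitted in Equations (12) and (14): 1 <= m < R."""
    return list(range(1, min(n, max(l, t))))


# Assumed chain: Auto (1), Person (2), City (3), State (4); selection on Auto.
print(propagation_is_safe(t=1, l=1, r=2))   # target Auto, RC Person
print(propagation_is_safe(t=2, l=1, r=3))   # target Person, RC City
print(allowed_split_points(n=4, t=2, l=1))
```

Both calls to `propagation_is_safe` correspond to the two counterexamples discussed in the text, in which RC lies to the right of $C_R$.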

In some object-oriented database systems, it is possible that multiple classes are designated as target classes; objects from more than one class may therefore be retrieved to form new objects. The above algorithm can be extended to cover this situation by modifying Equations (12)-(14) so that the IDs of RC are propagated to the IDs of all target classes. If we regard $C_t$ in Proposition 1 as the leftmost target class in the chain, Proposition 1 is also applicable to the situation where multiple target classes are allowed (a proof of the extended result can be found in Ref. 30).

* In Ref. 30 the proof is for hierarchies instead of for chains in the context of establishing a relational front-end on top of hierarchical database systems. However, the $n$ classes in a chain query form a hierarchy with one branch, since there is a one-to-many relationship between objects in $C_i$ and objects in $C_{i-1}$, $i = 2, \ldots, n$.

4. CONCLUSIONS

In this paper we provide an algorithm to process typical (chain) queries in distributed object-oriented database systems using dynamic programming. Our algorithm is capable of handling a wide range of issues, including local reductions, arbitrary target class, the sorted state of classes and intermediate results, duplicated copies of classes at different sites, arbitrary class at arbitrary site, different sites with different processing speeds, different communication channels with different transmission rates and class hierarchies. This demonstrates the usefulness and the flexibility of the algorithm in different situations. The uniformity of this algorithm under so many diversified situations strongly suggests the feasibility of yielding a practical query optimisation system and the potential generalisation of the algorithm to more general queries and to more complicated situations.

Unique identification of objects plays a very important role in differentiating a relational database from an object-oriented database.39 This characteristic is fully exploited by our approach, since the chaining from one class to another comes about naturally via the use of UIDs. With some minor modifications, our approach is also applicable to relational database systems.

We shall try to extend our algorithm to process non-chain queries such as tree queries in both centralised and distributed object-oriented database systems.

Acknowledgement

We are very grateful to Dr Won Kim of MCC for his comments on an earlier draft of this paper. We should also like to thank the anonymous referees for their constructive comments.

REFERENCES

1. P. Agrawal, D. Bitton, K. Gun, C. Liu and C. Yu, A case study for distributed query processing. Proceedings, International Symposium on Databases in Parallel and Distributed Systems, December 1988.

2. H. Afsarmanesh, D. Knapp, D. McLeod and A. Parker, An object-oriented approach to VLSI/CAD. Proceedings, International Conference on Very Large Data Bases, August 1985, Stockholm, Sweden.

3. H. Afsarmanesh and D. Knapp, An extensible object-oriented approach to data bases for VLSI/CAD. Proceedings, International Conference on Very Large Data Bases, 1986, Morgan-Kaufmann.

4. M. Ahlsen, A. Bjornerstedt, S. Britts, C. Hulten and L. Soderlund, An architecture for object management in OIS. ACM Transactions on Office Information Systems, 2 (3), 173-196 (1984).

5. T. M. Atwood, An object-oriented DBMS for design support applications. Proceedings, IEEE COMPINT 85, Montreal, Canada, pp. 299-307.

6. F. Bancilhon et al., The design and implementation of O2, an object-oriented database system. Proceedings of Second International Workshop on Object-Oriented Database Systems, Bad Munster, FRG, September 1988.

7. J. Banerjee, W. Kim and K. C. Kim, Queries in object-oriented databases. Proceedings, IEEE 4th International Conference on Data Engineering, Los Angeles, Feb. 1988, pp. 31-38.

8. D. Beech and J. Feldman, The integrated data model: a database perspective. Proceedings of the 9th International Conference on Very Large Data Bases, Florence, Italy, October 1983.

9. G. Booch, Object-oriented development. IEEE Trans. on Software Engineering, SE-12 (2) (1986).

10. R. Bretl et al., The GemStone data management system. In Object-oriented Concepts, Applications and Databases, edited by W. Kim and F. Lochovsky. Addison-Wesley, Reading, MA (1989).

11. M. J. Carey, D. J. DeWitt and S. L. Vandenberg, A data model and query language of Exodus. Proceedings, ACM-SIGMOD 88, June 1988, pp. 413-423.

12. D. M. Chiu, P. Bernstein and T. C. Ho, Optimizing chain query in a distributed database system. Journal of Computing, 13 (1), 116-134 (1984).

13. B. J. Cox, Message/Object Programming: An Evolutionary Change in Programming. IEEE Software (1986).

14. D. Daniels, P. Selinger et al., An Introduction to Distributed Query Compilation in R*. IBM Research Report RJ 3497 (41354), IBM Research Laboratories, San Jose, CA (1982).

15. O. Deux et al., The story of O2. IEEE Transactions on Knowledge and Data Engineering 2 (1), 91-108 (1990).

16. A. Goldberg, Introducing the Smalltalk-80 system. Byte 6 (8), 14-26 (1981).

17. A. Goldberg and D. Robson, Smalltalk-80: the Language and its Implementation. Addison-Wesley, Reading, MA (1983).

18. L. Haas, J. Freytag, G. Lohman and H. Pirahesh, Extensible query processing in Starburst. Proceedings, ACM SIGMOD 89, Portland, Oregon, 1989, pp. 377-388.

19. M. Hardwick and G. Sinha, A data management system for graphical objects. Proceedings of Conference on Very Large Data Bases, 1986.

20. B. Jenq, D. Woelk, W. Kim and W. Lee, Query processing in distributed Orion. Proceedings of Extended Database Technology (1990).

21. W. Kim and F. Lochovsky (eds), Object-Oriented Concepts, Applications and Databases. Addison-Wesley, Reading, MA (1989).

22. W. Kim, K. C. Kim and A. Dale, Indexing Techniques for Object-oriented Databases. MCC Technical Report DB 134-87 (1987).

23. W. Kim, H.-T. Chou and J. Banerjee, Operations and implementation of complex objects. IEEE Transactions on Software Engineering 14 (7), 985-996 (1988).

24. W. Kim, J. Garza, N. Ballou and D. Woelk, Architecture of the Orion next-generation database system. IEEE Transactions on Knowledge and Data Engineering 2 (2), 109-124 (1990).

25. W. Kim, Object-oriented databases: definition and research directions. IEEE Transactions on Knowledge and Data Engineering 2 (3), 327-341 (1990).

26. S. Lafortune and E. Wong, A state transition model for distributed query processing. ACM Transactions on Database Systems 11 (3), 294-322 (1986).

27. P. Lyngbaek and D. McLeod, Object management in distributed information systems. ACM Transactions on Office Information Systems (1984).

28. P. Lyngbaek and V. Vianu, Mapping a semantic database model to the relational model. Proceedings ACM-SIGMOD, San Francisco, May 1987, pp. 132-142.

29. D. Maier et al., Development of an object-oriented DBMS. Proceedings of the 1st International Conference on Object-oriented Programming Systems, Languages, and Applications, Portland, OR, October 1986.

30. W. Meng and C. Yu, Processing hierarchical queries in heterogeneous environments. (Submitted for publication.)

31. O. M. Nierstrasz and D. C. Tsichritzis, An object-oriented environment for OIS applications. Proceedings of Conference on Very Large Data Bases, 1986.

32. P. G. Selinger et al., Access path selection in a relational database management system. Proceedings, ACM-SIGMOD 79, pp. 23-34.

33. P. G. Selinger and M. Adiba, Access Path Selection in Distributed Database Management Systems. IBM Research Report RJ 2883 (36439), IBM Research Laboratory, San Jose, CA (1980).

34. D. L. Spooner, Modeling mechanical CAD data with abstraction and object-oriented techniques. Proceedings of Conference on Very Large Data Bases, 1986.

35. M. Stefik and D. G. Bobrow, Object-oriented programming: themes and variations. The AI Magazine, pp. 40-62 (1986).

36. M. Stonebraker and L. Rowe, The design of Postgres. Proceedings, ACM-SIGMOD 1986, Washington, DC, pp. 340-355.

37. M. Stonebraker, L. Rowe and M. Hirohama, The implementation of Postgres. IEEE Transactions on Knowledge and Data Engineering 2 (1), 125-142 (1990).

38. W. Sun, W. Meng and C. Yu, Query optimization in object-oriented database systems. Proceedings of the International Conference on Database and Expert Systems Applications, Vienna, Austria, August 29-31, 1990, pp. 215-222.

39. J. Ullman, Database and Knowledge-based Systems. Computer Science Press, Rockville, MD (1988).

40. D. Woelk, W. Kim and W. Luther, An object-oriented approach to multimedia databases. Proceedings of ACM-SIGMOD, Washington DC, May 1986.

41. D. Woelk and W. Kim, Multimedia information management in an object-oriented database system. Proceedings of Conference on Very Large Data Bases, Brighton, England, September 1987.

42. W. Wolf, An object-oriented procedural database for VLSI chip planning. Proceedings of Design Automation Conference, 1986.

43. C. Yu, K. Lam, C. Chang and S. Chang, A promising approach to distributed query processing. Berkeley Workshop on Database and Computer Network, 1982, pp. 152-170.

44. C. Yu and C. Chang, Distributed query processing. Computing Surveys 16 (4) (1984).

45. C. Yu, M. Ozsoyoglu and K. Lam, Distributed query optimization for tree queries. Journal of Computer and System Science, pp. 409-445 (1984).

46. C. Zaniolo, The database language GEM. Proceedings ACM-SIGMOD, San Jose, California, May 1983, pp. 207-217.
