T.E. Sem. V [CMPN] Advanced Database Management...


1113/Engg/TE/Pre Pap/2013/CMPN/Soln/ADBMS 1

Vidyalankar T.E. Sem. V [CMPN]

Advanced Database Management Systems Prelim Question Paper Solution

EER Diagram

Painting (Paint_type, Drawn_on, Style)
Sculpture (Material, Height, Weight, Style)
Other (Type, Style)
art_object (id, artist, year, title, description)
artist_info (name, date_born, date_died, description, main_style)
exhibition (name, start_dt, end_dt)
display (id, name)

(i) SELECT art_object.id, art_object.title
    FROM art_object, display, exhibition
    WHERE art_object.id = display.id
      AND display.name = exhibition.name

(ii) SELECT title, description
     FROM art_object
     WHERE paint_type = 'oil'
       AND description = 'PERMANENT'

1. (a)

[Fig.: EER diagram for the museum database. MUSEUM has ART_OBJECT (id, Artist, year, title, Description). ART_OBJECT has an ISA specialization into PAINTING (Style, Drawn on, Paint type), SCULPTURE/STATUE (Style, material, height, weight) and OTHER (Style, Type), and a second ISA specialization into PERMANENT COLLECTION (Owned) and BORROWED. ARTIST_INFO (Name, Date Born, Date Died, main style, Description) is related to ART_OBJECT; EXHIBITION (Name, Start dt) is related to DISPLAY (id, Name).]

1. (b)

1. (c)


Specialization and Generalization i) Specialization is the process of defining a set of subclasses of an entity type; this entity type is

called the superclass of the specialization. The set of subclasses that form a specialization is defined on the basis of some distinguishing characteristic of the entities in the superclass.

ii) For example, the set of subclasses {SECRETARY, ENGINEER, TECHNICIAN} is a

specialization of the superclass EMPLOYEE that distinguishes among EMPLOYEE entities based on the job type of each entity. We may have several specializations of the same entity type based on different distinguishing characteristics. For example, another specialization of the EMPLOYEE entity type may yield the set of subclasses {SALARIED_EMPLOYEE, HOURLY_EMPLOYEE}; this specialization distinguishes among employees based on the method of pay.

iii) The subclasses that define a specialization are attached by lines to a circle, which is connected to the superclass. The subset symbol on each line connecting a subclass to the circle indicates the direction of the superclass/subclass relationship. Attributes that apply only to entities of a particular subclass, such as TypingSpeed of SECRETARY, are attached to the rectangle representing that subclass. These are called specific attributes (or local attributes) of the subclass. Similarly, a subclass can participate in specific relationship types, such as the HOURLY_EMPLOYEE subclass participating in the BELONGS_TO relationship in the figure. We will explain the d symbol in the circles of the figure and additional EER diagram notation shortly.

Fig. 1: EER diagram notation for representing specialization and subclasses.

Constraints & Characteristics of Specialization and Generalization
i) In some specializations we can determine exactly the entities that will become members of each subclass by placing a condition on the value of some attribute of the superclass. Such subclasses are called predicate-defined (or condition-defined) subclasses.
ii) If all subclasses in a specialization have the membership condition on the same attribute of the superclass, the specialization itself is called an attribute-defined specialization, and the attribute is called the defining attribute of the specialization.

2. (a)

iii) Two other constraints may apply to a specialization. The first is the disjointness constraint, which specifies that the subclasses of the specialization must be disjoint. This means that an entity can be a member of at most one of the subclasses of the specialization. A specialization that is attribute-defined implies the disjointness constraint if the attribute used to define the membership predicate is single-valued. Fig. 2 illustrates this case, where the d in the circle stands for disjoint.

Fig. 2: An attribute-defined specialization on the JobType attribute of EMPLOYEE.

iv) The second constraint on specialization is called the completeness constraint, which may be total or partial. A total specialization constraint specifies that every entity in the superclass must be a member of some subclass in the specialization. For example, if every EMPLOYEE must be either an HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization {HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} of figure 1 is a total specialization of EMPLOYEE; this is shown in EER diagrams by using a double line to connect the superclass to the circle. A single line is used to display a partial specialization, which allows an entity not to belong to any of the subclasses. For example, if some EMPLOYEE entities do not belong to any of the subclasses {SECRETARY, ENGINEER, TECHNICIAN}, then that specialization is partial. Notice that the disjointness and completeness constraints are independent. Hence, we have the following four possible constraints on specialization:
 Disjoint, total
 Disjoint, partial
 Overlapping, total
 Overlapping, partial
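The four combinations can be sketched as a small check over sample memberships; the helper and the data are hypothetical, purely to illustrate the definitions.

```python
# Hypothetical helper: memberships maps each superclass entity to the
# set of subclasses it belongs to.
def classify(memberships):
    disjoint = all(len(subs) <= 1 for subs in memberships.values())
    total = all(len(subs) >= 1 for subs in memberships.values())
    return ("disjoint" if disjoint else "overlapping",
            "total" if total else "partial")

# One EMPLOYEE in exactly one subclass, one in none: disjoint, partial.
print(classify({"e1": {"SECRETARY"}, "e2": set()}))
# An entity in two subclasses, every entity in at least one: overlapping, total.
print(classify({"e1": {"ENGINEER", "SALARIED_EMPLOYEE"}}))
```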

The Database Application System Life Cycle
Activities related to the database application system (micro) life cycle include the following phases:
i) System definition: The scope of the database system, its users, and its applications are defined. The interfaces for various categories of users, the response time constraints, and storage and processing needs are identified.

ii) Database design : At the end of this phase, a complete logical and physical design of the database system on the chosen DBMS is ready.

iii) Database implementation : This comprises the process of specifying the conceptual, external, and internal database definitions, creating empty database files, and implementing the software applications.

iv) Loading or data conversion : The database is populated either by loading the data directly or by converting existing files into the database system format.

2. (b)

v) Application conversion: Any software applications from a previous system are converted to the new system.

vi) Testing and validation: The new system is tested and validated.
vii) Operation: The database system and its applications are put into operation. Usually, the old and the new systems are operated in parallel for some time.
viii) Monitoring and maintenance: During the operational phase, the system is constantly monitored and maintained. Growth and expansion can occur in both data content and software applications. Major modifications and reorganizations may be needed from time to time.

3. (a)

XML stands for Extensible Markup Language. The main drawback of HTML is that it can specify only the format of the data being sent; it cannot convey the meaning of the data. So, if the incoming data must be processed by some application at the receiving end, the meaning of the data has to be sent along with the data. This can be done using XML, which can specify the meaning of the data along with the data itself.

Main features of XML
 It does not have a predefined set of tags.
 A document prepared in XML is said to be a semi-structured document.
 It can provide the meaning of the data.

Suppose we have the database of a banking application with three tables: account(acno, balance), customer(custname, address) and depositor(acno, custname). An XML document can be prepared from this database as follows:

<bank>
  <account>
    <acno>a101</acno>
    <balance>5000</balance>
  </account>
  <account>
    <acno>a102</acno>
    <balance>3000</balance>
  </account>
  <customer>
    <custname>c1</custname>
    <address>….</address>
  </customer>
  <customer>
    <custname>c2</custname>
    <address>….</address>
  </customer>
  <depositor>
    <acno>a101</acno>
    <custname>c1</custname>
  </depositor>
  <depositor>
    <acno>a102</acno>
    <custname>c2</custname>
  </depositor>
</bank>

The same document can be prepared using a different structure:

<bank>
  <account>

    <acno>a101</acno>
    <balance>5000</balance>
    <customer>
      <custname>c1</custname>
      <address>….</address>
    </customer>
  </account>
  <account>
    <acno>a102</acno>
    <balance>3000</balance>
    <customer>
      <custname>c2</custname>
      <address>….</address>
    </customer>
  </account>
</bank>

From this example it can be noted that the same data, along with its meaning, can be represented using different XML document structures.

Attributes
Just like HTML, we can specify attributes in tag definitions. For example, the account type could be specified as an attribute of the account tag:

<account actype="saving">
  <acno>a101</acno>
  <balance>5000</balance>
</account>

Or we could include actype as a subelement of the account tag:

<account>
  <acno>a101</acno>
  <balance>5000</balance>
  <actype>savings</actype>
</account>

That is, the main aim is to provide the meaning of the data; whether it is specified as a subelement or as an attribute is not important.

Namespaces
To identify the elements of an XML document uniquely, an XML namespace is used:

<bank xmlns:fb="www.firstbank.com">
  <fb:account>
    <fb:acno>a101</fb:acno>
    <fb:balance>5000</fb:balance>
  </fb:account>
</bank>

XML Schema
A schema is a set of rules or constraints to be followed by data stored in a repository; for example, data in an RDBMS satisfies a relational schema. The freedom to specify data using any internal structure is actually undesirable when we want to process the data automatically with some receiving application. So we require that the sender follow a set of rules, called an XML schema, while preparing the XML document.
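As a sketch (not part of the solution), the bank document above can be parsed with Python's xml.etree, showing that the tags, not the layout, carry the meaning; the trimmed document below is an assumption for the example.

```python
import xml.etree.ElementTree as ET

# Sketch only: a trimmed version of the bank document above.
doc = """
<bank>
  <account><acno>a101</acno><balance>5000</balance></account>
  <account><acno>a102</acno><balance>3000</balance></account>
  <customer><custname>c1</custname></customer>
  <depositor><acno>a101</acno><custname>c1</custname></depositor>
</bank>
"""
bank = ET.fromstring(doc)

# Path-style access, comparable to the XPath queries discussed next.
names = [c.text for c in bank.findall("./customer/custname")]
balances = {a.findtext("acno"): int(a.findtext("balance"))
            for a in bank.findall("./account")}
print(names)     # ['c1']
print(balances)  # {'a101': 5000, 'a102': 3000}
```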


XML Querying

XPath: XPath is a language of path expressions; a path expression is a sequence of location steps separated by a forward slash (/). For example, the query

  /bank/customer/custname

returns all the custname subelements present under each customer element that is present under the root element bank:

  <custname>c1</custname>
  <custname>c2</custname>
  ……

If we want only the value of the element, without the tags, we can use the text() function: the query /bank/customer/custname/text() returns only the customer names, without the <custname> tags.

We can access attributes as well: the query /bank/account/@acno returns the acno attribute values of all the account elements.

We may also specify conditions: the query /bank/account[balance > 4000] returns all account elements that have a balance subelement with value greater than 4000. Similarly, the query /bank/account[balance > 4000]/@acno returns the account numbers of all the accounts where the balance is greater than 4000.

The function id("value") returns the elements whose ID attribute has the value supplied to the function; for example, the query /bank/account/id(@owners) returns all the customer elements referred to from the account elements.

XQuery: XQuery has a structure similar to SQL:

  for … let … where … return …

For example, to retrieve all account numbers where the balance is greater than 4000:

  for $x in /bank/account
  let $y := $x/@acno
  where $x/balance > 4000
  return <acno> {$y} </acno>

We may also perform a join in XQuery; for example, to merge the information of accounts with customers into a single element:

  for $a in /bank/account,
      $c in /bank/customer,
      $d in /bank/depositor
  where $a/acno = $d/acno and $c/custname = $d/custname
  return <acctcust> {$a} {$c} </acctcust>

Cost-Based Query Optimization
A cost-based optimizer generates a range of query-evaluation plans from a given query by using the equivalence rules, and chooses the one with the least cost. For a complex query, the number of different query plans that are equivalent to a given plan can be large. As an illustration, consider the expression:

r1 ⋈ r2 ⋈ ….⋈ rn

3. (b)

4. (a)

where the joins are expressed without any ordering. With n = 3, there are 12 different join orderings as follows:

r1 ⋈ (r2 ⋈ r3) r1 ⋈ (r3 ⋈ r2) (r2 ⋈ r3) ⋈ r1 (r3 ⋈ r2) ⋈ r1

r2 ⋈ (r1 ⋈ r3) r2 ⋈ (r3 ⋈ r1) (r1 ⋈ r3) ⋈ r2 (r3 ⋈ r1) ⋈ r2

r3 ⋈ (r1 ⋈ r2) r3 ⋈ (r2 ⋈ r1) (r1 ⋈ r2) ⋈ r3 (r2 ⋈ r1) ⋈ r3
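The enumeration above generalizes to (2(n − 1))!/(n − 1)! orderings for n relations; a quick numerical check (illustrative only):

```python
from math import factorial

# Count of join orderings for n relations: (2(n-1))! / (n-1)!
def join_orders(n):
    return factorial(2 * (n - 1)) // factorial(n - 1)

for n in (3, 5, 7, 10):
    print(n, join_orders(n))
# n=3 -> 12, n=5 -> 1680, n=7 -> 665280, n=10 -> 17643225600 (> 17.6 billion)
```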

In general, with n relations, there are (2(n − 1))! / (n − 1)! different join orders. For joins involving small numbers of relations, this number is acceptable; for example, with n = 5 the number is 1680. However, as n increases, this number rises quickly: with n = 7 the number is 665,280, and with n = 10 it is greater than 17.6 billion. Luckily, it is not necessary to generate all the expressions equivalent to a given expression. For example, suppose we want to find the best join order of the form:

(r1 ⋈ r2 ⋈ r3) ⋈ r4 ⋈ r5

which represents all join orders where r1, r2 and r3 are joined first (in some order), and the result is joined (in some order) with r4 and r5. There are 12 different join orders for computing r1 ⋈ r2 ⋈ r3, and 12 orders for computing the join of this result with r4 and r5. Thus, there appear to be 144 join orders to examine. However, once we have found the best join order for the subset of relations {r1, r2, r3}, we can use that order for further joins with r4 and r5, and can ignore all costlier join orders of r1 ⋈ r2 ⋈ r3. Thus, instead of 144 choices to examine, we need to examine

only 12 + 12 choices.

Selection Size Estimates
The size estimate of the result of a selection operation depends on the selection predicate. We consider a single equality predicate, then a single comparison predicate, and finally combinations of predicates.
(1) σ_(A = a)(r):
 (a) If we assume that each value appears with equal probability, the selection result can be estimated to have n_r / V(A, r) tuples, where n_r is the number of tuples in r and V(A, r) is the number of distinct values of A in r.
 (b) It is not realistic to assume that each value appears with equal probability.
 (c) For example, consider the "branch_name" attribute in the account relation: certain "branch_name" values appear with greater probability than others.
(2) σ_(A ≤ v)(r): consider a selection of the form σ_(A ≤ v)(r).
 (a) If the actual value used in the comparison (v) is available at the time of cost estimation, a more accurate estimate can be made.
 (b) The lowest and highest values, min(A, r) and max(A, r), for the attribute can be stored in the catalog.
 (c) Assuming that the values are uniformly distributed, we can estimate the number of records that will satisfy the condition A ≤ v as 0 if v < min(A, r), as n_r if v ≥ max(A, r), and otherwise as

   n_r · (v − min(A, r)) / (max(A, r) − min(A, r))

 (d) When a query is part of a stored procedure, the value v may not be available at the time of estimation. In such cases we assume that approximately one half of the records will satisfy the comparison condition.
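These estimates, together with the conjunction and disjunction combinations treated next under Complex selections, can be sketched as follows (function names and sample figures are illustrative):

```python
# n_r = tuples in r; v_a = V(A, r) = number of distinct values of A in r.

def eq_estimate(n_r, v_a):
    """sigma_{A = a}(r), assuming each value is equally likely."""
    return n_r / v_a

def range_estimate(n_r, lo, hi, v):
    """sigma_{A <= v}(r), assuming values uniformly distributed on [lo, hi]."""
    if v < lo:
        return 0
    if v >= hi:
        return n_r
    return n_r * (v - lo) / (hi - lo)

def conj_estimate(n_r, sizes):
    """Conjunction: n_r * (s1/n_r) * (s2/n_r) * ... * (sn/n_r)."""
    est = n_r
    for s in sizes:
        est *= s / n_r
    return est

def disj_estimate(n_r, sizes):
    """Disjunction: n_r * (1 - (1 - s1/n_r) * ... * (1 - sn/n_r))."""
    p_none = 1.0
    for s in sizes:
        p_none *= 1 - s / n_r
    return n_r * (1 - p_none)

print(eq_estimate(10000, 50))              # 200.0
print(range_estimate(10000, 0, 100, 25))   # 2500.0
print(conj_estimate(10000, [2000, 5000]))  # 1000.0
print(disj_estimate(10000, [2000, 5000]))  # 6000.0
```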

(3) Complex selections
 (a) Conjunction
  (i) A conjunctive selection is a selection of the form σ_(θ1 ∧ θ2 ∧ … ∧ θn)(r).
  (ii) For each θi we estimate the size of the selection σ_θi(r), denoted by si. Thus, the probability that a tuple in the relation satisfies the selection condition θi is si / n_r.


Thus, we estimate the number of tuples in the full selection as

  n_r · (s1 · s2 · … · sn) / (n_r)^n

 (b) Disjunction
  (i) A disjunctive selection is a selection of the form σ_(θ1 ∨ θ2 ∨ … ∨ θn)(r).
  (ii) A disjunctive condition is satisfied by the union of the records satisfying the individual simple conditions θi.
  (iii) The probability that a tuple satisfies the disjunction is

   1 − (1 − s1/n_r) · (1 − s2/n_r) · … · (1 − sn/n_r)

Multiplying this value by n_r gives us the estimated number of tuples that satisfy the selection.

Join Size Estimates
(1) The Cartesian product r × s contains n_r · n_s tuples. Each tuple of r × s occupies l_r + l_s bytes (the sizes of a tuple of r and of s), from which we can calculate the size of the Cartesian product.
(2) Estimating the size of a natural join is more complicated. Let r(R) and s(S) be relations.

 (a) If R ∩ S = ∅, that is, the relations have no attribute in common, then r ⋈ s is the same as r × s.
 (b) If R ∩ S is a key for R, then we know that a tuple of s will join with at most one tuple from r. Therefore, the number of tuples in r ⋈ s is no greater than the number of tuples in s.
 (c) The most difficult case is when R ∩ S is a key of neither R nor S. In this case we assume, as we did for selections, that each value appears with equal probability.

(3) We can estimate the size of a theta join r ⋈_θ s by rewriting the join as σ_θ(r × s) and using the size estimates for Cartesian products along with the size estimates for selections.
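For case (c), the estimate takes the lower of n_r · n_s / V(A, r) and n_r · n_s / V(A, s); a sketch, with numbers mirroring the depositor ⋈ customer example that follows (the function name is illustrative):

```python
# Natural-join size estimate when the common attribute A is a key of
# neither relation: the lower of n_r*n_s/V(A,r) and n_r*n_s/V(A,s).
def join_estimate(n_r, n_s, v_a_r, v_a_s):
    return min(n_r * n_s // v_a_r, n_r * n_s // v_a_s)

# Catalog figures from the depositor |x| customer example below:
# n_depositor = 5000, n_customer = 10000,
# V(customer_name, depositor) = 2500, V(customer_name, customer) = 10000.
print(join_estimate(5000, 10000, 2500, 10000))  # 5000
```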

(4) To explain all these ways of estimating join sizes, consider depositor ⋈ customer. Assume the following catalog information about the two relations:
 (i) n_customer = 10000
 (ii) f_customer = 25, which implies b_customer = 10000 / 25 = 400
 (iii) n_depositor = 5000
 (iv) V(customer_name, depositor) = 2500, which implies that, on average, each customer has two accounts.

(5) In this example of depositor ⋈ customer, customer_name in depositor is a foreign key referencing customer; hence the size of the result is exactly n_depositor, which is 5000.

(6) We will now compute the size estimate for depositor ⋈ customer without using information about foreign keys.
(7) Since V(customer_name, depositor) = 2500 and V(customer_name, customer) = 10000, the two estimates we get are 5000 · 10000 / 2500 = 20,000 and 5000 · 10000 / 10000 = 5000, and we choose the lower one.

We first describe how the two-phase commit protocol (2PC) operates during normal operation, then how it handles failures, and finally how it carries out recovery and concurrency control. Consider a transaction T initiated at site Si, where the transaction coordinator is Ci.

The Two-Phase Commit Protocol
When T completes its execution, that is, when all the sites at which T has executed inform Ci that T has completed, Ci starts the 2PC protocol.

4. (b)

Phase 1: Steps
i) Ci adds the record <prepare T> to the log, and forces the log onto stable storage. It then sends a prepare T message to all sites at which T executed.
ii) On receiving such a message, the transaction manager at that site determines whether it is willing to commit its portion of T. If the answer is no, it adds a record <no T> to the log, and then responds by sending an abort T message to Ci. If the answer is yes, it adds a record <ready T> to the log, and forces the log (with all the log records corresponding to T) onto stable storage. The transaction manager then replies with a ready T message to Ci.

Phase 2: Steps
i) When Ci receives responses to the prepare T message from all the sites, or when a prespecified interval of time has elapsed since the prepare T message was sent out, Ci can determine whether transaction T can be committed or aborted. T can be committed if Ci received a ready T message from all the participating sites; otherwise, T must be aborted.
ii) Depending on the verdict, either a record <commit T> or a record <abort T> is added to the log, and the log is forced onto stable storage. At this point, the fate of the transaction has been sealed.
iii) Following this point, the coordinator sends either a commit T or an abort T message to all participating sites. When a site receives that message, it records the message in its log.

A site at which T executed can unconditionally abort T at any time before it sends the message ready T to the coordinator. Once the message is sent, the transaction is said to be in the ready state at that site. The ready T message is, in effect, a promise by the site to follow the coordinator's order to commit T or to abort T. To make such a promise, the needed information must first be stored in stable storage; otherwise, if the site crashes after sending ready T, it may be unable to make good on its promise.
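The two phases can be summarized in a toy simulation; the site names, votes and in-memory "logs" below are made-up illustrations, not part of the protocol specification.

```python
# Toy 2PC: commit only if every participating site votes <ready T>.
def two_phase_commit(votes):
    """votes: dict site -> True if the site replies <ready T>, False for <no T>.
    Returns the coordinator's decision, its log, and each site's log."""
    coord_log = ["<prepare T>"]                 # phase 1: force log, send prepare
    site_logs = {}
    for site, ready in votes.items():
        site_logs[site] = ["<ready T>"] if ready else ["<no T>"]
    decision = "commit" if all(votes.values()) else "abort"
    coord_log.append(f"<{decision} T>")          # phase 2: force decision record
    for site in votes:
        site_logs[site].append(f"<{decision} T>")  # sites record the verdict
    return decision, coord_log, site_logs

print(two_phase_commit({"S1": True, "S2": True})[0])   # commit
print(two_phase_commit({"S1": True, "S2": False})[0])  # abort
```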
Further, locks acquired by the transaction must continue to be held until the transaction completes.

Parallel and Distributed Databases

1. Parallel: A parallel database system seeks to improve performance through parallelization of various operations, such as data loading, index building and query evaluation. Although data may be stored in a distributed fashion in such a system, the distribution is governed solely by performance considerations.
   Distributed: In a distributed database system, data is physically stored across several sites, and each site is typically managed by a DBMS capable of running independently of the other sites. In contrast to parallel databases, the distribution of data is governed by factors such as local ownership and increased availability.
2. Parallel: A parallel DBMS consists of tightly coupled, non-autonomous nodes connected by high-bandwidth links.
   Distributed: A distributed DBMS consists of many geographically distributed, autonomous sites connected by low-bandwidth links.
3. Parallel: Nodes in a parallel DBMS can only work together to handle global transactions.
   Distributed: Sites in a distributed DBMS can work independently to handle local transactions, or work together to handle global transactions.
4. Parallel: A parallel DBMS is for high performance and high availability.
   Distributed: A distributed DBMS is for sharing data, local autonomy and high availability.
5. Parallel: Slower data access.
   Distributed: Faster data access.
6. Parallel: Operating cost increases, as many workstations have to be maintained.
   Distributed: Reduced operating cost: it is much more cost-effective to add workstations to a network than to update a mainframe system.
7. Parallel: More danger of a single point of failure, as a single mainframe can go down.
   Distributed: Less danger of a single point of failure.
8. Parallel: Dependence upon a single data copy.
   Distributed: Processor independence.

5. (a)

The Nested Relational Data Model: An approach that permits the use of nested tables, also known as non-first-normal-form relations. Although no commercial DBMS has chosen to implement this concept in its original form, the nested relational model removes the restriction of first normal form (1NF) from the basic relational model, and thus is also known as the Non-1NF, NFNF (Non-First Normal Form) or NF2 relational model. In the basic relational model, also called the flat relational model, attributes are required to be single-valued and to have atomic domains. The nested relational model allows composite and multivalued attributes, thus leading to complex tuples with a hierarchical structure. This is useful for representing objects that are naturally hierarchically structured.

To define the DEPT schema as a nested structure, we can write the following:

DEPT = (DNO, DNAME, MANAGER, EMPLOYEES, PROJECTS, LOCATIONS)
EMPLOYEES = (ENAME, DEPENDENTS)
PROJECTS = (PNAME, PLOC)
LOCATIONS = (DLOC)
DEPENDENTS = (DNAME, AGE)

First, all attributes of the DEPT relation are defined. Next, any nested attributes of DEPT, namely EMPLOYEES, PROJECTS and LOCATIONS, are themselves defined. Next, any second-level nested attributes, such as DEPENDENTS of EMPLOYEES, are defined, and so on. All attribute names must be distinct in the nested relation definition. Notice that a nested attribute is typically a multivalued composite attribute, thus leading to a "nested relation" within each tuple. For example, the value of the PROJECTS attribute within each DEPT tuple is a relation with two attributes, (PNAME, PLOC). Other nested attributes may be multivalued simple attributes, such as LOCATIONS of DEPT. It is also possible to have a nested attribute that is single-valued and composite, although most nested relational models treat such an attribute as though it were multivalued.

When a nested relational database schema is defined, it consists of a number of external relation schemas; these define the top level of the individual nested relations. In addition, nested attributes are called internal relation schemas, since they define relational structures that are nested inside another relation. In our example, DEPT is the only external relation. All the others, EMPLOYEES, PROJECTS, LOCATIONS and DEPENDENTS, are internal relations. Finally, simple attributes appear at the leaf level and are not nested.

It is important to be aware that the three first-level nested relations in DEPT represent independent information. Hence, EMPLOYEES represents the employees working for the department, and LOCATIONS represents the various department locations. The relationship between EMPLOYEES and PROJECTS is not represented in the schema; this is an M:N relationship, which is difficult to represent in a hierarchical structure.

Extensions to the relational algebra and to the relational calculus, as well as to SQL, have been proposed for nested relations. Here, we illustrate two operations, NEST and UNNEST, that can be used to augment standard relational algebra operations for converting between nested and flat relations. Consider the flat EMP_PROJ relation, and suppose that we project it over the attributes SSN, PNUMBER, HOURS, ENAME as follows:

EMP_PROJ_FLAT ← π_(SSN, ENAME, PNUMBER, HOURS)(EMP_PROJ)

To create a nested version of this relation, where one tuple exists for each employee and the (PNUMBER, HOURS) pairs are nested, we use the NEST operation as follows:

EMP_PROJ_NESTED ← NEST_(PROJS = (PNUMBER, HOURS))(EMP_PROJ_FLAT)

The effect of this operation is to create an internal nested relation PROJS = (PNUMBER, HOURS) within the external relation EMP_PROJ_NESTED. Hence, NEST groups together the tuples with the same value for the attributes that are not specified in the NEST operation; these are the SSN and ENAME attributes in our example. For each such group, which represents one employee in our example, a single nested tuple is created with an internal nested relation PROJS = (PNUMBER, HOURS). Hence, the EMP_PROJ_NESTED relation looks like the EMP_PROJ relation shown in Figure.
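A sketch of NEST and UNNEST over flat (SSN, ENAME, PNUMBER, HOURS) tuples; the sample rows are made up, not taken from the paper's figure.

```python
from itertools import groupby

def nest(flat):
    """NEST: group tuples with equal (SSN, ENAME); collect (PNUMBER, HOURS)
    pairs into an inner relation PROJS."""
    out = []
    for (ssn, ename), rows in groupby(sorted(flat), key=lambda t: t[:2]):
        out.append((ssn, ename, tuple(t[2:] for t in rows)))
    return out

def unnest(nested):
    """UNNEST: inverse of nest; flatten the PROJS inner relation back out."""
    return sorted((ssn, ename, p, h)
                  for ssn, ename, projs in nested
                  for p, h in projs)

flat = [(1, "John", 1, 32), (1, "John", 2, 8), (2, "Mary", 3, 40)]
nested = nest(flat)
print(nested)  # [(1, 'John', ((1, 32), (2, 8))), (2, 'Mary', ((3, 40),))]
assert unnest(nested) == sorted(flat)
```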

5. (b)

[Fig.: EMP_PROJ_NESTED, with columns SSN, ENAME and an inner relation PROJS (PNUMBER, HOURS), shown alongside the flat EMP_PROJ relation with columns SSN, ENAME, PNUMBER, HOURS.]

Notice the similarity between nesting and grouping for aggregate functions. In the former, each group of tuples becomes a single nested tuple; in the latter, each group becomes a single summary tuple after an aggregate function is applied to the group. The UNNEST operation is the inverse of NEST. We can reconvert EMP_PROJ_NESTED to EMP_PROJ_FLAT as follows:

EMP_PROJ_FLAT ← UNNEST_(PROJS = (PNUMBER, HOURS))(EMP_PROJ_NESTED)

Here, the PROJS nested attribute is flattened into its components PNUMBER and HOURS. Nested tuples resemble complex objects, with a strictly hierarchical structure.

Subclasses, Superclasses and Inheritance
i) The first EER model concept is the subclass of an entity type. An entity type is used to represent both

a type of entity and the entity set, or collection of entities of that type, that exists in the database.
e.g. The entity type EMPLOYEE describes the type of each employee entity, and also refers to the current set of EMPLOYEE entities in the COMPANY database. In many cases an entity type has numerous subgroupings, e.g. the entities that are members of the EMPLOYEE entity type may be grouped further into SECRETARY, ENGINEER, MANAGER, TECHNICIAN, SALARIED_EMPLOYEE, HOURLY_EMPLOYEE and so on.

ii) The set of entities in each of the latter groupings is a subset of the entities that belong to the EMPLOYEE entity set, meaning that every entity that is a member of one of these subgroupings is also an employee. Each of these subgroupings is called a subclass of the EMPLOYEE entity type, and the EMPLOYEE entity type is called the superclass for each of these subclasses.

iii) The relationship between a superclass and any one of its subclasses is called a superclass/subclass, or simply class/subclass, relationship.

iv) An entity cannot exist in the database merely by being a member of a subclass; it must also be a member of the superclass. An entity may be included in several subclasses, e.g. a salaried employee who is also an engineer belongs to the two subclasses ENGINEER and SALARIED_EMPLOYEE of the EMPLOYEE entity type.

v) An important concept associated with subclasses is type inheritance. Since an entity in the subclass represents the same real-world entity as in the superclass, it should possess values for its specific attributes as well as values for its attributes as a member of the superclass.

An entity that is a member of a subclass inherits all the attributes of the entity as a member of the superclass.

The entity also inherits all the relationships in which the superclass participates.

(i) Opaque Type: The opaque type has its internal representation hidden, so it is used for encapsulating a type. The user has to provide casting functions to convert an opaque object

6. (a)

6. (b)

between its hidden representation in the server (database) and its visible representation as seen by the client (calling program). The user functions send/receive are needed to convert to/from the server's internal representation from/to the client representation. Similarly, import/export functions are used to convert to/from an external representation for bulk copy from/to the internal representation. Several other functions may be defined for processing opaque types, including assign(), destroy(), and compare().

The specification of an opaque type includes its name, internal length if fixed, maximum

internal length if it is variable length, alignment (which is the byte boundary), as well as whether or not it is hashable (for creating a hash access structure). If we write

CREATE OPAQUE TYPE fixed_opaque_udt
  (INTERNALLENGTH = 8, ALIGNMENT = 4, CANNOTHASH);

CREATE OPAQUE TYPE var_opaque_udt
  (INTERNALLENGTH = variable, MAXLEN = 1024, ALIGNMENT = 8);

then the first statement creates a fixed-length user-defined opaque type, named

fixed_opaque_udt, and the second statement creates a variable length one, named var_opaque_udt. Both are described in an implementation with internal parameters that are not visible to the client.

(ii) Distinct Type: The distinct data type is used to extend an existing type through inheritance.

The newly defined type inherits the functions/routines of its base type, if they are not overridden. For example, the statement

CREATE DISTINCT TYPE hiring_date AS DATE;

creates a new user-defined type, hiring_date, which can be used like any other built-in type.

(iii) Row Type: The row type, which represents a composite attribute, is analogous to a struct

type in the C programming language. It is a composite type that contains one or more fields. Row type is also used to support inheritance by using the keyword UNDER, but the type system supports single inheritance only. By creating tables whose tuples are of a particular row type, it is possible to treat a relation as part of an object-oriented schema and establish inheritance relationships among the relations. In the following row type declarations, employee_t and student_t inherit (or are declared under) person_t:

CREATE ROW TYPE person_t(name VARCHAR(60), social_security NUMERIC(9), birth_date DATE); CREATE ROW TYPE employee_t(salary NUMERIC(10, 2), hired_on hiring_date) UNDER person_t; CREATE ROW TYPE student-t(gpa NUMERIC(4,2), address VARCHAR(200)) UNDER person_t; (iv) Collection Type. Informix Universal Server collections include lists, sets, and multisets

(bags) of built-in types as well as user-defined types. A collection can be the type of either a field in a row type or a column in a table. The elements of a set cannot contain duplicate values and have no specific order. A list may contain duplicate elements, and order is significant. A multiset may contain duplicates and has no specific order. Consider the following example:

CREATE TABLE employee (name VARCHAR(50) NOT NULL, commission MULTISET (MONEY));

Here, the employee table contains the commission column, which is of type multiset.

7. (a)
Geographical Information System (GIS)
A Geographical Information System (GIS) is a system for capturing, storing, checking, integrating, manipulating, analyzing and displaying data which are spatially referenced to the Earth. It is a special case of an information system in which the database consists of observations on spatially distributed features, activities or events, which are definable in space as points, lines or areas. A geographic

information system manipulates data about these points, lines and areas to retrieve data for ad hoc queries and analysis. GIS makes connections between activities based on spatial proximity.
GIS has developed from two independent areas: digital cartography and databases. These developments are closely related to the enormous growth in the power, and the corresponding reduction in the cost, of computer technology. The key to establishing this type of technology within an information framework for the purposes of decision making is INTEGRATION: the linking together of technology, data and a decision-making strategy. GIS is all about bringing together spatial analysis techniques and digital spatial data, combined with computer technology.
GIS is much more than a computer database and a set of tools: it is also a philosophy for information management. Often GIS can form the core of information management within an organization.
GIS consists of the following components:
i) Data
ii) Software & hardware tools
iii) GIS data manipulation & analysis
The benefits of GIS include:
i) Better information management
ii) Higher quality analysis
iii) Ability to carry out "what if?" scenarios
iv) Improved project efficiency
GIS Applications
Facilities management: Utilities such as electricity, gas, water and cable communication

companies all use GIS systems to store, retrieve and analyse their plant and materials. Areas such as customer responses, demand forecasting, fault analysis, network assessment analysis, site planning, strategic planning and market analysis can be generated by the GIS.

Marketing and retailing: These applications tend towards targeting customers and identifying potential markets. The extensive datasets generated from the use of loyalty cards can also be used in conjunction with GIS. Other applications include media planning, territory allocation and prospect analysis.

Environmental: Forestry management, impact analysis, resource management, coastal zone mapping, geophysical & geotechnical surveys.

Transport/vehicle routing: This is an example of 'real-time' GIS and is used particularly by vehicle routing companies and the emergency services, who need to know where their vehicles are located at any given time.

Vehicle routing can also be assessed in terms of least cost or efficiency. In addition, GIS may be used for dispatch, scheduling and franchise planning, as well as route planning.

Health: Disease mapping as well as epidemiology, facility planning, provider & purchaser planning, expenditure monitoring and patient analysis can all be carried out using GIS.

Insurance: Risk distribution analysis, catastrophe planning. Customer service analysis, hazard & prediction analysis and underwriting.

7. (b)
SQL includes a feature for testing whether a subquery has any tuples in its result. The exists construct returns the value true if the argument subquery is nonempty. Using the exists construct, we can write the query: "Find all customers who have both an account and a loan at the bank"

select customer_name
from borrower
where exists (select *
              from depositor
              where depositor.customer_name = borrower.customer_name)
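This query can be checked end-to-end with SQLite from Python. The customer names are made up; SQLite is used here purely as a convenient stand-in for the bank database, since the exists construct is standard SQL.

```python
import sqlite3

# Build a tiny in-memory bank schema matching the text:
# depositor = customers with accounts, borrower = customers with loans.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE depositor (customer_name TEXT);
    CREATE TABLE borrower  (customer_name TEXT);
    INSERT INTO depositor VALUES ('Asha'), ('Ravi');
    INSERT INTO borrower  VALUES ('Ravi'), ('Meena');
""")

rows = con.execute("""
    SELECT customer_name
    FROM borrower
    WHERE EXISTS (SELECT *
                  FROM depositor
                  WHERE depositor.customer_name = borrower.customer_name)
""").fetchall()

# Only 'Ravi' appears in both tables, i.e. has an account and a loan.
assert rows == [('Ravi',)]
```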

We can test for the non-existence of tuples in a subquery by using the not exists construct. We can use the not exists construct to simulate the set containment (that is, superset) operation: we can write "relation A contains relation B" as "not exists (B except A)". To illustrate the not exists operator, consider again the query "Find all customers who have an account at all the


branches located in Mumbai." For each customer, we need to see whether the set of all branches at which that customer has an account contains the set of all branches in Mumbai. Using the except construct, we can write the query as follows:

select distinct S.customer_name
from depositor as S
where not exists ((select branch_name
                   from branch
                   where branch_city = 'Mumbai')
                  except
                  (select R.branch_name
                   from depositor as T, account as R
                   where T.account_number = R.account_number
                     and S.customer_name = T.customer_name))
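The containment test can likewise be verified with SQLite on made-up data. One syntactic note for the sketch: SQLite's grammar does not accept parentheses around the individual operands of except, so the compound subquery below is written without them; the logic is otherwise the same.

```python
import sqlite3

# Made-up bank data: two Mumbai branches and one Brooklyn branch.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE branch    (branch_name TEXT, branch_city TEXT);
    CREATE TABLE account   (account_number TEXT, branch_name TEXT);
    CREATE TABLE depositor (customer_name TEXT, account_number TEXT);
    INSERT INTO branch VALUES ('Andheri','Mumbai'), ('Dadar','Mumbai'),
                              ('Hillside','Brooklyn');
    INSERT INTO account VALUES ('A1','Andheri'), ('A2','Dadar'), ('A3','Andheri');
    INSERT INTO depositor VALUES ('Asha','A1'), ('Asha','A2'), ('Ravi','A3');
""")

rows = con.execute("""
    SELECT DISTINCT S.customer_name
    FROM depositor AS S
    WHERE NOT EXISTS (SELECT branch_name
                      FROM branch
                      WHERE branch_city = 'Mumbai'
                      EXCEPT
                      SELECT R.branch_name
                      FROM depositor AS T, account AS R
                      WHERE T.account_number = R.account_number
                        AND S.customer_name = T.customer_name)
""").fetchall()

# Asha has accounts at both Mumbai branches; Ravi covers only Andheri.
assert rows == [('Asha',)]
```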

Here, the subquery

(select branch_name from branch where branch_city = 'Mumbai')

finds all the branches in Mumbai. The subquery

(select R.branch_name
 from depositor as T, account as R
 where T.account_number = R.account_number
   and S.customer_name = T.customer_name)

finds all the branches at which customer S.customer_name has an account. Thus, the outer select takes each customer and tests whether the set of all branches at which that customer has an account contains the set of branches located in Mumbai.
In queries that contain subqueries, a scoping rule applies for tuple variables. In a subquery, according to the rule, it is legal to use only tuple variables defined in the subquery itself or in any query that contains the subquery. If a tuple variable is defined both locally in a subquery and globally in a containing query, the local definition applies. This rule is analogous to the usual scoping rules for variables in programming languages.

7. (c)
Need of Replication
If relation r is replicated, a copy of relation r is stored in two or more sites. In the most extreme case, we have full replication, in which a copy is stored at every site in the system. There are a number of advantages and disadvantages to replication.
Availability: If one of the sites containing relation r fails, then relation r can be found in

another site. Thus, the system can continue to process queries involving r, despite the failure of one site.

Increased parallelism: In the case where the majority of accesses to relation r result only in reading the relation, several sites can process queries involving r in parallel. The more replicas of r there are, the greater the chance that the needed data will be found at the site where the transaction is executing. Hence, data replication minimizes movement of data between sites.

Query Processing in Replicated Distributed Databases: Consider an extremely simple query: "Find all the tuples in the account relation." Although the query is simple, it is not trivial, since the account relation may be fragmented, replicated, or both. If the account relation is replicated, we have to choose which replica to use. If no replicas are fragmented, we choose the replica for which the transmission cost is lowest.
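The replica-selection rule just stated can be sketched as a one-liner over per-site transmission costs. The site names and cost values are invented for the illustration.

```python
# Hypothetical per-site transmission costs (to the querying site) for
# the sites holding an unfragmented replica of the account relation.
transmission_cost = {
    "Hillside": 40,
    "Valleyview": 15,
    "Downtown": 25,
}

def choose_replica(costs):
    """Pick the replica site with the lowest transmission cost."""
    return min(costs, key=costs.get)

assert choose_replica(transmission_cost) == "Valleyview"
```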


However, if a replica is fragmented, the choice is not so easy to make, since we need to compute several joins or unions to reconstruct the account relation. In this case, the number of strategies for even our simple example may be large. Query optimization by exhaustive enumeration of all alternative strategies may not be practical in such situations.
Fragmentation transparency implies that a user may write a query such as

σ branch_name = "Hillside" (account)

Since account is defined as account1 ∪ account2, the expression that results from the name translation scheme is

σ branch_name = "Hillside" (account1 ∪ account2)
= σ branch_name = "Hillside" (account1) ∪ σ branch_name = "Hillside" (account2)

which includes two subexpressions. The first involves only account1, and thus can be evaluated at the Hillside site. The second involves only account2, and thus can be evaluated at the Valleyview site.
There is a further optimization that can be made in evaluating

σ branch_name = "Hillside" (account1)

Since account1 has only tuples pertaining to the Hillside branch, we can eliminate the selection operation. In evaluating

σ branch_name = "Hillside" (account2)

we can apply the definition of the account2 fragment to obtain

σ branch_name = "Hillside" (σ branch_name = "Valleyview" (account))

This expression is the empty set, regardless of the contents of the account relation. Thus, our final strategy is for the Hillside site to return account1 as the result of the query.

7. (d)
Fragmentation
Fragmentation consists of breaking a relation into smaller relations or fragments and storing the fragments (instead of the relation itself), possibly at different sites. In horizontal fragmentation, each fragment consists of a subset of the rows of the original relation. In vertical fragmentation, each fragment consists of a subset of the columns of the original relation. Horizontal and vertical fragments are illustrated in the figure.
Data Fragmentation:

Figure: Horizontal and vertical fragmentation


Typically, the tuples that belong to a given horizontal fragment are identified by a selection query. For example, employee tuples might be organized into fragments by city, with all employees in a given city assigned to the same fragment. The horizontal fragment shown in the figure corresponds to Chicago. By storing fragments at the database site in the corresponding city, we achieve locality of reference: Chicago data is most likely to be updated and queried from Chicago, and storing this data in Chicago makes it local (and reduces communication costs) for most queries. Similarly, the tuples in a given vertical fragment are identified by a projection query. The vertical fragment in the figure results from projecting the employees relation onto its first two columns.
When a relation is fragmented, we must be able to recover the original relation from the fragments:
Horizontal Fragmentation: The union of the horizontal fragments must be equal to the original relation. Fragments are usually also required to be disjoint.
Vertical Fragmentation: The collection of vertical fragments should be a lossless-join decomposition. To ensure that a vertical fragmentation is lossless-join, systems often assign a unique tuple id to each tuple in the original relation and attach this id to the projection of the tuple in each fragment. If we think of the original relation as containing an additional tuple-id field that is a key, this field is added to each vertical fragment. Such a decomposition is guaranteed to be lossless-join.
In short, a relation can be (horizontally or vertically) fragmented, and each resulting fragment can be further fragmented.
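Both recovery conditions can be demonstrated with a small in-memory sketch. The employee rows are made up; the tuple ids play the role of the key field described above.

```python
# Original relation: (tuple_id, name, city, salary). The tuple_id column
# is the unique id the text describes for lossless vertical fragmentation.
employees = [
    (1, "Asha",  "Chicago", 60000),
    (2, "Ravi",  "Mumbai",  55000),
    (3, "Meena", "Chicago", 70000),
]

# Horizontal fragmentation: one selection per fragment, here by city.
chicago = [t for t in employees if t[2] == "Chicago"]
mumbai  = [t for t in employees if t[2] == "Mumbai"]
# Recovery: the union of the (disjoint) horizontal fragments.
assert sorted(chicago + mumbai) == sorted(employees)

# Vertical fragmentation: one projection per fragment, each keeping tuple_id.
ids_names    = [(t[0], t[1]) for t in employees]           # id, name
ids_city_sal = [(t[0], t[2], t[3]) for t in employees]     # id, city, salary

# Recovery: a lossless join of the vertical fragments on tuple_id.
name_by_id = {tid: name for tid, name in ids_names}
rejoined = sorted((tid, name_by_id[tid], city, sal)
                  for tid, city, sal in ids_city_sal)
assert rejoined == sorted(employees)
```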
