Distributed Database Concept
7/21/2019 Distributed Database Concept
COMP 302 Valentina Tamma
Distributed Databases
Connolly & Begg. Chapter 22. Third edition
Distributed Databases Basic Concepts
Concepts.
Advantages and disadvantages of distributed databases.
Functions and architecture for a DDBMS.
Distributed database design.
Levels of transparency.
Comparison criteria for Distributed DBMSs.
Why distributed databases?
Some initial motivations:
The development of computer networks promotes decentralization.
In a company, the database organization might reflect the organizational structure, which is distributed into units. Each unit maintains its own database.
Sharing of data can be achieved by developing a distributed database system which:
makes data accessible by all units
stores data close to where it is most frequently used.
Concepts
Distributed Database
A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.
Distributed DBMS (DDBMS)
Software system that permits the management of the distributed database and makes the distribution transparent to users.
An example of a DDBMS
DDBMS - characteristics
Collection of logically-related shared data.
Data split into fragments.
Fragments may be replicated.
Fragments/replicas allocated to sites.
Sites linked by a communications network.
Data at each site is under control of a DBMS.
DBMSs handle local applications autonomously.
Each DBMS participates in at least one global application.
These are not DDBMSs
Distributed Processing: a centralized database that can be accessed over a computer network.
These are not DDBMSs
Parallel DBMS
A DBMS running across multiple processors and disks, designed to execute operations in parallel whenever possible, to improve performance.
Based on the premise that single-processor systems can no longer meet requirements for cost-effective scalability, reliability, and performance.
Parallel DBMSs link multiple, smaller machines to achieve the same throughput as a single, larger machine, with greater scalability and reliability.
Parallel DBMS
Main architectures for parallel DBMSs are:
Shared memory,
Shared disk,
Shared nothing.
Parallel DBMS
(a) shared memory
(b) shared disk
(c) shared nothing
Advantages of DDBMSs
Reflects Organizational Structure
Improved Sharing and Local Autonomy
Improved Availability: a failure does not make the entire system inoperable
Improved Reliability: data may be replicated
Improved Performance: data are local to the site of greatest demand
Economics: many small computers cost less than a big one!
Modular Growth: easy to add new modules
Disadvantages of DDBMSs
Complexity
Cost: especially in system management
Security: the network must be made secure
Integrity Control More Difficult
Lack of Standards
Lack of Experience
Database Design More Complex: due to fragmentation, allocation of fragments to a specific site, etc.
Types of DDBMS
Homogeneous DDBMS
All sites use the same DBMS product (e.g. Oracle).
Fairly easy to design and manage.
Heterogeneous DDBMS
Sites may run different DBMS products (e.g. Oracle and Ingres).
Possibly different underlying data models (e.g. relational DB and OO database).
Occurs when sites have implemented their own databases and integration is considered later.
We won't consider heterogeneous DDBMSs here.
Multidatabase System (MDBS)
DDBMS in which each site maintains complete autonomy.
DBMS that resides transparently on top of existing database and file systems and presents a single database to its users.
Allows users to access and share data without requiring physical database integration.
Unfederated MDBS (no local users) and federated MDBS.
Overview of Networking
Network: an interconnected collection of autonomous computers, capable of exchanging information.
Local Area Network (LAN): intended for connecting computers at the same site.
Wide Area Network (WAN): used when computers or LANs need to be connected over long distances.
WANs are relatively slow and less reliable than LANs. A DDBMS using a LAN provides much faster response time than one using a WAN.
Functions of a DDBMS
Expect a DDBMS to have at least the functionality of a DBMS (see Connolly & Begg, Chapter 2, third edition).
It should also have the following functionality:
Extended communication services.
Extended Data Dictionary.
Distributed query processing.
Extended concurrency control.
Extended recovery services.
Extended security control.
Reference Architecture for DDBMS
Due to the diversity of DDBMSs, there is no accepted architecture equivalent to the ANSI/SPARC three-level architecture for DBMSs.
A possible reference architecture consists of:
Set of global external schemas.
Global conceptual schema (GCS).
Fragmentation schema and allocation schema.
Set of schemas for each local DBMS, conforming to the ANSI/SPARC three-level architecture.
Some levels may be missing, depending on the levels of transparency supported.
Reference Architecture for DDBMS
Global Conceptual Schema is the logical description of the DB as if it were not distributed. It contains definitions of entities, relationships, constraints, security, and integrity information.
Fragmentation and Allocation Schemas describe how data are logically partitioned, and where they are located, taking replication into account.
Local Schemas are the logical descriptions of the local DBs.
Components of a DDBMS
Distributed Databases
Issues in Distributed Database Design
Issues in Distributed Database Design
Three key issues we have to consider:
Data Allocation: where are data placed? Data should be stored at the site with "optimal" distribution.
Fragmentation: a relation may be divided into a number of sub-relations (called fragments), which are stored at different sites.
Replication: a copy of a fragment may be maintained at several sites.
Issues in Distributed Database Design
Definition and allocation of fragments are carried out strategically to achieve:
Locality of Reference
Improved Reliability and Availability
Improved Performance
Balanced Storage Capacities and Costs
Minimal Communication Costs.
Involves analysing the most important transactions, based on quantitative/qualitative information.
Fragmentation
Quantitative information may include:
frequency with which a transaction is run;
site from which a transaction is run;
performance criteria for transactions.
Qualitative information may include, for the transactions that are executed:
type of access (read or write);
predicates of read operations.
Data Allocation
Four strategies regarding placement of data:
Centralized
Partitioned (or Fragmented)
Complete Replication
Selective Replication
Data Allocation
Centralized: consists of a single database stored at one site, with users distributed across the network. (This is not a DDB but distributed processing!)
Partitioned: database partitioned into disjoint fragments, each fragment assigned to one site.
Complete Replication: consists of maintaining a complete copy of the database at each site.
Selective Replication: combination of partitioning, replication, and centralization.
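As a rough illustration of the four strategies, here is a small Python sketch. It assumes the allocation schema is represented as a dictionary from fragment name to the set of sites holding a copy; `classify_allocation` and the fragment and site names are hypothetical.

```python
def classify_allocation(allocation: dict[str, set[str]], sites: set[str]) -> str:
    """Classify an allocation schema (fragment -> set of sites holding a copy)."""
    copies = list(allocation.values())
    if all(len(c) == 1 for c in copies):
        holders = set().union(*copies)
        # A single copy of everything at one site is distributed processing, not a DDB.
        return "centralized" if len(holders) == 1 else "partitioned"
    if all(c == sites for c in copies):
        # Every fragment is copied to every site.
        return "complete replication"
    # Some fragments replicated, others not.
    return "selective replication"


sites = {"S1", "S2", "S3"}
print(classify_allocation({"F1": {"S1"}, "F2": {"S2"}}, sites))        # partitioned
print(classify_allocation({"F1": {"S1", "S2"}, "F2": {"S3"}}, sites))  # selective replication
```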
Fragmentation
A relation R is divided into fragments $r_1, r_2, \ldots, r_n$, which contain enough information to allow reconstruction of R.
Example: we have a relation Sells(pub, address, price, type).
Type is bitter or lager.
We can split Sells into two different fragments:
$SellsBitter = \sigma_{type = 'bitter'}(Sells)$
$SellsLager = \sigma_{type = 'lager'}(Sells)$
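A minimal Python sketch of this fragmentation, assuming relations are represented as lists of dictionaries (one per tuple) and using a hypothetical sample instance of Sells. `select` plays the role of $\sigma$, and `union` checks that the two fragments can be recombined into Sells.

```python
# Hypothetical sample instance of Sells(pub, address, price, type).
Sells = [
    {"pub": "Red Lion", "address": "1 High St", "price": 3.50, "type": "bitter"},
    {"pub": "Red Lion", "address": "1 High St", "price": 3.80, "type": "lager"},
    {"pub": "The Crown", "address": "5 Mill Rd", "price": 3.40, "type": "bitter"},
]


def select(relation, predicate):
    """Relational selection (sigma): keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def as_set(relation):
    """View a relation as a set of tuples, so order and duplicates are ignored."""
    return {frozenset(t.items()) for t in relation}


def union(*fragments):
    """Relational union of fragments (duplicates removed)."""
    return set().union(*(as_set(f) for f in fragments))


SellsBitter = select(Sells, lambda t: t["type"] == "bitter")
SellsLager = select(Sells, lambda t: t["type"] == "lager")

# The fragments hold enough information to rebuild Sells by union.
assert union(SellsBitter, SellsLager) == as_set(Sells)
```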
Vertical Fragmentation
Each fragment consists of a subset of the attributes of a relation R.
Defined using the projection operation of relational algebra:
$\Pi_{a_1, \ldots, a_n}(R)$
Determined by establishing the affinity of one attribute to another.
Example:
Relation: Bars(name, address, licence, employees, owner)
Fragments:
$\Pi_{name, address, licence}(Bars)$
$\Pi_{name, address, employees, owner}(Bars)$
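The same idea for vertical fragmentation, again as a sketch over lists of dictionaries with hypothetical Bars data. `project` plays the role of $\Pi$; `natural_join` rebuilds Bars by matching the attributes the two fragments share (here name and address).

```python
# Hypothetical sample instance of Bars(name, address, licence, employees, owner).
Bars = [
    {"name": "Red Lion", "address": "1 High St", "licence": "L101", "employees": 6, "owner": "Smith"},
    {"name": "The Crown", "address": "5 Mill Rd", "licence": "L202", "employees": 4, "owner": "Jones"},
]


def project(relation, attrs):
    """Relational projection (pi) onto attrs, removing duplicate tuples."""
    seen, result = set(), []
    for t in relation:
        row = {a: t[a] for a in attrs}
        key = frozenset(row.items())
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]


BarsLicence = project(Bars, ["name", "address", "licence"])
BarsStaff = project(Bars, ["name", "address", "employees", "owner"])

# Joining on the shared attributes rebuilds the original relation.
rebuilt = natural_join(BarsLicence, BarsStaff)
assert sorted(rebuilt, key=lambda t: t["name"]) == sorted(Bars, key=lambda t: t["name"])
```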
Mixed Fragmentation
We can also mix horizontal and vertical fragmentation.
We obtain a fragment that consists of a horizontal fragment that is then vertically fragmented, or a vertical fragment that is then horizontally fragmented.
Defined using the selection and projection operations of relational algebra:
$\sigma_p(\Pi_{a_1, \ldots, a_n}(R))$
$\Pi_{a_1, \ldots, a_n}(\sigma_p(R))$
Example - Mixed Fragmentation
$S_1 = \Pi_{staffNo, position, sex, DOB, salary}(Staff)$
$S_2 = \Pi_{staffNo, fName, lName, branchNo}(Staff)$
$S_{21} = \sigma_{branchNo = 'B003'}(S_2)$
$S_{22} = \sigma_{branchNo = 'B005'}(S_2)$
$S_{23} = \sigma_{branchNo = 'B007'}(S_2)$
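The Staff fragments above can be sketched the same way: a vertical step (projection) producing S1 and S2, then a horizontal step (selection by branch) applied to S2. The sample rows and the helper functions are hypothetical.

```python
# Hypothetical sample rows of Staff(staffNo, fName, lName, position, sex, DOB, salary, branchNo).
Staff = [
    {"staffNo": "SG37", "fName": "Ann", "lName": "Beech", "position": "Assistant",
     "sex": "F", "DOB": "1960-11-10", "salary": 12000, "branchNo": "B003"},
    {"staffNo": "SL21", "fName": "John", "lName": "White", "position": "Manager",
     "sex": "M", "DOB": "1945-10-01", "salary": 30000, "branchNo": "B005"},
    {"staffNo": "SA9", "fName": "Mary", "lName": "Howe", "position": "Assistant",
     "sex": "F", "DOB": "1970-02-19", "salary": 9000, "branchNo": "B007"},
]


def project(relation, attrs):
    """Projection (pi); staffNo is in every attribute list, so no duplicates arise."""
    return [{a: t[a] for a in attrs} for t in relation]


def select(relation, predicate):
    """Selection (sigma)."""
    return [t for t in relation if predicate(t)]


# Vertical step.
S1 = project(Staff, ["staffNo", "position", "sex", "DOB", "salary"])
S2 = project(Staff, ["staffNo", "fName", "lName", "branchNo"])
# Horizontal step, applied to the vertical fragment S2.
S21 = select(S2, lambda t: t["branchNo"] == "B003")
S22 = select(S2, lambda t: t["branchNo"] == "B005")
S23 = select(S2, lambda t: t["branchNo"] == "B007")
```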
Derived Horizontal Fragmentation
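A horizontal fragment that is based on the horizontal fragmentation of a parent relation.
Ensures that fragments that are frequently joined together are at the same site.
Defined using the semijoin operation of relational algebra:
$R_i = R \ltimes_f S_i, \quad 1 \le i \le w$
where w is the number of fragments defined for the parent (owner) relation S, and f is the join attribute.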
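A sketch of derived horizontal fragmentation, assuming a hypothetical parent relation Branch(branchNo, city) that is fragmented horizontally by city, and a child relation Staff(staffNo, branchNo) whose fragments are derived from it with a semijoin on branchNo (the join attribute f). All data are made up for illustration.

```python
def semijoin(r, s, attr):
    """R semijoin S on attr: the tuples of r that have a matching attr value in s."""
    keys = {u[attr] for u in s}
    return [t for t in r if t[attr] in keys]


# Hypothetical parent (owner) relation Branch and child (member) relation Staff.
Branch = [
    {"branchNo": "B003", "city": "Glasgow"},
    {"branchNo": "B005", "city": "London"},
    {"branchNo": "B007", "city": "Aberdeen"},
]
Staff = [
    {"staffNo": "SG37", "branchNo": "B003"},
    {"staffNo": "SG14", "branchNo": "B003"},
    {"staffNo": "SL21", "branchNo": "B005"},
    {"staffNo": "SA9", "branchNo": "B007"},
]

# Parent fragments S_1..S_w: Branch fragmented horizontally by city (w = 3).
branch_frags = {
    city: [b for b in Branch if b["city"] == city]
    for city in ("Glasgow", "London", "Aberdeen")
}
# Derived fragments R_i = Staff semijoin_{branchNo} S_i, one per parent fragment,
# so each Staff fragment can be stored at the same site as its Branch fragment.
staff_frags = {city: semijoin(Staff, frag, "branchNo") for city, frag in branch_frags.items()}
```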
Correctness of Fragmentation
Reconstruction: we must be able to reconstruct the entire relation R from its fragments.
For horizontal fragmentation, the reconstruction operation is union:
$R = r_1 \cup r_2 \cup \cdots \cup r_n$
For vertical fragmentation, the reconstruction operation is natural join:
$R = r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$
Disjointness: in the Bars example, the two vertical fragments are disjoint except for the primary key, name, which is necessary for reconstruction.
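For a horizontal fragmentation, the reconstruction and disjointness conditions can be checked mechanically. The sketch below assumes the list-of-dictionaries representation and a hypothetical cut-down Sells instance; `check_horizontal` is an illustrative helper, not a DDBMS feature.

```python
from itertools import combinations


def as_set(relation):
    """Relation as a set of tuples (order and duplicates ignored)."""
    return {frozenset(t.items()) for t in relation}


def check_horizontal(relation, fragments):
    """Return (reconstructible, disjoint) for a horizontal fragmentation."""
    rebuilt = set().union(*(as_set(f) for f in fragments))
    reconstructible = rebuilt == as_set(relation)
    # Disjoint: no tuple appears in two different fragments.
    disjoint = all(not (as_set(a) & as_set(b)) for a, b in combinations(fragments, 2))
    return reconstructible, disjoint


# Hypothetical Sells instance and three candidate fragmentations.
Sells = [
    {"pub": "Red Lion", "type": "bitter"},
    {"pub": "Red Lion", "type": "lager"},
    {"pub": "The Crown", "type": "bitter"},
]
bitter = [t for t in Sells if t["type"] == "bitter"]
lager = [t for t in Sells if t["type"] == "lager"]

print(check_horizontal(Sells, [bitter, lager]))         # (True, True)
print(check_horizontal(Sells, [bitter, lager, Sells]))  # (True, False): fragments overlap
print(check_horizontal(Sells, [bitter]))                # (False, True): lager tuples lost
```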
Distributed Databases
Transparency in Distributed databases
Transparencies in a DDBMS
Distribution Transparency
Transaction Transparency
Performance Transparency
DBMS Transparency
Distribution Transparency
The user should perceive the DDB as a single, logical entity.
Fragmentation Transparency: the user does not need to know that data is fragmented.
Location Transparency: the user does not need to know the location of data items.
Replication Transparency: the user is unaware of replication of data.
Naming Transparency: items in a database must have a unique name, but users don't need to worry about it.
Naming Transparency
Each item in a DDB must have a unique name.
DDBMS must ensure that no two sites create a database object with the same name.
Solution 1: create central name server.
Disadvantages:
loss of some local autonomy;
central site may become a bottleneck;
low availability: if the central site fails, remaining sites cannot create any new objects.
Naming Transparency
Solution 2: prefix object with identifier of site that created it.
Example: Beer created at site S1 might be named S1.Beer.
Disadvantage: loss of distribution transparency.
Naming Transparency
Solution 3: use aliases for each database object.
Example: S1.Beer might be known as local_Beer by a user at site S1.
The DDBMS has the task of mapping an alias to the appropriate database object.
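Solutions 2 and 3 can be combined: the DDBMS generates a site-prefixed global name and lets local users work with an alias. The `SiteCatalog` class below is a hypothetical sketch of that mapping, not how any particular DDBMS stores its catalog.

```python
from typing import Optional


class SiteCatalog:
    """Hypothetical per-site name catalog combining Solutions 2 and 3."""

    def __init__(self, site_id: str):
        self.site_id = site_id
        self.aliases = {}  # alias seen by local users -> global object name

    def create(self, object_name: str, alias: Optional[str] = None) -> str:
        # Solution 2: prefix with the creating site's id, so two sites
        # can never generate the same global name.
        global_name = f"{self.site_id}.{object_name}"
        # Solution 3: local users refer to the object through an alias.
        self.aliases[alias or object_name] = global_name
        return global_name

    def resolve(self, alias: str) -> str:
        """Map an alias to the underlying database object."""
        return self.aliases[alias]


s1, s2 = SiteCatalog("S1"), SiteCatalog("S2")
print(s1.create("Beer", alias="local_Beer"))  # S1.Beer
print(s2.create("Beer"))                      # S2.Beer: no clash with S1's object
print(s1.resolve("local_Beer"))               # S1.Beer
```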
Transaction Transparency
Ensures that all distributed transactions maintain the distributed database's integrity and consistency.
A distributed transaction accesses data stored at more than one location.
Each transaction is divided into a number of sub-transactions, one for each site that has to be accessed.
The DDBMS must ensure the indivisibility of both the global transaction and each sub-transaction.
Must ensure both concurrency transparency and failure transparency.
Example - Distributed Transaction
Relation: Sells(pub, beer,price,type)
Fragments:
SellsBitter= type = bitter(Sells)
SellsLager= type = lager(Sells)
The two fragments are at two different sites.
Transaction T prints out the names of all pubs in the relation Sells. This transaction is split into two sub-transactions, one for each fragment.
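A sketch of how T might be split, assuming hypothetical sites SiteA (holding SellsBitter) and SiteB (holding SellsLager) with made-up sample tuples. Each sub-transaction reads only its local fragment; the global transaction merges the results, so a pub that sells both types is printed once.

```python
# Hypothetical placement: each fragment of Sells(pub, beer, price, type) at its own site.
sites = {
    "SiteA": [  # SellsBitter
        {"pub": "Red Lion", "beer": "Best", "price": 3.50, "type": "bitter"},
        {"pub": "The Crown", "beer": "Mild", "price": 3.20, "type": "bitter"},
    ],
    "SiteB": [  # SellsLager
        {"pub": "Red Lion", "beer": "Pils", "price": 3.80, "type": "lager"},
        {"pub": "King's Arms", "beer": "Pils", "price": 3.90, "type": "lager"},
    ],
}


def sub_transaction(fragment):
    """Runs at one site: read the pub names in the local fragment."""
    return {t["pub"] for t in fragment}


def transaction_T(sites):
    """Global transaction T: one sub-transaction per site, results merged."""
    names = set()
    for fragment in sites.values():
        names |= sub_transaction(fragment)
    return sorted(names)


print(transaction_T(sites))  # ["King's Arms", 'Red Lion', 'The Crown']
```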
Example - Distributed Transaction
T prints out the names of all staff, using the fragmentation schema defined above as S1, S2, S21, S22, and S23. Define three sub-transactions TS3, TS5, and TS7 to represent agents at sites 3, 5, and 7.
Concurrency Transparency
All transactions must execute independently and be logically consistent with the results obtained if the transactions were executed one at a time, in some arbitrary serial order.
Same fundamental principles as for a centralised DBMS.
The DDBMS must ensure that global and local transactions do not interfere with each other.
Similarly, the DDBMS must ensure consistency of all sub-transactions of a global transaction.
Concurrency control techniques are usually different from those used in a centralised DBMS.
Concurrency Transparency
Replication makes concurrency more complex.
If a copy of a replicated data item is updated, the update must be propagated to all copies.
Could propagate changes as part of the original transaction, making it an atomic operation.
However, if one site holding a copy is not reachable, then the transaction is delayed until the site is reachable.
Concurrency Transparency
Could limit update propagation to only those sites currently available. Remaining sites are updated when they become available again.
Could allow updates to copies to happen asynchronously, sometime after the original update. The delay in regaining consistency may range from a few seconds to several hours.
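The first two propagation options can be sketched as follows. `Replica`, `eager_update`, `update_available` and `catch_up` are hypothetical names; the sketch ignores conflicting concurrent updates, which a real DDBMS would also have to handle.

```python
class Replica:
    """One copy of a replicated data item, held at one (hypothetical) site."""

    def __init__(self, site, available=True):
        self.site = site
        self.available = available
        self.data = {}


def eager_update(replicas, key, value):
    """Propagate as part of the original transaction: all copies or none."""
    if not all(r.available for r in replicas):
        return False  # the transaction is delayed until every site is reachable
    for r in replicas:
        r.data[key] = value
    return True


def update_available(replicas, pending, key, value):
    """Update the copies at available sites now; queue the rest."""
    for r in replicas:
        if r.available:
            r.data[key] = value
        else:
            pending.append((r, key, value))


def catch_up(pending):
    """Apply queued updates at sites that have become available again."""
    remaining = []
    for r, key, value in pending:
        if r.available:
            r.data[key] = value
        else:
            remaining.append((r, key, value))
    pending[:] = remaining


s1, s2 = Replica("S1"), Replica("S2", available=False)
print(eager_update([s1, s2], "price", 3.5))  # False: S2 is unreachable
pending = []
update_available([s1, s2], pending, "price", 3.5)
s2.available = True
catch_up(pending)
print(s2.data)  # {'price': 3.5}
```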
Failure Transparency
The DDBMS must ensure atomicity and durability of the global transaction.
This means ensuring that the sub-transactions of a global transaction either all commit or all abort.
Thus, the DDBMS must synchronize the global transaction to ensure that all sub-transactions have completed successfully before recording a final COMMIT for the global transaction.
It must do this in the presence of site and network failures.
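The slide does not name a protocol; the standard one for this is two-phase commit (2PC). Below is a simplified Python sketch of it, assuming hypothetical `Participant` objects and ignoring logging and coordinator failure: the coordinator collects a vote from every site (phase 1) and sends COMMIT only if all votes are COMMIT (phase 2).

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


class Participant:
    """A site executing one sub-transaction of the global transaction."""

    def __init__(self, site, can_commit=True, reachable=True):
        self.site = site
        self.can_commit = can_commit  # False if the local sub-transaction failed
        self.reachable = reachable    # False models a site or network failure
        self.state = "active"

    def prepare(self):
        """Phase 1: vote on whether this sub-transaction can commit."""
        if not self.reachable:
            raise TimeoutError(f"no vote from {self.site}")
        self.state = "prepared" if self.can_commit else "aborted"
        return Vote.COMMIT if self.can_commit else Vote.ABORT

    def finish(self, decision):
        """Phase 2: apply the coordinator's global decision."""
        self.state = "committed" if decision is Vote.COMMIT else "aborted"


def two_phase_commit(participants):
    """Coordinator: the global transaction commits only if every site votes COMMIT."""
    try:
        votes = [p.prepare() for p in participants]
    except TimeoutError:
        votes = [Vote.ABORT]  # a missing vote is treated as ABORT
    decision = Vote.COMMIT if all(v is Vote.COMMIT for v in votes) else Vote.ABORT
    for p in participants:
        if p.reachable:  # an unreachable site learns the decision on recovery (not modelled)
            p.finish(decision)
    return decision


sites = [Participant("S1"), Participant("S2", can_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
```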
Performance Transparency
The DDBMS must perform as if it were a centralized DBMS:
The DDBMS should not suffer any performance degradation due to the distributed architecture.
The DDBMS should determine the most cost-effective strategy to execute a request.
Performance Transparency
The Distributed Query Processor (DQP) maps a data request into an ordered sequence of operations on local databases.
It must consider the fragmentation, replication, and allocation schemas.
DQP has to decide:
which fragment to access;
which copy of a fragment to use;
which location to use.
Performance Transparency
DQP produces an execution strategy optimised with respect to some cost function.
Typically, costs associated with a distributed request include:
I/O cost;
CPU cost;
communication cost.
Performance Transparency - Example
Property(Pno, City) 10000 records in London
Renter(Rno,Max_Price) 100000 records in Glasgow
Viewing(Pno, Rno) 1000000 records in London
SELECT p.pno
FROM property p INNER JOIN
(renter r INNER JOIN viewing v ON r.rno = v.rno)
ON p.pno = v.pno
WHERE p.city = 'Aberdeen' AND r.max_price > 200000;
Performance Transparency - Example
Assume:
Each tuple in each relation is 100 characters long.
10 renters with maximum price greater than 200,000.
100,000 viewings for properties in Aberdeen.
Computation time negligible compared to communication time.
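The slide's comparison of strategies did not survive extraction. As a sketch, the communication time of three candidate strategies can be computed from the assumptions above plus two that are NOT on the slide and are assumed here: a 1 s access delay per message and a transfer rate of 10,000 characters per second.

```python
TUPLE_SIZE = 100        # characters per tuple (from the slide)
ACCESS_DELAY = 1.0      # seconds per message (ASSUMPTION, not on the slide)
TRANSFER_RATE = 10_000  # characters per second (ASSUMPTION, not on the slide)


def comm_time(messages, tuples):
    """Communication time: access delay per message plus data volume / transfer rate."""
    return messages * ACCESS_DELAY + tuples * TUPLE_SIZE / TRANSFER_RATE


strategies = {
    # Ship all of Renter (100,000 tuples) from Glasgow to London; join there.
    "move Renter to London": comm_time(1, 100_000),
    # Ship Property (10,000) and Viewing (1,000,000) from London to Glasgow.
    "move Property and Viewing to Glasgow": comm_time(2, 10_000 + 1_000_000),
    # Select the 10 renters with max_price > 200,000 in Glasgow; ship only those.
    "select renters in Glasgow, ship result to London": comm_time(1, 10),
}

for name, seconds in sorted(strategies.items(), key=lambda kv: kv[1]):
    print(f"{seconds:>9.1f} s  {name}")
```

Under these assumptions, selecting early and shipping only the 10 qualifying renters is far cheaper than shipping whole relations, which is the kind of choice the DQP is expected to make.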
Date's 12 Rules for a DDBMS
0. Fundamental Principle
To the user, a distributed system should look exactly like a non-distributed system.
1. Local Autonomy
2. No Reliance on a Central Site
3. Continuous Operation
4. Location Independence
5. Fragmentation Independence
6. Replication Independence
7. Distributed Query Processing
8. Distributed Transaction Processing
9. Hardware Independence
10. Operating System Independence
11. Network Independence
12. Database Independence
Note: last four rules are ideal!
Distributed Transaction Management
DDBMS must ensure:
synchronization of sub-transactions with other local transactions executing concurrently at a site;
synchronization of sub-transactions with global transactions running simultaneously at the same or different sites.
A global transaction manager (transaction coordinator) at each site coordinates global and local transactions initiated at that site.
Distributed Concurrency Control
Techniques for distributed concurrency control must ensure distributed serializability.
Locking protocols (extensions of the 2PL protocol)
Distributed deadlock management.
Timestamping methods (extend the definition of timestamp so that it includes a site identifier)
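A sketch of the timestamp extension, assuming a hypothetical `SiteClock` per site: a global timestamp is the pair (local counter, site identifier), compared counter first, so timestamps generated at different sites are unique and totally ordered. Advancing the counter on receipt of a message is an assumed refinement that keeps site counters loosely synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class GlobalTimestamp:
    """Local counter extended with a site identifier; compared field by field."""
    counter: int
    site_id: int  # breaks ties between sites with equal counters


class SiteClock:
    """Hypothetical timestamp generator for one site."""

    def __init__(self, site_id: int):
        self.site_id = site_id
        self.counter = 0

    def next_timestamp(self) -> GlobalTimestamp:
        self.counter += 1
        return GlobalTimestamp(self.counter, self.site_id)

    def observe(self, ts: GlobalTimestamp) -> None:
        # On receiving a message, move the local counter up to the sender's,
        # keeping the counters at different sites loosely synchronized.
        self.counter = max(self.counter, ts.counter)


s1, s2 = SiteClock(1), SiteClock(2)
a, b = s1.next_timestamp(), s2.next_timestamp()
print(a < b)  # True: equal counters, so the site identifier decides

for _ in range(5):
    s1.next_timestamp()
m = s1.next_timestamp()  # counter 7 at site 1
s2.observe(m)
c = s2.next_timestamp()
print(c > m)  # True
```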