Lecture 8 Distributed Data Bases: Replication and Fragmentation

Overview

Last lecture:

Saw the difficulty of handling logical relationships between distributed information.

Potential solutions such as a federated DDBMS.

This week:

Look at an area where distributed data bases are used extensively: replication.

Used for back-up and for improving reliability of service, e.g. a mirror site.

Strategies for Data Allocation (1) Centralised

Single data base, users distributed across the network.

Communication costs: High. All data access by users is over the network; no local references.

Storage costs: Minimal, since there is no duplication.

Reliability and availability: Low. Failure of the central site prevents access to the entire data base system.

Performance: Likely to be unsatisfactory.

Strategies for Data Allocation (2) Fragmented

Data base distributed by fragments (disjoint views).

Communication costs: Low. Fragments are located near their main users (if well designed).

Storage costs: Minimal, since there is no duplication.

Reliability and availability: Vary depending on the failed site. Failure of one site loses the fragments situated there; other fragments continue to be available.

Performance: Likely to be satisfactory; better than centralised, since there is less network traffic.

Strategies for Data Allocation (3) Complete Replication

Data base completely copied to each site.

Communication costs: Low for reads, high for updates, since updates must be propagated through the system.

Storage costs: High. Complete duplication.

Reliability and availability: High. Can switch from a failed site to another.

Performance: Good for reads; potentially poor for updates, because of propagation.

Strategies for Data Allocation (4) Selective Replication

Fragments are selectively replicated.

Communication costs: Low (if well designed).

Storage costs: Not minimal, since some fragments are duplicated, but less than with complete replication.

Reliability and availability: Vary depending on the failed site. Failure of one site loses only the fragments held solely at that site; other fragments continue to be available.

Performance: Likely to be satisfactory; better than centralised, since there is less network traffic.

Fragmentation – Further Details

A fragment is a view on a table. Two main types:

Horizontal (classification by value): Sub-set of tuples obtained by the RESTRICT operation (algebra) or WHERE clause (SQL).

Vertical (classification by property): Sub-set of columns obtained by the PROJECT operation (algebra) or SELECT clause (SQL).
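
As a minimal sketch of the two main types, the fragments can be written as views over one table using Python's built-in sqlite3 module. The staff table, its columns and the view names are assumptions made up for illustration, and everything lives in one local data base rather than at separate sites.

```python
import sqlite3

# One in-memory data base stands in for the global table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, branch TEXT, salary INTEGER);
    INSERT INTO staff VALUES (1, 'Ann', 'London', 30000),
                             (2, 'Bob', 'Leeds', 25000),
                             (3, 'Cat', 'London', 28000);
    -- Horizontal fragments: sub-sets of rows chosen by a WHERE clause.
    CREATE VIEW staff_london AS SELECT * FROM staff WHERE branch = 'London';
    CREATE VIEW staff_leeds AS SELECT * FROM staff WHERE branch = 'Leeds';
    -- Vertical fragments: sub-sets of columns, each keeping the primary key.
    CREATE VIEW staff_pay AS SELECT staff_no, salary FROM staff;
    CREATE VIEW staff_info AS SELECT staff_no, name, branch FROM staff;
""")

london_rows = conn.execute("SELECT * FROM staff_london ORDER BY staff_no").fetchall()
pay_rows = conn.execute("SELECT * FROM staff_pay ORDER BY staff_no").fetchall()
```

In a real distributed system each view would be materialised at the site that uses it most; here the views only show which rows and columns each fragment contains.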

Other Forms of Fragmentation

Mixed (classification by both value and property): Both horizontal and vertical fragmentation are used to obtain a single fragment.

Derived (association): An expression such as a join connects the fragments.

None: The whole of a table appears, unchanged, in a view.

Why Fragment?

Most applications use only part of the data in a table.

To minimise network traffic, do not send more data to any site than is strictly necessary.

Data not required by an application is not visible to it, enhancing security.

Factors against Fragmentation

Performance: May be affected adversely by the need for some applications to reconstruct fragments into larger units.

Integrity: More difficult to control, with dependencies possibly scattered across fragments.

3 Rules for Fragmentation

Rule 1: Completeness. If a table T is decomposed into fragments, every value found in T must be contained in at least one of the fragments, so that no data is lost in fragmentation.

3 Rules for Fragmentation

Rule 2: Reconstruction. It must be possible to reconstruct T from the fragments using a relational operation (typically a natural join for vertical fragments, or a union for horizontal fragments), so that functional dependencies are preserved; otherwise the decomposition into fragments is lossy.

3 Rules for Fragmentation

Rule 3: Disjointness. A data item may not appear in more than one fragment unless it is a component of a Primary Key.

This avoids duplication and potential inconsistency, although transactions should prevent the latter.

Primary Key duplication allows reconstructions to be made.
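
The three rules can be checked mechanically on a small example. The sketch below is illustrative only: the table T, its columns and the choice of fragments are assumptions, with staff_no playing the role of the Primary Key. Horizontal fragments are rebuilt with a union and vertical fragments with a join on the key.

```python
# Table T as a list of rows: (staff_no, name, branch, salary); staff_no is the primary key.
T = [(1, "Ann", "London", 30000), (2, "Bob", "Leeds", 25000), (3, "Cat", "London", 28000)]

# Horizontal fragments by branch.
h_london = [r for r in T if r[2] == "London"]
h_leeds = [r for r in T if r[2] == "Leeds"]

# Vertical fragments, each repeating the primary key.
v_info = [(r[0], r[1], r[2]) for r in T]
v_pay = [(r[0], r[3]) for r in T]

# Rule 1, completeness: every row of T is in at least one horizontal fragment.
complete = all(r in h_london or r in h_leeds for r in T)

# Rule 2, reconstruction: union for horizontal, join on the key for vertical.
rebuilt_h = sorted(h_london + h_leeds)
pay_by_key = dict(v_pay)
rebuilt_v = sorted((k, n, b, pay_by_key[k]) for (k, n, b) in v_info)

# Rule 3, disjointness: no row sits in two horizontal fragments,
# and the vertical fragments share only the primary key column.
disjoint_h = not set(h_london) & set(h_leeds)
shared_columns = {"staff_no", "name", "branch"} & {"staff_no", "salary"}
```

If the key were dropped from v_pay, the join in Rule 2 would have nothing to match on, which is why Rule 3 exempts Primary Key components.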

Strategy for Designing a Partially Replicated Distributed Data Base – 1

Design a global data base using standard methodology.

Examine the regional distribution of the business.

What data is needed by each part of the business?

Some data is only used locally (not exported, as in a Federated DDBMS).

Some data is mostly used locally.

Strategy for Designing a Partially Replicated Distributed Data Base – 2

Transactions give many clues to the ideal placement of fragments:

A transaction will perform slowly if it requires data from different sites, unless the network connecting them is very fast.

A transaction performing much replication of updates will perform slowly if there is frequent contention for resources (locking).

Frequently used transactions should be optimised; infrequently used ones can be ignored.
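
One way to turn transaction usage into a placement decision is to count how often each site touches each fragment and store the fragment where the fewest accesses have to cross the network. This is a rough sketch under that assumption; the fragment names, site names and access counts are invented, and real designs would also weigh update cost and locking.

```python
# Hypothetical workload: accesses per day to each fragment, by originating site.
accesses = {
    "orders_north": {"Leeds": 900, "London": 50},
    "orders_south": {"Leeds": 30, "London": 1200},
    "price_list": {"Leeds": 400, "London": 450},
}

def remote_accesses(fragment, site):
    """Accesses that must cross the network if `fragment` is stored only at `site`."""
    return sum(n for s, n in accesses[fragment].items() if s != site)

def best_site(fragment):
    """Pick the site that minimises remote accesses for this fragment."""
    return min(accesses[fragment], key=lambda site: remote_accesses(fragment, site))

placement = {fragment: best_site(fragment) for fragment in accesses}
```

Here price_list is used almost equally at both sites, so any single placement still leaves hundreds of remote accesses; a fragment like that is a natural candidate for replication rather than placement at one site.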

Strategy for Designing a Partially Replicated Distributed Data Base – 3

Decide which relations are not to be fragmented. They will normally be replicated everywhere, for ease of updating and to maintain integrity.

Fragment the remaining relations to suit locality and transactions.

Transparencies in DDBMS

Transparency hides low-level details (often details of implementation) from the user.

Four main types: Distribution; Transaction; Performance; DBMS.

Distribution Transparency

The DDB is perceived by the user as a single, logical unit even though the data is: distributed over several sites; fragmented in various ways.

Significance of Full Distribution Transparency

The highest form of Distribution Transparency is termed Fragmentation Transparency.

Users do not need to know anything about the distribution techniques.

Users address the global schema in queries.

Users will not, however, understand why some queries take longer than others.

Reduced Forms of Distribution Transparency

Location Transparency: Users need to know about fragmentation but not about placements at sites. Users do not need to know what replications exist.

Local Mapping Transparency: The most limited transparency. Users need to know about fragmentation and sites.

Transaction Transparency

Ensures that all transactions maintain the DDB’s integrity and consistency.

Each transaction is divided into sub-transactions: one sub-transaction for each site; sub-transactions are usually executed in parallel, hence gains in efficiency.

More complicated than in centralised system.

Forms of Transaction Transparency

Concurrency Transparency: All concurrent transactions (centralised and distributed) execute independently.

The DDBMS must ensure that:

each sub-transaction is executed in the normal spirit of transactions (ACID);

the sub-transactions as a whole, forming one transaction, are executed ACID-style;

the mixture of sub-transactions and whole transactions is executed ACID-style.

Transactions – Problems with Replication

Failure Transparency: Users are unaware of problems encountered during transaction execution.

E.g. if, say, 6 copies of a data item (at 6 sites) need to be updated, there are problems if only 5 are currently reachable.

Need to delay COMMIT until all sites are processed, otherwise the data is inconsistent, unless we allow a delayed asynchronous update.
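
The 6-copy case can be sketched in a few lines. This is a toy model, not a commit protocol: the Site class, the function names and the pending queue are all assumptions introduced for illustration. The synchronous version refuses to change anything unless every copy is reachable; the asynchronous version updates what it can and queues the rest.

```python
class Site:
    """A site holding one copy of the data item (illustrative)."""
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.value = None

def synchronous_update(sites, value):
    """Commit only if every copy can be updated; otherwise change nothing."""
    if not all(s.reachable for s in sites):
        return False  # COMMIT is delayed: no copy is changed
    for s in sites:
        s.value = value
    return True

def asynchronous_update(sites, value, pending):
    """Update reachable copies now; queue the rest for later delivery."""
    for s in sites:
        if s.reachable:
            s.value = value
        else:
            pending.append((s, value))

def deliver_pending(pending):
    """Apply queued updates to sites that are back; keep the others queued."""
    still_waiting = []
    for s, value in pending:
        if s.reachable:
            s.value = value
        else:
            still_waiting.append((s, value))
    pending[:] = still_waiting

sites = [Site(f"site{i}") for i in range(1, 7)]
sites[5].reachable = False  # only 5 of the 6 copies are reachable

committed = synchronous_update(sites, "v1")   # refused: one site is down
pending = []
asynchronous_update(sites, "v1", pending)     # 5 copies updated, 1 queued
values_before = [s.value for s in sites]
sites[5].reachable = True                     # the sixth site comes back
deliver_pending(pending)
values_after = [s.value for s in sites]
```

Between the asynchronous update and the delivery, the sixth copy is out of date, which is the temporary inconsistency that the delayed update accepts.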

Performance Transparency

Requires the DDBMS to determine the most cost-effective way to handle a request: which fragment to use; (if replicated) which copy of a fragment to use; which site to use.

Avoidance of any performance degradation compared with a centralised system.

DBMS Transparency

Hides knowledge of which DBMS is being used.

The most difficult transparency of all, particularly with heterogeneous models.

See problems highlighted in Lecture 7: Global Schema Integration; Federated Data Bases; Multi-Data-Base Languages.

Replication Servers

Copying and maintenance of data on multiple servers.

Replication: the process of generating and reproducing multiple copies of data at one or more sites.

Servers: provide the file resources, i.e. the distributed data base.

Benefits of Replication

Increased reliability.

Better data availability.

Potential for better performance (with good design).

“Warm stand-by”, as in a mirror site, shadowing the actions of the main site and taking over if the main site crashes.

Timing of Replication

Synchronous: Immediate, according to some common signal such as time. Ideal as it ensures immediate consistency. Assumes availability of all sites.

Asynchronous: Independent, with delays ranging from a few seconds to several days. Immediate consistency is not achieved. More flexible, since not all sites need be available at any one time.

Types of Data Replicated

Across heterogeneous data models: Mapping required (hard).

Object replication: More varied than just base tables. Also auxiliary structures such as indices. Stored procedures and functions.

Scalability: No volume restrictions.

Replication Administration

Subscription Mechanism: Allows a permitted user to subscribe to replicated data / objects.

Initialisation Mechanism: Allows for the initialisation of a target replication.

Ownership of Replicated Data (1) Master / Slave

Master site: Primary owner of replicated data. Has sole right to change data. Publish and subscribe procedure. Asynchronous replication as slave sites receive copies of the data.

Slave site: Receives read-only data from the master site. Slaves can be used as mobile clients.
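
The publish and subscribe arrangement might look roughly like this in code. The Master and Slave classes and their method names are hypothetical, chosen only to mirror the roles above: the master alone can write, and slaves see changes only after the next publish.

```python
class Slave:
    """Holds a read-only copy received from the master (illustrative)."""
    def __init__(self):
        self._copy = {}

    def receive(self, snapshot):
        self._copy = dict(snapshot)  # take a private copy of the published data

    def read(self, key):
        return self._copy.get(key)

class Master:
    """Primary owner: the only place where data may be changed (illustrative)."""
    def __init__(self):
        self._data = {}
        self._subscribers = []

    def subscribe(self, slave):
        self._subscribers.append(slave)

    def write(self, key, value):
        self._data[key] = value

    def publish(self):
        # Asynchronous in spirit: slaves are updated only when this runs.
        for slave in self._subscribers:
            slave.receive(self._data)

master = Master()
laptop = Slave()          # e.g. a mobile client
master.subscribe(laptop)
master.write("price", 10)
stale = laptop.read("price")   # not yet published
master.publish()
fresh = laptop.read("price")
```

The gap between write and publish is where a slave holds out-of-date data, which is acceptable here because slaves never update it themselves.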

Ownership of Replicated Data (2) Work-Flow Ownership

Flexible master designation. Dynamic ownership model.

Right to update data moves along the chain of command (replicating sites).

E.g. as an order is processed, the master right moves to each department in turn.

Ownership of Replicated Data (3) Update Anywhere

Peer-to-peer model. Multiple sites can update data. Conflict resolution required. More complex implementation.
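
The slides do not say which conflict resolution policy is used. One common and simple choice, assumed here purely for illustration, is "last writer wins": each value carries a timestamp and the originating site, and the newer timestamp is kept, with the site name breaking ties. The replica contents and function name below are invented.

```python
def merge_last_writer_wins(a, b):
    """Merge two replicas mapping key -> (timestamp, site, value).

    The version with the larger (timestamp, site) pair wins for each key.
    """
    merged = dict(a)
    for key, version in b.items():
        if key not in merged or version[:2] > merged[key][:2]:
            merged[key] = version
    return merged

# Two peers have both updated the same items independently.
leeds = {"stock": (5, "Leeds", 40), "price": (2, "Leeds", 9)}
london = {"stock": (7, "London", 38), "price": (1, "London", 8)}

merged_at_leeds = merge_last_writer_wins(leeds, london)
merged_at_london = merge_last_writer_wins(london, leeds)
```

Both peers reach the same result whichever order they merge in, which is the property a resolution rule needs. The cost is that the losing update is silently discarded, one reason this model is more complex to use safely.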

Distribution and Replication in Oracle 9i

Materialised Views, formerly known as Snapshots.

Views are updated by a Refresh mechanism, with variable frequency to suit the application:

Fast: based on identified changes.

Complete: replaces all existing data.

Force: tries Fast; if that is not possible, does Complete.
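
The difference between the three refresh modes can be illustrated with a toy model. This is not Oracle's API or implementation: the functions, the change-log format and the rule that a fast refresh is possible only when a change log exists are all assumptions made for this sketch.

```python
def complete_refresh(master_rows):
    """Replace the whole copy with the master's current rows."""
    return dict(master_rows)

def fast_refresh(view_rows, change_log):
    """Apply only the logged changes; each entry is (op, key, value)."""
    refreshed = dict(view_rows)
    for op, key, value in change_log:
        if op == "delete":
            refreshed.pop(key, None)
        else:  # "upsert": insert a new row or overwrite an existing one
            refreshed[key] = value
    return refreshed

def force_refresh(view_rows, master_rows, change_log):
    """Try a fast refresh; fall back to complete if no change log is available."""
    if change_log is not None:
        return fast_refresh(view_rows, change_log), "fast"
    return complete_refresh(master_rows), "complete"

master_rows = {1: "Ann", 3: "Cat", 4: "Dan"}
view_rows = {1: "Ann", 2: "Bob", 3: "Cath"}   # out-of-date copy
log = [("delete", 2, None), ("upsert", 3, "Cat"), ("upsert", 4, "Dan")]

with_log, mode_with_log = force_refresh(view_rows, master_rows, log)
without_log, mode_without_log = force_refresh(view_rows, master_rows, None)
```

Both paths end with the same rows; the fast path touches three entries, while the complete path copies the whole table, which matters once the table is large and the changes are few.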

Oracle 9i Transparency

Supports Site (Location) Transparency.

Does not support Fragmentation Transparency.

Summary of Distributed DBMS

An area under development to improve: Availability of data; Overall reliability of system; Performance (through good design).

However, disadvantages remain: Implementation can be complex (expensive). Heterogeneity in models is poorly handled.

Main use today is for replicating data.