1 Distributed Databases BUAD/American University Distributed Databases.


Distributed Databases


Definitions

• Distributed Database: A single logical database that is spread physically across computers in multiple locations (possibly global) that are connected by a data communications link.

• Decentralized Database: A collection of independent databases on non-networked computers (possibly global).


Reasons for Distributed Database

• Local business units want control over data.

• Consolidate data across local databases for integrated decision making.

• Reduce telecommunications costs.

• Reduce the risk of telecommunications failures.


Distributed Database Options

• Homogeneous - Same DBMS at each node.

• Heterogeneous - Different DBMSs at different nodes.

• Systems - Supports some or all of the functionality of one logical database.


Homogeneous, Non-Autonomous Database

• Data is distributed across all the nodes.

• Same DBMS at each node.

• All data is managed by the distributed DBMS (no exclusively local data.)

• All access is through one, global schema.

• The global schema is the union of all the local schemas.


Focus on the Following Heterogeneous Environment

• Data distributed across all the nodes.

• Different DBMSs may be used at each node.

• Local access is done using the local DBMS and schema.

• Remote access is done using the global schema.


Objectives and Trade-offs

• Location Transparency - User does not have to know the location of the data.

• Local Autonomy - Local site can operate with its database when central site is down.

• Synchronous Distributed Database - All copies of the same data are always identical.

• Asynchronous Distributed Database - Some data inconsistency is tolerated.
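The difference between the synchronous and asynchronous options can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: everything runs in one process, nothing fails, and the names `Replica`, `write_synchronous`, and `AsyncPrimary` are hypothetical, not part of any DBMS.

```python
class Replica:
    """One copy of the data at a single site (hypothetical stand-in for a node)."""

    def __init__(self):
        self.data = {}


def write_synchronous(replicas, key, value):
    # Every copy is updated before the write returns,
    # so all copies are identical after each write.
    for replica in replicas:
        replica.data[key] = value


class AsyncPrimary:
    """Primary copy that queues changes and delivers them later."""

    def __init__(self, secondaries):
        self.data = {}
        self.secondaries = secondaries
        self.pending = []

    def write(self, key, value):
        # The write returns immediately; secondaries are stale for a while.
        self.data[key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Deliver queued changes in order, then clear the queue.
        for key, value in self.pending:
            for replica in self.secondaries:
                replica.data[key] = value
        self.pending.clear()


a, b = Replica(), Replica()
write_synchronous([a, b], "price", 10)
sync_identical = a.data == b.data

secondary = Replica()
primary = AsyncPrimary([secondary])
primary.write("price", 12)
stale_before = secondary.data.get("price")  # not yet delivered
primary.propagate()
fresh_after = secondary.data.get("price")
```

The window between `write` and `propagate` is the tolerated inconsistency; a real synchronous system would also need a commit protocol to cope with site failures, which this sketch leaves out.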


Advantages of Distributed Database

• Increased reliability and availability.

• Local control over data.

• Modular growth.

• Lower communication costs.

• Faster response for certain queries.


Disadvantages of Distributed Database

• Software cost and complexity.

• Processing overhead.

• Data integrity exposure.

• Slower response for certain queries.


Options for Distributing a Database

• Data replication.

• Horizontal partitioning.

• Vertical partitioning.

• Combinations of the above.


Data Replication

• Advantages -

– Reliability.

– Fast response.

– May avoid complicated distributed transaction integrity routines (if replicated data is refreshed at scheduled intervals.)

– De-couples nodes (transactions proceed even if some nodes are down.)

– Reduced network traffic at prime time (if updates can be delayed.)


Data Replication

• Disadvantages -

– Additional requirements for storage space.

– Additional time for update operations.

– Complexity and cost of updating.

– Integrity exposure of getting incorrect data if replicated data is not updated simultaneously.

• Therefore, better when used for non-volatile data.


Types of Data Replication

• Snapshot Replication -

– Changes are periodically sent to a master site which sends an updated snapshot out to the other sites.

• Near Real-Time Replication -

– Broadcast update orders without requiring confirmation.

• Pull Replication -

– Each site controls when it wants updates.
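Snapshot and pull replication differ mainly in who decides when a site is refreshed. The sketch below assumes a single in-memory master copy with a version counter; `MasterSite`, `LocalSite`, and `push_snapshot` are hypothetical names chosen for illustration only.

```python
import copy


class MasterSite:
    """Holds the authoritative copy and a version counter (illustrative)."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply_changes(self, changes):
        # Changes collected from the other sites are merged here.
        self.data.update(changes)
        self.version += 1

    def snapshot(self):
        # Return an independent copy so sites cannot alter the master.
        return self.version, copy.deepcopy(self.data)


class LocalSite:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.version = 0

    def load_snapshot(self, snap):
        self.version, self.data = snap

    def pull(self, master):
        # Pull replication: the site decides when to ask for updates.
        if master.version > self.version:
            self.load_snapshot(master.snapshot())


def push_snapshot(master, sites):
    # Snapshot replication: the master sends a fresh copy to every site.
    for site in sites:
        site.load_snapshot(master.snapshot())


master = MasterSite()
east, west = LocalSite("east"), LocalSite("west")

master.apply_changes({"sku-1": 5})
push_snapshot(master, [east, west])      # both sites now hold version 1

master.apply_changes({"sku-1": 4, "sku-2": 9})
west.pull(master)                         # only west asks for the update
```

After the last line, `east` still holds the older snapshot while `west` is current, which is the data-timeliness trade-off of both schemes.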


Issues in Data Replication Use

• Data timeliness.

• Useful if DBMS cannot reference data from more than one node.

• Batched updates can cause performance problems.

• Updates complicated with heterogeneous DBMSs or database design.

• Telecommunications speeds may limit mass updates.


Horizontal Partitioning

• Different records of a file at different sites.

• Advantages -

– Data stored close to where it is used.

– Local access optimization.

– Security.

• Disadvantages -

– Accessing data across partitions.

– No data replication.
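As a rough illustration of horizontal partitioning, the sketch below places each record at a site chosen by one of its column values. The sample rows, the `region` column, and the function names are assumptions made up for the example.

```python
# Hypothetical customer records; "region" decides where each row lives.
ROWS = [
    {"id": 1, "name": "Avery", "region": "east"},
    {"id": 2, "name": "Blake", "region": "west"},
    {"id": 3, "name": "Casey", "region": "east"},
]


def partition_horizontally(rows, key):
    """Group whole rows by the value of `key`; one group per site."""
    sites = {}
    for row in rows:
        sites.setdefault(row[key], []).append(row)
    return sites


def query_all(sites):
    """A cross-partition query is the union of every site's rows."""
    merged = (row for part in sites.values() for row in part)
    return sorted(merged, key=lambda r: r["id"])


sites = partition_horizontally(ROWS, "region")
east_ids = [r["id"] for r in sites["east"]]   # local access touches one site
all_rows = query_all(sites)                    # global access touches all sites
```

A query that needs only eastern customers reads a single partition, while `query_all` has to visit every site, which is the cross-partition cost listed among the disadvantages.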


Vertical Partitioning

• Different columns of a file at different sites.

• Advantages and disadvantages are the same as for horizontal partitioning except that combining data across partitions is more difficult because it requires joins.
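The join requirement can be seen in a small sketch. It assumes every fragment is keyed by the same primary key so that rows can be matched up again; the employee data and the two site names are hypothetical.

```python
# Hypothetical employee records to be split by column.
EMPLOYEES = [
    {"id": 1, "name": "Avery", "salary": 50000},
    {"id": 2, "name": "Blake", "salary": 62000},
]


def partition_vertically(rows, key, column_groups):
    """Return one fragment per column group, each indexed by the key value."""
    fragments = []
    for cols in column_groups:
        fragments.append({row[key]: {c: row[c] for c in cols} for row in rows})
    return fragments


def join_fragments(fragments, key):
    """Rebuild full rows by joining the fragments on the shared key."""
    joined = {}
    for fragment in fragments:
        for key_value, cols in fragment.items():
            joined.setdefault(key_value, {key: key_value}).update(cols)
    return [joined[k] for k in sorted(joined)]


directory_site, payroll_site = partition_vertically(
    EMPLOYEES, "id", [["name"], ["salary"]]
)
rebuilt = join_fragments([directory_site, payroll_site], "id")
```

Here the directory site never stores salaries, but any request for a complete employee record has to contact both sites and join on `id`.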


Five Distributed Database Organizations

• Centralized database, distributed access.

• Replication with periodic snapshot update.

• Replication with near real-time synchronization of updates.

• Partitioned, one logical database.

• Partitioned, independent, non-integrated segments.


Factors in Choice of Distributed Strategy

• Funding, autonomy, security.

• Site data referencing patterns.

• Growth and expansion needs.

• Technological capabilities.

• Costs of managing complex technologies.

• Need for reliable service.


Requirements for a Distributed DBMS

• Ability to locate data with a distributed data dictionary.

• Determine the location from which to retrieve data and the location at which to process each part of a distributed query.

• Heterogeneous DBMS translation.

• Security, concurrency, query optimization, failure recovery.

• Consistency of replicated data.
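The first two requirements, locating data through a distributed data dictionary and choosing where to retrieve it from, might look roughly like the sketch below. The catalog contents, site names, and link costs are invented for illustration, and a real optimizer would weigh far more than a single cost number.

```python
# Hypothetical data dictionary: table name -> sites holding a copy.
CATALOG = {
    "customer": ["new_york", "london"],
    "price_list": ["new_york", "london", "tokyo"],  # replicated everywhere
}

# Hypothetical relative cost of reaching each site from Tokyo.
LINK_COST = {
    ("tokyo", "tokyo"): 0,
    ("tokyo", "new_york"): 8,
    ("tokyo", "london"): 9,
}


def locate(table):
    """Look up which sites hold the table."""
    return CATALOG[table]


def choose_site(table, requester):
    """Pick the cheapest site that holds the table, as seen by the requester."""
    return min(locate(table), key=lambda site: LINK_COST[(requester, site)])


local_choice = choose_site("price_list", "tokyo")   # a local copy exists
remote_choice = choose_site("customer", "tokyo")    # must go to a remote site
```

The user names only the table; the dictionary supplies the location, which is what location transparency asks for.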