Unit 1 DD Overview
Distributed Databases: An Overview
Unit-1
Contents
UNIT – I
Chapter 1. 1.0 What is a Distributed Database [DDB], 1.1 Features of Distributed versus Centralized Databases
Chapter 3. Levels of Distribution Transparency, 3.1 Reference Architecture for Distributed Databases, 3.2 Types of Data Fragmentation, 3.6 Integrity Constraints in Distributed Databases
Book-1: Distributed Databases, by Stefano Ceri, Giuseppe Pelagatti, Tata McGraw-Hill edn 2008: 1.1; 3.1, 3.2, 3.6
1.1 Features of Distributed versus Centralized Databases
What is a Distributed Database [DDB]?
A simple definition:
A collection of data which belongs to the same enterprise and is spread over the sites of a computer network.
The two important aspects of a DDB are:
• Distribution [of data]: the data is spread over several sites, whereas in a centralized database the data is at a single site [host]
• Logical Correlation: how exactly the data at the different sites is related
Illustration of DDB through example:
Different Scenarios of DB Applications
• Personal computer: one DB application on one computer
• One or more applications on a single computer with multiple [dumb] terminals / users
• Multiple networked computers, each with its own local DB, local application and local users
• Multiple networked computers, each with its own local DB and local users, with a global application accessing data from these sites
• Multiple networked computers each with its own local DB and local users with multiple global applications, each accessing data from these multiple sites
Example 1
A bank with 3 branches at different locations.
• At each branch, a computer controls the teller terminals of the branch and the account database of the branch.
• Each branch with its local database constitutes one site of the distributed database.
• The computers are connected by a communication network.
• Each site handles only local applications: operations requested from a terminal of a branch that access only the DB of that branch.
Does the logical correlation property hold here? Should this be considered an example of a DDB or a set of local DBs?
A global application, e.g. an application that transfers funds from an account at one site to an account at another, is what makes this a DDB.
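The difference between a local and a global application in Example 1 can be sketched in a few lines. This is only an illustration under stated assumptions: each branch site is modelled as a separate in-memory SQLite database, and the table name `account`, the helper names and the account numbers are all hypothetical. A real DDBMS would also need a distributed commit protocol so that a failure between the two commits cannot leave the sites inconsistent; that part is deliberately not shown.

```python
import sqlite3


def make_branch_site(accounts):
    """Create one branch site: an independent database holding that branch's accounts."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")
    db.executemany("INSERT INTO account VALUES (?, ?)", accounts)
    db.commit()
    return db


def local_balance(site, acc_no):
    """Local application: reads only the database of one branch."""
    row = site.execute(
        "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
    ).fetchone()
    return None if row is None else row[0]


def transfer(src_site, src_acc, dst_site, dst_acc, amount):
    """Global application: updates data stored at two different sites.

    Simplification: both sites are committed only after both updates succeed;
    a crash between the two commits is not handled here.
    """
    try:
        balance = local_balance(src_site, src_acc)
        if balance is None or balance < amount:
            raise ValueError("unknown source account or insufficient funds")
        src_site.execute(
            "UPDATE account SET balance = balance - ? WHERE acc_no = ?",
            (amount, src_acc),
        )
        cur = dst_site.execute(
            "UPDATE account SET balance = balance + ? WHERE acc_no = ?",
            (amount, dst_acc),
        )
        if cur.rowcount != 1:
            raise ValueError("unknown destination account")
        src_site.commit()
        dst_site.commit()
        return True
    except ValueError:
        # Undo any uncommitted change at either site.
        src_site.rollback()
        dst_site.rollback()
        return False
```

`local_balance` touches one site only, so three branches running just that kind of operation are a set of local DBs. `transfer` needs data from two sites at once, which is the property that makes the system a DDB.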
Example 2
Same as Example 1, except that the computers and their respective DBs have been moved from the branches to a common building and are connected by a high-bandwidth local network.
• Tellers are connected to their respective computers by telephone lines.
• Each processor and its DB constitute a site of the local computer network.
Should this be considered an example of a DDB or a set of local DBs?
Fig 1.2
Same as Example 1 except for the geographical distribution of the computers.
What are the major differences between the two from the viewpoint of functioning and performance?
Example 3
• Here the data of the different branches are distributed on three “backend” computers, which perform the DBMS functions.
• The application programs are executed by a different computer [front-end], which requests database access services from the backends when necessary.
Computer Center
Fig 1.3 A multiprocessor System
Should this be considered an example of a DDB or a set of local DBs?
NO. Though the data is distributed, its distribution is not relevant from the application point of view. What is missing here is the local application.
From the examples we can derive the following working definition of a Distributed Database [DDB].
A DDB is an integrated database which is built on top of a computer network rather than on a single computer. The data which constitute the database are stored at the different sites of the computer network, and the application programs which are run by the computers access data at different sites.
Taxonomy of DDS
Homogeneous Distributed Databases
In a homogeneous distributed database:
• All sites have identical software
• Sites are aware of each other and agree to cooperate in processing user requests
• Each site surrenders part of its autonomy in terms of the right to change schemas or software
• The system appears to the user as a single system
Architecture of Homogeneous DDBMS
Schema Architecture of a Homogeneous DDBMS
Heterogeneous Distributed Databases
In a heterogeneous distributed database:
• Different sites may use different schemas and software
• Difference in schema is a major problem for query processing
• Difference in software is a major problem for transaction processing
• Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing
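Why a difference in schema complicates query processing can be seen in a small sketch. It assumes two hypothetical sites that store the same kind of data under different attribute names; the site names, attribute names and the mapping table are all invented for illustration. Before a global query can be answered, every local row has to be translated into one common (global) schema.

```python
# Hypothetical local data: same information, different local schemas.
SITE_A_ROWS = [{"acc_no": "A1", "balance": 100}]
SITE_B_ROWS = [{"AccountId": "B7", "Amt": 250}]

# Assumed per-site mapping from local attribute names to global attribute names.
MAPPINGS = {
    "site_a": {"acc_no": "account", "balance": "balance"},
    "site_b": {"AccountId": "account", "Amt": "balance"},
}


def to_global(site, row):
    """Rename the attributes of one local row into the global schema."""
    mapping = MAPPINGS[site]
    return {mapping[name]: value for name, value in row.items()}


def global_query(min_balance):
    """Answer a query phrased against the global schema over both sites."""
    rows = [to_global("site_a", r) for r in SITE_A_ROWS]
    rows += [to_global("site_b", r) for r in SITE_B_ROWS]
    return [r for r in rows if r["balance"] >= min_balance]
```

In a homogeneous system this translation step is unnecessary because every site already uses the same schema and software.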
Overall Architecture of multidatabase Systems
1. Distributed Database System
• Tightly Coupled
• Loosely Coupled
Schema Architecture of Tightly-Coupled MDBS
• Advantages of Replication
– Availability: failure of a site containing relation r does not result in unavailability of r if replicas exist.
– Parallelism: queries on r may be processed by several nodes in parallel.
– Reduced data transfer: relation r is available locally at each site containing a replica of r.
– ri = Ri (r)
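The availability and reduced-data-transfer advantages of replication listed above can be illustrated with a toy model. It is a sketch under assumptions: the class name, the site names and the "prefer the local copy" read policy are hypothetical, and bringing a failed site back up to date after recovery is not modelled.

```python
class ReplicatedRelation:
    """Relation r stored as identical copies at several sites (toy model)."""

    def __init__(self, sites):
        self.replicas = {site: {} for site in sites}
        self.up = {site: True for site in sites}

    def fail(self, site):
        """Mark a site as failed."""
        self.up[site] = False

    def write(self, key, value):
        # An update has to reach every copy; here only live sites are updated.
        for site, copy in self.replicas.items():
            if self.up[site]:
                copy[key] = value

    def read(self, key, local_site):
        # Prefer the local replica (no data transfer), otherwise any live replica.
        candidates = [local_site] + [s for s in self.replicas if s != local_site]
        for site in candidates:
            if self.up.get(site, False):
                return site, self.replicas[site][key]
        raise RuntimeError("no replica of r is available")
```

A read issued at a site that holds a replica is served locally; if that site has failed, the same read is answered by another replica instead of failing.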
1. Distributed Database System
• Loosely Coupled
• A distributed database system consists of loosely coupled sites that share no physical component
• Database systems that run on each site are independent of each other
• Transactions may access data at one or more sites
Loosely Coupled MDBS with Export Schema
Loosely Coupled MDBS with No Export Schema
DBS Architectures
Features of Centralized vs Distributed DBs
Review:
• What is a centralized DB? Traditional databases
• What is a DDB?
Features that characterize a centralized DB:
• Centralized control
• Data independence
• Reduction of redundancy
• Complex physical structures and efficient access
• Integrity, recovery and concurrency control
• Privacy and security
Centralized vs DDBs: Centralized Control
CDB
• One-point control of the entire DB
• Single Database Administrator [DBA]
DDB
• Multi-point (source) control
• Global Database Administrator [GDBA] and Local Database Administrators [LDBA]
• “Site / Local Autonomy” decides the freedom of the local administrator
Centralized vs DDBs: Data Independence
What is data independence?
• The organization of data (the physical storage of data in a DB) is transparent to the application developer
How is it achieved?
• Layered design / levels of abstraction
– Logical level [conceptual design: schemas, tuples, attributes]
– Physical level [how data is stored on the hard disk]
Benefit
• Application developers need not know how data is stored in the database
In CDB
• Allows the two layers to be designed independently
• How does this help? Each can be designed / changed independently of the other.
Centralized vs DDBs: Data Independence (contd.)
In DDB
• Also provides data independence, with an additional feature called Distribution Transparency
• Application programmers need not know
– how the data is stored, nor
– at which site it is stored.
• Data independence in a traditional database rests on the notions of
– Conceptual Schema
– Storage Schema
– External Schema
• Distribution transparency is obtained by adding further levels and schemata to these (such as the fragmentation schema and the allocation schema).
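Distribution transparency can be sketched as a lookup that the system performs on behalf of the application. The example assumes a hypothetical global relation `account` split into one fragment per branch; the dictionaries standing in for the fragmentation information, the allocation information and the per-site storage are invented for illustration.

```python
# Assumed fragmentation information: which fragments make up a global relation.
FRAGMENTS = {"account": ["account_branch1", "account_branch2"]}

# Assumed allocation information: at which site each fragment is stored.
ALLOCATION = {"account_branch1": "site1", "account_branch2": "site2"}

# Stand-in for the data physically held at each site.
SITE_DATA = {
    "site1": {"account_branch1": [("A1", 100), ("A2", 40)]},
    "site2": {"account_branch2": [("B1", 250)]},
}


def scan(relation):
    """Return all tuples of a global relation.

    The caller names only the global relation; fragments and sites are
    resolved here, so the application never mentions them.
    """
    rows = []
    for fragment in FRAGMENTS[relation]:
        site = ALLOCATION[fragment]
        rows.extend(SITE_DATA[site][fragment])
    return rows
```

An application calls `scan("account")` and nothing else. If a fragment were later moved to another site, only `ALLOCATION` and `SITE_DATA` would change, not the application code.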
Centralized vs DDBs: Redundancy Reduction
In CDB
• Redundancy (repetition of data) is reduced as much as possible for TWO reasons:
– To avoid inconsistencies
– To minimize the storage required
• It is one of the main concerns; normalization is used to address it
In DDB
• Redundancy is allowed
• Reasons
– Faster access [local data can be accessed faster]
– Higher throughput
– Higher availability
– More fault tolerance
• Drawback: it makes design, development and data modification more complex.
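The availability gain from redundancy can be put into numbers with a short calculation. It assumes that sites fail independently and that each site is up with the same probability; both are simplifying assumptions, and the figure 0.9 used in the note below is purely illustrative.

```python
def availability(site_availability, replicas):
    """Probability that at least one of the replicas is reachable.

    Assumes independent site failures: the data is unavailable only if
    every site holding a copy is down at the same time.
    """
    return 1.0 - (1.0 - site_availability) ** replicas
```

With a per-site availability of 0.9, one copy gives 0.9 while three copies give about 0.999. The price is paid on updates: every modification has to be applied to all copies, which is the added complexity in data modification mentioned above.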
Centralized vs DDBs: Complex Physical Structure & Efficient Access
In CDB
• Uses indexing, hashing, interfile chains and so on
• Purpose: faster / efficient access
In DDB
• Complex structures alone cannot solve access problems
• Efficient access is still an issue
• Complex structures at the local level alone [local optimization] are not enough. The network delays dominate the disk access delays.
• A global optimization is necessary; it includes the local optimization plans plus an additional “network access plan”
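The point that network delays dominate can be made concrete with a toy cost model. Everything numeric here is an assumption chosen for illustration (the per-page and per-kilobyte costs, the plan names and their sizes); the sketch only shows that a plan chosen on local disk cost alone can be the wrong one once the network is counted.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    disk_pages: int     # pages read locally at the sites involved
    bytes_shipped: int  # data moved across the network


def cost_ms(plan, disk_ms_per_page=0.1, network_ms_per_kb=1.0):
    """Total cost = local access cost + network transfer cost (assumed rates)."""
    local = plan.disk_pages * disk_ms_per_page
    network = plan.bytes_shipped / 1024 * network_ms_per_kb
    return local + network


def choose_global(plans):
    """Global optimization: minimise local cost plus network cost."""
    return min(plans, key=cost_ms)


def choose_local_only(plans):
    """Local optimization alone: looks only at disk accesses."""
    return min(plans, key=lambda p: p.disk_pages)


PLANS = [
    Plan("ship_whole_relation", disk_pages=1000, bytes_shipped=8_000_000),
    Plan("filter_at_site_then_ship", disk_pages=3000, bytes_shipped=80_000),
]
```

Under these assumed numbers the local-only choice is `ship_whole_relation` (fewer disk pages), while the global choice is `filter_at_site_then_ship`, because the extra disk work is small compared with the network transfer it avoids.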
Centralized vs DDBs: Integrity, Recovery & Concurrency Control
In CDB
• Integrity: requires enforcing the ACID properties, including integrity in a concurrent environment
• Concurrency control: various protocols such as two-phase locking, timestamp-based and tree-based protocols
• Recovery: log-based approach, checkpointing, etc.
In DDB
• All of these are enforced
• Distribution of data makes these protocols more complex.
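As an illustration of one of the protocol families named above, here is a minimal sketch of basic timestamp ordering for a single data item at a single site. The class and function names are hypothetical, timestamps are plain integers, and restarting an aborted transaction is left out. In a distributed setting the timestamps would additionally have to be unique across sites (commonly a local counter paired with a site identifier), which is part of the added complexity and is not shown.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and its transaction must abort."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # timestamp of the transaction that last wrote the item


def read(item, ts):
    # A transaction may not read a value written by a younger transaction.
    if ts < item.write_ts:
        raise Abort(f"read by T{ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A transaction may not overwrite a value that a younger transaction
    # has already read or written.
    if ts < item.read_ts or ts < item.write_ts:
        raise Abort(f"write by T{ts} rejected")
    item.value = value
    item.write_ts = ts
```

For example, after a write by transaction 2 and a read by transaction 3, a late write by transaction 1 is rejected because a younger transaction has already read the item.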
Centralized vs DDBs: Privacy & Security
In CDB
• The DBA ensures authorized access to data
• Also requires additional specialized control
In DDB
• Has similar problems, in addition to threats over the network
• Local autonomy helps the local DBA to enforce security
• Additional security measures are required for global / over-the-network threats.
Why DDBS?
• Organizational & economic reasons
• Interconnection of existing DBs
• Incremental growth
• Reduced communication overhead
• Performance considerations
• Reliability & availability
None of these needs is new. Why then has the development of DDBSs taken this long?
• First, it had to wait for the development of inexpensive, powerful small computers
• Second, for want of the necessary network, middleware & DB technologies
DDBMS: Distributed Database Management Systems
• They support the creation & maintenance of DDBSs
• They contain additional components which extend the capabilities of CDBMSs.
The typical software components are:
• The database management component (DB)
• The data communication component (DC): ODBC, JDBC, TCP/IP
• The data dictionary (DD), extended to include information about the distribution of data over the network: fragmentation schema & allocation schema
• The distributed database component (DDB)
Components of a commercial DDBMS
[Figure: Site 1 and Site 2, each with DB, DC, DD and DDB components and its own local database (Local database-1, Local database-2).]
Components of a commercial DDBMS: services supported by the above systems
• Remote database access by an application: RPC, ODBC, JDBC, TCP/IP, named pipes
• Some degree of distribution transparency
• Support for database administration & control
• Some support for concurrency control
Assignment-1
1. List all the key words introduced in this chapter and write a brief definition/explanation for each of them.
2. Select any TWO commercial DBMSs of your choice and describe their salient features as DDBMSs.
DUE: next week, the same hour.
Questions:
1. What are the different types of DDBS? Explain them briefly.
2. What are the major differences between CDB & DDB? Explain.
Seminars Sai sandeepShekun Bee IndexingRamya KrishnaSwathi GSwathi CSameeraRajeswriSharon SamuelSri RamyaSravanthi
Seminars
Naga subramanyamGiridharSyed AbdullaBhaskar Aunusha-1Najma KanamAmruthaAnusha-2
JAI SAI RAM