Chapter 17

58
McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved. Chapter 17 Client-Server Processing, Parallel Database Processing, and Distributed Databases

description

Chapter 17. Client-Server Processing, Parallel Database Processing, and Distributed Databases. Outline. Overview Client-Server Database Architectures Parallel Database Architectures Architectures for Distributed Database Management Systems Transparency for Distributed Database Processing - PowerPoint PPT Presentation

Transcript of Chapter 17

McGraw-Hill/Irwin Copyright © 2007 by The McGraw-Hill Companies, Inc. All rights reserved.

Chapter 17

Client-Server Processing, Parallel Database Processing, and

Distributed Databases

17-2

Outline Overview Client-Server Database Architectures Parallel Database Architectures Architectures for Distributed Database

Management Systems Transparency for Distributed Database

Processing Distributed Database Processing

17-3

Evolution of Distributed Processing and Distributed Data Need to share resources across a network Timesharing (1970s) Remote procedure calls (1980s) Client-server computing (1990s)

17-4

Timesharing Network

Database

Terminal

Terminal

Terminal

Mainframe computer

17-5

Simple Resource Sharing

Database

(a) Remote procedural call

(b) File sharing

Database

Procedural call

Return results

File request

File returned

17-6

Client-Server Processing

Database

Client

Client

Client

Server

17-7

Distributed processing and data

DatabaseDatabase

Client

Client

Client

Client

Server Server

17-8

Motivation for Client-Server Processing Flexibility: the ease of maintaining and

adapting a system Scalability: the ability to support scalable

growth of hardware and software capacity Interoperability: open standards that allow

two or more systems to exchange and use software and data

17-9

Motivation for Parallel Database Processing Scaleup: increased work that can be

accomplished Speedup: decrease in time to complete a

task Availability: increased accessibility of

system Highly available: little downtime Fault-tolerant: no downtime

17-10

Motivation for Distributed Data Data control: locate data to match an

organization’s structure Communication costs: locate data close to

data usage to lower communication cost and improve performance

Reliability: increase data availability by replicating data at more than one site

17-11

Summary of Distributed Processing and Data

Technology Advantages Disadvantages

Client-server processing

Flexibility, interoperability, scalability High complexity, high development cost, possible interoperability problems

Parallel database processing

Speedup, scaleup, availability, scalability for predictive performance improvements

Possible interoperability problems, high cost

Distributed databases

Local control of data, improved performance, reduced communication costs, increased reliability

High complexity, additional security concerns

17-12

Client-Server Database Architectures Client-Server Architecture is an

arrangement of components (clients and servers) among computers connected by a network.

A client-server architecture supports efficient processing of messages (requests for service) between clients and servers.

17-13

Design Issues

Division of processing: the allocation of tasks to clients and servers.

Process management: interoperability among clients and servers and efficiently processing messages between clients and servers.

• Middleware: software for process management

17-14

Tasks to Distribute Presentation: code to maintain the

graphical user interface Validation: code to ensure the consistency

of the database and user inputs Business logic: code to perform business

functions Workflow: code to ensure completion of

business processes Data access: code to extract data to

answer queries and modify a database

17-15

Middleware A software component that performs

process management. Allow clients and servers to exist on

different platforms. Allows servers to efficiently process

messages from a large number of clients. Often located on a dedicated computer.

17-16

Client-Server Computing with Middleware

Middleware

17-17

Types of Middleware Transaction-processing monitors: relieve the

operating system of managing database processes

Message-oriented middleware: maintain a queue of messages

Object-request brokers: provide a high level of interoperability and message intelligence

Data access middleware: provide a uniform interface to relational and non relational data using SQL

17-18

Two-Tier Architecture

Database

SQL statements

Query results

Database server

17-19

Two-Tier Architecture

A PC client and a database server interact directly to request and transfer data.

The PC client contains the user interface code.

The server contains the data access logic. The PC client and the server share the

validation and business logic.

17-20

Three-Tier Architecture (Middleware Server)

Database

SQL statements

Query Results

Middleware server Database server

17-21

Three-Tier Architecture (Application Server)

(b) Application serverDatabase

SQL statements

Query results

Database server Application server

17-22

Three-Tier Architecture To improve performance, the three-tier

architecture adds another server layer either by a middleware server or an application server.

• The additional server software can reside on a separate computer.

• Alternatively, the additional server software can be distributed between the database server and PC clients.

17-23

Multiple-Tier Architecture A client-server architecture with more than three

layers: a PC client, a backend database server, an intervening middleware server, and application servers.

Provides more flexibility on division of processing

The application servers perform business logic and manage specialized kinds of data such as images.

17-24

Multiple-Tier Architecture

Database

Application server

Application server

Middleware server Database server

17-25

Multiple-Tier Architecture with Web Server

Web serverMiddleware

serverwith listener

SQL

Results

SQL statements and formatting requirements

Database

Database server

Database request

HTML

Page request

HTML

17-26

Web Service Architecture Generalize multiple-tier architectures for

electronic business commerce Supports services provided/used by

automated agents Advantages

Deploy services faster Communicate services in standard formats Find services easier

17-27

Web Service Components

Registry database

Service requestor

Service provider

Service registry

Bind

Find Publish

Service description

(WSDL)

Service implementation

Service description

(WSDL)

Service description

(WSDL)

17-28

Web Service Standards

HTTP, FTP, TCP-IP Simple Object Access Protocol: XML

message sending Web Service Description Language

(WSDL) Universal Description, Discovery

Integration Web Services Flow Language

17-29

Parallel DBMS Uses a collection of resources

(processors, disks, and memory) to perform work in parallel

Divide work among resources to achieve desired performance (scaleup and speedup) and availability.

Uses high speed network, operating system, and storage system

Purchase decision involves more than parallel DBMS

17-30

Basic Architectures

M

P

M

N

P P...

... ...

P P P...

M M M

N

...

P P P...

M M

(a) SE (b) SD (c) SN

LegendP: processorM: memoryN: high-speed networkSE: shared everythingSD: shared diskSN: shared nothing

17-31

Clustering Architectures

M

N

...

P P P...

M M M

N

...

P P P...

M M

(a) Clustered disk (CD) (b) Clustered nothing (CN)

M

...

P P P...

M M M

...

P P P...

M M

17-32

Design Issues Load balancing: CN architecture most

sensitive Cache coherence: CD architecture

problem Interprocessor communication: CN

architecture most sensitive Application transparency: no knowledge

about parallelism

17-33

Oracle Real Application Clusters

SGA

Redo logs

LGWR

SGA

LegendLGWR: Log writer processDBWR: DB writer processGCS: Global cache serviceSGA: Shared global area

DBWR GCS

Cache fusion

Redo logs

DB files

GCS DBWR LGWR

Shared storage system

17-34

Oracle RAC Features Cache fusion to synchronize cache access Query optimizer intelligence Connection load balancing Automatic failover Comprehensive administration interface

17-35

IBM DB2 SPF

Coordinator

...

P P P...

M

Partition 1 Partition 2 Partition n

...

P P P...

M

...

P P P...

M...

17-36

IBM SPF Features Automatic or DBA determined partitioning Query optimizer intelligence High scalability Partitioned log parallelism

17-37

Distributed Database Architectures DBMSs need fundamental extensions. Underlying the extensions are a different

component architecture and a different schema architecture.

Component Architecture manages distributed database requests.

Schema Architecture provides additional layers of data description.

17-38

Global Requests

Customer-order data

Product data

Customer-order data

Product data

17-39

Component Architecture

DB

Site 1DDM

LDM

DDM

DDM

LDM

DB

GD

GD

Site 2

Site 3

GD

17-40

Schema Architecture I

Externalschema 1

Externalschema 2

Externalschema n...

Conceptualschema

Fragmentationschema

Allocationschema

Internalschema 1

Internalschema 2

Internalschema m...

m Sites

17-41

Schema Architecture II Global

externalschema 1

Globalexternal

schema 2

Globalexternal

schema n...

Globalconceptual

schema

Site 1 localmappingschema

Site 1 localschemas

(conceptual,internal,external) ...

m Sites

Site 2 localschemas

(conceptual,internal,external)

Site m localschemas

(conceptual,internal,external)

Site 2 localmappingschema

Site m localmappingschema

...

17-42

Distributed Database Transparency Transparency is related to data independence. With transparency, users can write queries with

no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions.

Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries.

17-43

Motivating Example

CustNameCustCityCustStateCustZipCustRegion

CustNo

Customer

OrdDateOrdAmtCustNo

OrdNo

Order

OrdCity

ProdNoOrdNo

OrderLine

ProdNameProdColorProdPrice

ProdNo

Product

QOHWarehouseNoProdNo

StockNo

Inventory1

11

1

8

8

8 8

17-44

Fragments Based on the CustRegion Column

CREATE FRAGMENT Western-Customers AS SELECT * FROM Customer WHERE CustRegion = 'West'

CREATE FRAGMENT Western-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Western-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'West'

CREATE FRAGMENT Eastern-Customers AS SELECT * FROM Customer WHERE CustRegion = 'East'

CREATE FRAGMENT Eastern-Orders AS SELECT Order.* FROM Order, Customer WHERE Order.CustNo = Customer.CustNo AND CustRegion = 'East'

CREATE FRAGMENT Eastern-OrderLines AS SELECT OrderLine.* FROM Customer, OrderLine, Order WHERE OrderLine.OrdNo = Order.OrdNo AND Order.CustNo = Customer.CustNo AND CustRegion = 'East'

17-45

Fragments Based on the WareHouseNo Column

CREATE FRAGMENT Denver-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 1

CREATE FRAGMENT Seattle-Inventory AS SELECT * FROM Inventory WHERE WareHouseNo = 2

17-46

Fragmentation Transparency Fragmentation transparency provides the

highest level of data independence. Users formulate queries and transactions

without knowledge of fragments (locations, or local formats).

If fragments change, queries and transactions are not affected.

17-47

Location Transparency Location transparency provides a lesser

level of data independence than fragmentation transparency.

Users need to reference fragments in formulating queries and transactions.

However, knowledge of locations and local formats is not necessary.

17-48

Local Mapping Transparency Local mapping transparency provides a

lesser level of data independence than location transparency.

Users need to reference fragments at sites in formulating queries and transactions.

However, knowledge of local formats is not necessary.

17-49

Oracle Distributed Databases

Homogeneous and heterogeneous distributed databases

Emphasis on site autonomy Provides local mapping transparency Each site is a separately managed

database.

17-50

Oracle Links

One way link from local to remote Support remote access to other users’

objects Necessary to have knowledge of remote

database objects Use synonyms and views with links to

reduce remote database knowledge

17-51

Distributed Database Processing Distributed data adds considerable

complexity to query processing and transaction processing.

Distributed database processing involves movement of data, remote processing, and site coordination.

Performance implications sometimes cannot be hidden.

17-52

Distributed Query Processing Involves both local (intra site) and global

(inter site) optimization. Multiple optimization objectives The weighting of communication costs

versus local processing costs depends on network characteristics.

There are many more possible access plans for a distributed query.

17-53

Distributed Transaction Processing Distributed DBMS provides concurrency

and recovery transparency. Independently operating sites must be

coordinated. New kinds of failures exist because of the

communication network. New protocols are necessary.

17-54

Distributed Concurrency Control The simplest scheme involves centralized

coordination. Centralized coordination involves the

fewest messages and the simplest deadlock detection.

The number of messages can be twice as much in distributed coordination.

Primary Copy Protocol is used to reduce overhead with locking multiple copies.

17-55

Centralized Coordination

Centralcoordinator

Subtransaction 1at Site x

......Subtransaction 2at Site y

Subtransaction nat Site z

Lockrequest

Lockstatus

17-56

Distributed Recovery Management

Distributed DBMSs must contend with failures of communication links and sites.

Detecting failures involves coordination among sites.

The recovery manager must ensure that different parts of a partitioned network act in unison.

The protocol for distributed recovery is the two phase commit protocol (2PC).

17-57

Voting and Decision Phases

Voting phase

Coordinator Participant

Write Begin-Commit to log.Send Ready messages.Wait for responses.

Force updates to disk.If no failure, Write Ready-Commit to log. Send Ready vote.Else send Abort vote.

If all sites vote ready before timeout, Write Global Commit record. Send Commit messages. Wait for Acknowledgments.Else send Abort messages.

Decision phase

Write Commit to log.Release locks.Send acknowledgment.

Wait for acknowledgments.Resend Commit messages if necessary.Write global end of transaction.

1

2

3

4

5

17-58

Summary Utilizing distributed processing and data can significantly

improve DBMS services but at the cost of new design challenges.

Client-server architectures provide alternatives among cost, complexity, and benefit levels.

Parallel database processing provides improved performance (speedup and scaleup) and availability.

Architectures for distributed DBMSs differ in the integration of the local databases and level of data independence.