Distributed Database Systems

Posted on 06-Jan-2018


1

Distributed Database Systems

2

A Distributed Database on a Geographically Dispersed Network

3

A Distributed Database on a Local Network

4

A Multi-Processor System

5

Types of Accesses to a Distributed Database

6

Distributed Access Plan

1) At site 1: send the supplier number SN to sites 2 and 3.

2) At sites 2 and 3: upon receipt of the supplier number, execute the following program in parallel:

   Find all PARTS records having SUP# = SN;
   Send result to site 1.

3) At site 1: merge the results from sites 2 and 3; output the result.
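The three-step plan can be sketched in Python as follows. This is only an illustration: it assumes the PARTS relation is split between site 2 and site 3 as in-memory lists, and it uses a thread pool to stand in for the parallel execution at the two remote sites. The names `SITE_2_PARTS`, `SITE_3_PARTS`, `local_select`, and `access_plan` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical PARTS fragments stored at sites 2 and 3.
SITE_2_PARTS = [
    {"part": "P1", "sup": "S1"},
    {"part": "P2", "sup": "S2"},
]
SITE_3_PARTS = [
    {"part": "P3", "sup": "S1"},
    {"part": "P4", "sup": "S3"},
]


def local_select(fragment, sn):
    """Step 2, run at each remote site: find all PARTS records having SUP# = SN."""
    return [rec for rec in fragment if rec["sup"] == sn]


def access_plan(sn):
    """Steps 1 and 3, run at site 1: ship SN out, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Step 1: "send" SN to sites 2 and 3; both sites work in parallel.
        futures = [pool.submit(local_select, frag, sn)
                   for frag in (SITE_2_PARTS, SITE_3_PARTS)]
        # Step 3: merge the results returned by sites 2 and 3.
        merged = []
        for fut in futures:
            merged.extend(fut.result())
    return merged


if __name__ == "__main__":
    print(access_plan("S1"))
```

The point of the plan is that only SN travels out and only matching records travel back; whole fragments are never shipped to site 1.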

7

8

Components of a Commercial DDBMS

9

Data Distribution
Problem:

Choose a unit of the logical database to use for assignment to data modules.

Possibilities:
Relations – distribution issues will influence logical database design.
Columns – distribution issues will influence logical database design.
Rows – too many; directories become too large.
Data items – too many; directories become too large.

10

Data Distribution

Fragments – logically defined rectangular subsets of relations

[Figure: Relation 1 and Relation 2, each divided into rectangular fragments labeled Fragment 1, Fragment 2, and so on]

11

Data Distribution
Logical definition of fragments:

Name    Age   $     Job-Title   Supervisor   Dept.
Jones   35    32K   Salesman    Black        A

[Figure: the relation is divided into Fragment 1, Fragment 2, and Fragment 3; the row predicates $ > 30K and $ < 30K mark the fragment boundaries]
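A "rectangular" fragment can be read as a selection (which rows) combined with a projection (which columns). The sketch below assumes a hypothetical EMPLOYEE relation with the slide's column headings, salaries in thousands, and a second made-up row. The column groupings for F1 to F3 are illustrative assumptions, not read from the figure. Because the predicates `$ > 30K` and `$ < 30K` leave a salary of exactly 30K in neither fragment, the sketch uses `<=` for the second salary fragment, which is also our assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Fragment:
    """A logically defined rectangular subset: some columns of some rows."""
    name: str
    columns: List[str]
    predicate: Callable[[Row], bool]

    def extract(self, relation: List[Row]) -> List[Row]:
        # Selection (rows satisfying the predicate), then projection (columns).
        return [{c: row[c] for c in self.columns}
                for row in relation if self.predicate(row)]


# Hypothetical relation; the Smith row is invented for illustration.
EMPLOYEE = [
    {"Name": "Jones", "Age": 35, "Salary": 32, "JobTitle": "Salesman",
     "Supervisor": "Black", "Dept": "A"},
    {"Name": "Smith", "Age": 28, "Salary": 25, "JobTitle": "Clerk",
     "Supervisor": "Black", "Dept": "A"},
]

# Illustrative fragment definitions (assumed groupings).
FRAGMENTS = [
    Fragment("F1", ["Name", "Age", "Dept"], lambda r: True),
    Fragment("F2", ["Name", "Salary", "JobTitle", "Supervisor"],
             lambda r: r["Salary"] > 30),
    Fragment("F3", ["Name", "Salary", "JobTitle", "Supervisor"],
             lambda r: r["Salary"] <= 30),
]

if __name__ == "__main__":
    for frag in FRAGMENTS:
        print(frag.name, frag.extract(EMPLOYEE))
```

Each fragment keeps Name so that, if Name is a key, the original rows can be rebuilt by joining the fragments.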

12

Data Distribution: Data Modules

[Figure: Assignment of Fragments to Data Modules. Fragments F1, F2, F3 (Personnel) and F1, F2 (Inventory) are assigned to data modules DM1, DM2, and DM3.]

13

Data Distribution

Advantages of fragments as units of distribution.

Very flexible in size and definition.
Distribution choices are largely independent of logical design.

14

System Considerations

Reliable network
Pipelining
Logical data items
Database operations: Read, Write
Transactions: Read set, Write set
Atomic – "all or nothing" effect

15

System Considerations (cont’d)
Each site in the DDBMS has one or both of the following software modules:

Transaction Manager (TM)
Data Manager (DM)

TMs: read, parse, and optimize user queries; handle all interaction with the user.

DMs: maintain the physical database; perform the actual reads and writes.

16

System Considerations (cont’d)

[Figure: sites running TMs and DMs; transactions are submitted to TMs, and data is held at DMs]

TMs communicate only with DMs; DMs communicate only with TMs.

17

Transaction Execution

Transaction   TM’s Action
Begin         Set up a temporary workspace.
Read (X)      Select a DM that stores X; send a message to this DM requesting X;
              place X in the workspace.
Read (X)      No action necessary; X is already in the workspace.
Write (X)     Change the value of X.
Read (X)      No action necessary.
End           Send a pre-commit to each DM that stores a copy of X;
              await acknowledgements; send the commit message.
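The TM's actions in the table can be simulated in a few lines. This sketch assumes synchronous method calls in place of network messages. It also assumes that a DM whose site is down cannot acknowledge a pre-commit, in which case the TM aborts so that the transaction keeps its all-or-nothing effect. The abort path and all class and method names (`DataManager`, `TransactionManager`, `pre_commit`, `commit`) are our assumptions, not part of the slide.

```python
class DataManager:
    """Holds copies of data items; performs the actual reads and writes."""

    def __init__(self, name, data, available=True):
        self.name = name
        self.data = dict(data)
        self.available = available  # assumption: a down site cannot acknowledge
        self.pending = {}

    def read(self, item):
        return self.data[item]

    def pre_commit(self, updates):
        """Stage new values for items this DM stores; returning True is the acknowledgement."""
        if not self.available:
            return False
        self.pending = {k: v for k, v in updates.items() if k in self.data}
        return True

    def commit(self):
        self.data.update(self.pending)
        self.pending = {}

    def abort(self):
        self.pending = {}


class TransactionManager:
    """Follows the table: workspace, cached reads, pre-commit/commit at End."""

    def __init__(self, dms):
        self.dms = dms
        self.workspace = {}
        self.written = set()

    def begin(self):
        # Begin: set up a temporary workspace.
        self.workspace = {}
        self.written = set()

    def read(self, item):
        if item not in self.workspace:
            # First Read(X): select a DM that stores X and request X from it.
            dm = next(d for d in self.dms if item in d.data)
            self.workspace[item] = dm.read(item)
        # Later Read(X): no action necessary, X is already in the workspace.
        return self.workspace[item]

    def write(self, item, value):
        # Write(X): change the value of X; only the workspace copy changes here.
        self.workspace[item] = value
        self.written.add(item)

    def end(self):
        updates = {x: self.workspace[x] for x in self.written}
        # End: send a pre-commit to each DM that stores a copy of a written item.
        targets = [d for d in self.dms if any(x in d.data for x in updates)]
        acks = [d.pre_commit(updates) for d in targets]
        if all(acks):
            for d in targets:  # every DM acknowledged: send the commit message
                d.commit()
            return True
        for d in targets:  # assumption: abort if any DM fails to acknowledge
            d.abort()
        return False
```

Keeping writes in the workspace until End is what lets a failed pre-commit leave every copy untouched.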

18

Optimal File Allocation In A Distributed Database System

Given a number of computers that process common information files, how can we:

1) allocate the files optimally, so that the allocation yields minimum overall operating costs (storage and communication)?
2) meet the access time requirements for each file?
3) not exceed the storage capacity of each computer?

Note: A file may be viewed as a segment.

19

System Parameters
n computers
m files
Size of each file
Usage distribution for each file at each computer
Frequency of modification of each file at each computer during usage
Access time requirement for each file at each computer
Storage capacity of each computer
Cost of storage per unit file length per computer
Cost of transmission per unit file length per second per pair of computers

20

Model
COSTS

Total Cost = Storage Costs + Transmission Costs

TC = CS + CT

Transmission Costs = Costs for Retrievals + Cost for Updates

CT = CTR + CTU

CONSTRAINTS

Each file must be stored in at least one computer.
The storage capacity of each computer must not be exceeded.
The probability of exceeding the required access time for each file must be less than a specified bound.
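The detailed mathematical model is not reproduced in this transcript, so the sketch below uses a simplified formulation of our own in the same spirit, TC = CS + CTR + CTU. Storage is charged per unit file length at each computer holding a copy. A retrieval is served by the cheapest copy, costing 0 when the copy is local, and an update is sent to every copy. It enforces the first two constraints but omits the access-time constraint. The exhaustive search only suits very small n and m. All function and parameter names are hypothetical.

```python
from itertools import product


def total_cost(alloc, size, store_cost, trans_cost, query, update):
    """TC = CS + CTR + CTU; alloc[i][j] == 1 if computer i stores file j."""
    n, m = len(alloc), len(size)
    cs = sum(alloc[i][j] * store_cost[i] * size[j] for i in range(n) for j in range(m))
    ctr = ctu = 0.0
    for i in range(n):
        for j in range(m):
            holders = [k for k in range(n) if alloc[k][j]]
            # Retrievals from computer i are served by the cheapest copy.
            ctr += query[i][j] * size[j] * min(trans_cost[i][k] for k in holders)
            # Updates from computer i must reach every copy of file j.
            ctu += update[i][j] * size[j] * sum(trans_cost[i][k] for k in holders)
    return cs + ctr + ctu


def best_allocation(size, store_cost, trans_cost, query, update, capacity):
    """Exhaustive search over 0/1 allocations that satisfy the constraints."""
    n, m = len(store_cost), len(size)
    best = None
    for bits in product((0, 1), repeat=n * m):
        alloc = [bits[i * m:(i + 1) * m] for i in range(n)]
        # Each file must be stored in at least one computer.
        if any(not any(alloc[i][j] for i in range(n)) for j in range(m)):
            continue
        # The storage capacity of each computer must not be exceeded.
        if any(sum(alloc[i][j] * size[j] for j in range(m)) > capacity[i]
               for i in range(n)):
            continue
        cost = total_cost(alloc, size, store_cost, trans_cost, query, update)
        if best is None or cost < best[0]:
            best = (cost, alloc)
    return best


if __name__ == "__main__":
    # Two computers, one file of length 10; computer 0 queries it five times as often.
    args = dict(size=[10], store_cost=[1, 1], trans_cost=[[0, 2], [2, 0]],
                query=[[5], [1]], capacity=[100, 100])
    print(best_allocation(update=[[0], [0]], **args))  # no updates
    print(best_allocation(update=[[3], [0]], **args))  # computer 0 also updates
```

In this toy case, replicating the file on both computers is cheapest when there are no updates. Once computer 0 also updates the file, a single copy at computer 0 wins. This is the storage-versus-transmission trade-off the cost model is meant to capture.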

21

Mathematical Representation Model

22

23

24

25

26

27

Transmission Paths Between Each Pair of Computers

28

29

Reliability Constraint
Assuming processors and channels each have identical reliability:

$a_p$ = availability of the processor
$a_c$ = availability of the channel
$r_j$ = number of redundant copies of the jth file
$A_j$ = availability of the jth file

$$A_j = a_p\left[1 - (1 - a_c a_p)^{r_j}\right]$$

For example, with $a_p = 0.98$ and $a_c = 0.99$:
$A_j = 0.951$ for $r_j = 1$
$A_j = 0.979$ for $r_j = 2$
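The formula can be evaluated directly. The sketch below (the function name is ours) reproduces the two example values. Because the leading factor $a_p$ does not depend on $r_j$, adding copies drives $A_j$ toward $a_p$ but never above it.

```python
def file_availability(a_p: float, a_c: float, r_j: int) -> float:
    """A_j = a_p * [1 - (1 - a_c * a_p) ** r_j].

    a_p: availability of a processor (identical for all processors)
    a_c: availability of a channel (identical for all channels)
    r_j: number of redundant copies of file j
    """
    return a_p * (1 - (1 - a_c * a_p) ** r_j)


if __name__ == "__main__":
    for r in (1, 2, 3):
        print(r, round(file_availability(0.98, 0.99, r), 3))
```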

30

31

File Directory for Distributed Databases

32

[Figure: Overview of the Directory Manager. Within the DDBMS: Transaction Manager, Directory Manager, and Database Manager. Also shown: user transaction, database, directory, fragment, and links to other nodes. Legend: high-level request, standard database call, physical access call, non-local request.]

33

Content of Directory

Global description
Fragmentation description
Allocation description
Mappings to local names
Access method description
Statistics on the database
Consistency information
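Several of these directory contents can be pictured as one record per global relation. The sketch below is a hypothetical layout covering the global description, fragmentation description, allocation description, mappings to local names, and statistics; the other items are omitted. The `locate` function shows the basic directory service of turning a global name into the places where its fragments live. All names and contents are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DirectoryEntry:
    """Hypothetical directory record for one global relation."""
    global_schema: List[str]                  # global description
    fragments: Dict[str, str]                 # fragmentation: fragment -> defining predicate
    allocation: Dict[str, List[str]]          # allocation: fragment -> sites holding a copy
    local_names: Dict[Tuple[str, str], str]   # (fragment, site) -> local name
    statistics: Dict[str, int] = field(default_factory=dict)  # e.g. tuple counts


def locate(directory: Dict[str, DirectoryEntry], relation: str) -> List[Tuple[str, str, str]]:
    """Resolve a global relation name into (fragment, site, local name) triples."""
    entry = directory[relation]
    return [(frag, site, entry.local_names[(frag, site)])
            for frag in entry.fragments
            for site in entry.allocation[frag]]


# Hypothetical directory contents.
DIRECTORY = {
    "PERSONNEL": DirectoryEntry(
        global_schema=["Name", "Age", "Salary", "JobTitle", "Supervisor", "Dept"],
        fragments={"F1": "Salary > 30K", "F2": "Salary <= 30K"},
        allocation={"F1": ["site1", "site2"], "F2": ["site3"]},
        local_names={("F1", "site1"): "pers_hi", ("F1", "site2"): "pers_hi_copy",
                     ("F2", "site3"): "pers_lo"},
        statistics={"tuples": 2},
    ),
}

if __name__ == "__main__":
    print(locate(DIRECTORY, "PERSONNEL"))
```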

34

Content of a Directory System

Physical (static):
  Location (site, copy #, disk, page);
  Creator;
  Creation date;
  Version of the file size;
  Code format;
  Date of last update;

Logical (dynamic):
  File status (R, W);
  Number of backlog jobs;
  Site availability;
  Resource requirement: processing cost; communication cost; translation cost;
  Security (file, user, C), where C = read/write, read only, or write only;

Operation:
  Compression ratio (logical operation query data value);
  Query access optimizer;
  Statistical data gathering;
  Protocols

35

The Functional Objectives of an Integrated Dictionary/Directory

To support the control of data resources:
  maintaining data independence, security, and integrity.

To support applications development:
  offering standardized data definitions and usage characteristics;
  established program entities, DDL.

To provide independence of directory data elements from:
  different hardware and software environments;
  changes in these environments.

36

Possible Data Types in IDD
Data names, definitions, formats, and sizes.
Integrity constraints, authorization tables, and usage statistics for transaction management.
Schemas and sub-schemas.
Description of standardized transactions and reports.
Characteristics of hardware, such as processors, lines, and terminals.
Description of users.

The IDD must support the maintenance of relationships between various entities, such as associations between:
  authorization tables and data;
  users and transactions;
  reports.

The IDD supplies version control.

37

[Figure 1: two entities linked by a relationship, each entity described by several attributes]

38

[Figure 2: Example dictionary entries. The entities Payroll Record and Social Security Number are linked by the relationship Contains (created 820708). Attribute values shown: entity creation dates 820114 and 820519, Maximum Length 400 Characters, Comments, and Length 9 Characters.]

39

Table 1

Schema Model Level     Schema Level                  Dictionary Level
(typical:              (typical: Entity-Types,       (typical: Entities,
Meta-Entity-Types)     Relationship-Types, and       Relationships, and
                       Attribute-Types)              Attributes)

Entity-Type            Element                       Social-Security-Number, Agency-Name
                       Record                        Employee Record, Payroll Record
                       Document                      Form 1040, FIPS Guideline
Relationship-Type      Record-Contains-Element       Payroll-Record-Contains-Employee-Name
Attribute-Type         Length                        9 Characters
                       Creator                       ADP Division

40

Classes of Directory

Centralized Directory

  Single Master Directory
  Extended Centralized Directory
  Multiple Master Directory

Local Directory

Distributed Directory

41

42

43

44

45

46

Causes for Directory Update
Changing the description or structure of the user database.
Moving user database entities from one node to another.
Changing the description of a user or node.
Changing a user view.
Changing a network node’s status.

47

Specific Drawbacks with Globally Replicated Directories

1) Additional remote activity to maintain directory coherence.

2) Difficulty of posting directory changes to a down site.

3) Difficulty of integrating a new site.

4) Storage of directory entries where they are not referenced.

5) Blurred responsibility for maintaining the directory.

48

Performance Measure

Operating Cost/Unit Time = Communication Cost (Query + Update) + Storage Cost + Code Translation Cost (Query + Update)

Response Time

49

Operating Cost for the Centralized Directory System

50

51

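Cost Trade-offs of Directory Systems
Assume:
Communication cost is much greater than storage cost.
There is no translation cost.
All computers have the same directory update rate.

Then the cost trade-off point between two directory types X and Y is reached at the directory update rate P(X,Y), where C = centralized, EC = extended centralized, D = distributed, and L = localized:

P(C,EC) = 2/(N - 1)
P(C,D) = 2/(N - 1)
P(L,D) = 1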

52

53

Directory Design Alternatives

Centralized
  Description: Single master directory.
  Advantages: Simplicity; ease of update.
  Disadvantages: Transmission costs and delays.

Extended Centralized
  Description: Variation of the centralized case in which the directory information is permanently appended in the local node once it is obtained from the master directory.
  Advantages: Reduces transmission costs and delays.
  Disadvantages: Coordinating updates of local directories; knowledge of appended directories.

Multiple Master
  Description: Variation of the centralized case in which redundant copies of the master directory exist.
  Advantages: Reduces transmission costs and delays; fail-soft characteristics.
  Disadvantages: Storage requirements; coordinating update of redundant copies.

Distributed Master
  Description: Master at every node.
  Advantages: Fast response.
  Disadvantages: Storage costs; transmission costs for updates to the directory.

Localized
  Description: Local directory at each node without replication.
  Advantages: Simple update procedure.
  Disadvantages: Transmission costs for non-local queries.

54

Distributed Ingres Dictionary/Directory Contains Four Types of Data:

Relation name and location

Information for parsing queries (domain names, formats, etc.)

Performance information (number of tuples, storage structures, etc.)

Consistency information (protection, integrity constraints, etc.; does not include control data for concurrency control and synchronization)

55

SDD-1 Dictionary/Directory
The directory itself is defined and maintained like any other user data. It can be logically fragmented, distributed, and replicated across the distributed DBMSs.
A directory locator (a small, highly static file of directory fragment locations) is kept at every site and is used by the TMs and DMs to plan and control transactions and to help ensure database integrity and consistency across concurrent accesses of data elements.
The transaction modules are capable of caching remotely accessed directory data for subsequent use. This facility is provided on the presumption that database operations will exhibit the locality-of-reference characteristic.
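A minimal sketch of the locator-plus-cache idea, assuming the locator is a plain mapping from directory fragment name to the site that holds it, and that each lookup names the fragment explicitly. `Site`, `TransactionModule`, and `remote_reads` are hypothetical names. The sketch does not invalidate cached entries when the directory changes, which a real system would have to handle.

```python
class Site:
    """A hypothetical site that stores some directory fragments."""

    def __init__(self, name, directory_fragments):
        self.name = name
        self.directory_fragments = directory_fragments  # fragment -> {entry: value}
        self.remote_reads = 0  # how often directory data was fetched from this site

    def fetch(self, fragment, entry):
        self.remote_reads += 1
        return self.directory_fragments[fragment][entry]


class TransactionModule:
    """Uses the locally replicated directory locator and caches fetched entries."""

    def __init__(self, locator, sites):
        self.locator = locator  # directory locator: fragment name -> site name
        self.sites = sites      # site name -> Site
        self.cache = {}

    def lookup(self, fragment, entry):
        key = (fragment, entry)
        if key not in self.cache:
            # Consult the locator, then read the entry from the site holding the fragment.
            site = self.sites[self.locator[fragment]]
            self.cache[key] = site.fetch(fragment, entry)
        # Repeated lookups are served from the cache (locality of reference).
        return self.cache[key]


if __name__ == "__main__":
    sites = {"S2": Site("S2", {"DIR-A": {"PARTS": "stored at S2, S3"}})}
    tm = TransactionModule({"DIR-A": "S2"}, sites)
    print(tm.lookup("DIR-A", "PARTS"), sites["S2"].remote_reads)
```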

56

[Figure 17: Pictorial diagram showing usefulness of keys. The virtual entity Vpatient : PatientClass (name, SSN, age, patID, {report}) is derived from the real collections PatientDB1 (name, SSN, age), PatientDB2 (name, SSN, patID), and PatReportDB2 (patID, report).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
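Figure 17's point is that keys let Vpatient be assembled from separate real collections. SSN links PatientDB1 to PatientDB2, and patID links PatientDB2 to PatReportDB2, whose reports are gathered into the set-valued attribute {report}. The sketch below assumes SSN and patID are unique keys and starts from PatientDB1, so a patient missing from PatientDB2 gets no patID and an empty report set. The sample rows and function name are hypothetical.

```python
# Hypothetical contents of the three real collections.
PATIENT_DB1 = [{"name": "Ann Lee", "SSN": "111", "age": 40}]
PATIENT_DB2 = [{"name": "Ann Lee", "SSN": "111", "patID": "P7"}]
PAT_REPORT_DB2 = [
    {"patID": "P7", "report": "x-ray"},
    {"patID": "P7", "report": "blood test"},
]


def build_vpatient(db1, db2, reports):
    """Derive virtual Vpatient objects (name, SSN, age, patID, {report})."""
    by_ssn = {row["SSN"]: row for row in db2}  # key: SSN
    vpatients = []
    for row in db1:
        pat_id = by_ssn.get(row["SSN"], {}).get("patID")
        vpatients.append({
            "name": row["name"],
            "SSN": row["SSN"],
            "age": row["age"],
            "patID": pat_id,
            # key: patID; collect every report for this patient into a set
            "report": {r["report"] for r in reports if r["patID"] == pat_id},
        })
    return vpatients


if __name__ == "__main__":
    print(build_vpatient(PATIENT_DB1, PATIENT_DB2, PAT_REPORT_DB2))
```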

57

[Figure 15: Pictorial diagram showing correspondence between virtual and real attributes. The virtual collection People contains Vperson : PersonClass (name, sex, age, ssn, job), whose attributes correspond to those of the real collections personDB1 (name, sex, age, ssn) and personDB2 (name, gender, ssn, job). Conversion functions shown: Character_to_String, LargePositiveInteger_to_String.]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
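Figure 15 maps attributes of two real collections onto one virtual class even when they differ in name or type. The sketch below rests on several assumptions: personDB2's gender corresponds to Vperson's sex, ssn is the matching key, and personDB1 stores ssn as an integer that is converted to a string, in the spirit of the figure's LargePositiveInteger_to_String. When both sources supply a value, the first source wins, which is also our choice. The mapping tables, helpers, and data are hypothetical.

```python
# Correspondence between real and virtual attribute names (assumed for Figure 15).
ATTRIBUTE_MAP = {
    "personDB1": {"name": "name", "sex": "sex", "age": "age", "ssn": "ssn"},
    "personDB2": {"name": "name", "gender": "sex", "ssn": "ssn", "job": "job"},
}
# Type conversions applied to virtual attributes (stand-in for LargePositiveInteger_to_String).
CONVERSIONS = {"ssn": str}

# Hypothetical sample data: personDB1 stores ssn as an integer, personDB2 as a string.
PERSON_DB1 = [{"name": "Ann Lee", "sex": "F", "age": 40, "ssn": 111223333}]
PERSON_DB2 = [
    {"name": "Ann Lee", "gender": "F", "ssn": "111223333", "job": "nurse"},
    {"name": "Bo Chan", "gender": "M", "ssn": "222334444", "job": "clerk"},
]


def to_virtual(row, mapping):
    """Rename real attributes to virtual ones and apply any type conversion."""
    out = {}
    for real_attr, virtual_attr in mapping.items():
        value = row[real_attr]
        convert = CONVERSIONS.get(virtual_attr)
        out[virtual_attr] = convert(value) if convert else value
    return out


def build_vperson(db1, db2):
    """Merge both sources into Vperson objects keyed by ssn."""
    people = {}
    for source, rows in (("personDB1", db1), ("personDB2", db2)):
        for row in rows:
            v = to_virtual(row, ATTRIBUTE_MAP[source])
            merged = people.setdefault(v["ssn"], {})
            for attr, value in v.items():
                merged.setdefault(attr, value)  # assumption: first source wins on conflict
    return list(people.values())


if __name__ == "__main__":
    for person in build_vperson(PERSON_DB1, PERSON_DB2):
        print(person)
```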

58

[Figure 18: Pictorial diagram for aggregation. Vretiree : retireClass (name, income), where income is Vincome : incomeClass (stockAmount, pension); the real collections are financeDB1 (name, stockAmount) and financeDB2 (name, pension).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
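In Figure 18 the income attribute of Vretiree is itself a virtual object, Vincome, aggregating stockAmount from financeDB1 and pension from financeDB2. A minimal sketch, assuming name is a unique key in both collections and that a missing component is represented as `None`. The dataclasses mirror the figure's classes, with snake_case field names, and the sample data is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Income:
    """Virtual aggregate (Vincome): components drawn from two databases."""
    stock_amount: Optional[float]  # from financeDB1.stockAmount
    pension: Optional[float]       # from financeDB2.pension


@dataclass
class Retiree:
    """Virtual entity (Vretiree) whose income is an aggregated object."""
    name: str
    income: Income


def build_retirees(finance_db1: List[Dict], finance_db2: List[Dict]) -> List[Retiree]:
    stocks = {r["name"]: r["stockAmount"] for r in finance_db1}
    pensions = {r["name"]: r["pension"] for r in finance_db2}
    names = sorted(stocks.keys() | pensions.keys())  # name is the matching key
    return [Retiree(n, Income(stocks.get(n), pensions.get(n))) for n in names]


if __name__ == "__main__":
    db1 = [{"name": "Ann", "stockAmount": 1000.0}]
    db2 = [{"name": "Ann", "pension": 500.0}, {"name": "Bob", "pension": 300.0}]
    for retiree in build_retirees(db1, db2):
        print(retiree)
```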

59

[Figure 19: Pictorial diagram of computed attribute. Vname : nameClass (first, middle, last) is computed from the name attribute of the real collection personDB1 by the functions getfirst, getmiddle, and getlast.]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
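In Figure 19 first, middle, and last are not stored anywhere; they are computed from personDB1's single name field. The sketch below assumes names are stored as space-separated "first middle last" strings, where any words between the first and last count as the middle part. `get_first`, `get_middle`, and `get_last` are our stand-ins for the figure's getfirst, getmiddle, and getlast.

```python
def _parts(name: str):
    """Split a stored name into (first, middle, last); missing parts are ''."""
    words = name.split()
    first = words[0] if words else ""
    last = words[-1] if len(words) > 1 else ""
    middle = " ".join(words[1:-1])
    return first, middle, last


def get_first(name: str) -> str:
    return _parts(name)[0]


def get_middle(name: str) -> str:
    return _parts(name)[1]


def get_last(name: str) -> str:
    return _parts(name)[2]


class VName:
    """Virtual name object: first/middle/last are computed on access, never stored."""

    def __init__(self, stored_name: str):
        self._stored = stored_name  # the real attribute from personDB1

    @property
    def first(self) -> str:
        return get_first(self._stored)

    @property
    def middle(self) -> str:
        return get_middle(self._stored)

    @property
    def last(self) -> str:
        return get_last(self._stored)


if __name__ == "__main__":
    n = VName("Mary Ann Smith")
    print(n.first, "|", n.middle, "|", n.last)
```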

60

[Figure 20: Pictorial diagram of computed attribute. Vretiree : retireClass (name, income), with income computed from financeDB1 (name, stockAmount) and financeDB2 (name, pension).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

61

[Figure 21: Pictorial diagram showing grouping. Vinsurance : insuranceClass (name, {insuranceAmounts}) groups data from the real collections carInsuranceDB1 (carOwner, amount) and houseInsuranceDB2 (houseOwner, amount).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
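Grouping in Figure 21 collects, for each person, the amounts from carInsuranceDB1 and houseInsuranceDB2 into the set-valued attribute {insuranceAmounts}. The sketch below assumes carOwner and houseOwner hold the same name values as Vinsurance's name. It uses a list rather than a Python set so that two policies with equal amounts are not collapsed into one. The rows and function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical contents of the two real collections.
CAR_INSURANCE_DB1 = [{"carOwner": "Ann", "amount": 300}, {"carOwner": "Bob", "amount": 250}]
HOUSE_INSURANCE_DB2 = [{"houseOwner": "Ann", "amount": 900}]


def build_vinsurance(car_rows, house_rows):
    """Group insurance amounts from both sources by owner name."""
    groups = defaultdict(list)
    for row in car_rows:
        groups[row["carOwner"]].append(row["amount"])
    for row in house_rows:
        groups[row["houseOwner"]].append(row["amount"])
    return [{"name": name, "insuranceAmounts": amounts}
            for name, amounts in sorted(groups.items())]


if __name__ == "__main__":
    print(build_vinsurance(CAR_INSURANCE_DB1, HOUSE_INSURANCE_DB2))
```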

62

[Figure 22: Pictorial diagram showing relationship. Virtual entities: Vpatient : patientClass (name, {doctors}) and Vdoctors : doctorClass (name, docID, salary). Real collections shown: patientDB1 (name, salary), patientDB1 (name, docID), and patientDB2 (name, physician). Labels: (key), (pointer), relationship.]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

63

[Figure 23: Pictorial diagram showing a named relationship. VtreatedBy : treatedByClass (patient, doctor, amountOwed; patient and doctor marked as keys) links Vpatient : PatientClass and Vdoctor : DoctorClass and is derived from the real collection patientDB1 (name, docID, amountOwed).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.
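A named relationship such as VtreatedBy is a virtual entity of its own. It carries its own attribute, amountOwed, and patient and doctor together act as its key. The sketch below assumes patientDB1 rows of the form (name, docID, amountOwed) and a hypothetical doctor lookup keyed by docID. Keying the dictionary by (patient, doctor) enforces the composite key; a later row with the same pair replaces the earlier one. All names and data are illustrative.

```python
from dataclasses import dataclass

# Hypothetical rows of patientDB1 (name, docID, amountOwed).
PATIENT_DB1 = [
    {"name": "Ann", "docID": "D1", "amountOwed": 120},
    {"name": "Ann", "docID": "D2", "amountOwed": 40},
    {"name": "Bob", "docID": "D1", "amountOwed": 0},
]
DOCTORS = {"D1": "Dr. Kim", "D2": "Dr. Ortiz"}  # hypothetical doctor collection


@dataclass(frozen=True)
class TreatedBy:
    patient: str      # key part 1
    doctor: str       # key part 2
    amount_owed: int  # attribute of the relationship itself


def build_treated_by(rows, doctors):
    """Derive VtreatedBy objects, one per (patient, doctor) key."""
    rels = {}
    for row in rows:
        key = (row["name"], doctors[row["docID"]])
        rels[key] = TreatedBy(*key, row["amountOwed"])
    return list(rels.values())


def owed_by(patient, rels):
    """Total amount a patient owes across all doctors who treat them."""
    return sum(r.amount_owed for r in rels if r.patient == patient)


if __name__ == "__main__":
    rels = build_treated_by(PATIENT_DB1, DOCTORS)
    print(rels)
    print(owed_by("Ann", rels))
```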

64

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

[Figure 24: Pictorial diagram showing relationship. Virtual entities: VpersonPatient : personClass (name) with Vpatient : patientClass (patID, amount), and VpersonDoctor : personClass (name) with Vdoctor : DoctorClass (docID, salary). Real collections: patientDB1 (name, SSN, payment) and doctorDB2 (name, docID, salary). Virtual collections: patient (Vpatient), doctor (Vdoctor), and person (VpersonPatient, VpersonDoctor).]

65

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.

[Figure 30: Derivation of Virtual Entity Vconcept. Real collections: ConceptSemType (conceptID, semTypeID) and Concept (conceptID, termID, stringType, stringID, stringVal). Virtual entities: Vconcept (conceptID, semType, {termSet}), Vterm (termID, {stringSet}), and Vstring (stringName, stringID, stringType); one attribute is marked (key).]

66

[Figure 31: Derivation of Virtual Entity VsemType. Entities shown: DsemType (ID, name, definition, {relatedTo}), DsemRelate (relName, semName, status), SemTypeDef (ID, name, definition), and SemTypeRel (name1, rel, name2, status).]

Note that a shaded box represents a real collection and an unshaded box represents a virtual entity.