Page 1: by one computer

1

Sharing Enterprise Data

Page 2: by one computer

2

Enterprise Database Processing Architectures

• Many organizations have a variety of database architectures. Enterprise database processing is concerned with the challenges associated with merging these different architectures into a single view of the organizational data.

Page 3: by one computer

3

Database Processing Architectures

• Teleprocessing Systems

• Client-Server Systems

• File-Sharing Systems

• Distributed Database Systems

Page 4: by one computer

4

Teleprocessing Systems

• All processing is done by one computer. Users may use dumb terminals to transmit information to the centralized computer.

Page 5: by one computer

5

Relationship of Programs in a Teleprocessing System

Page 6: by one computer

6

Client-Server Systems

• Client-Server processing is a form of cooperative computing. Client computers and servers, using a network, share the computing burden. DBMS functionality is provided by one computer, typically the server.

Page 7: by one computer

7

Client-Server Architecture

Page 8: by one computer

8

File-Sharing Systems

• Files are shared between servers and client computers. The server does not provide DBMS functionality.

Page 9: by one computer

9

File-Sharing Architecture

Page 10: by one computer

10

Distributed Database Systems

• Distributed Databases store portions of the database on multiple systems that are interconnected using a network. As such, no one system contains the entire database.

Page 11: by one computer

11

Distributed Database Architecture

Page 12: by one computer

12

• Two terms are common:

– Partitioning

– Replication

Page 13: by one computer

13

Types of Distributed Databases

• Non-partitioned, non-replicated

• Partitioned, non-replicated

• Non-partitioned, replicated

• Partitioned, replicated

Page 14: by one computer

14

Comparison of Distributed Database Alternatives

• Parallelism

• Independence

• Flexibility

• Availability

• Cost/complexity

• Difficulty of control

• Security risk

Page 15: by one computer

15

Downloading Data

• Data may be pulled from a server-based DBMS and downloaded to a client.

• Downloaded data is used for queries and reports

• Downloaded data is not meant to be updated

Page 16: by one computer

16

[Figure: data downloading. Labels: Mainframe, Teleprocessing DB, two TP Terminals, Gateway Computer, LAN, computer1, computer2, and three Downloaded DBs.]

Page 17: by one computer

17

Issues in Downloading Data

• Coordination

– Downloaded data must conform to database constraints

– local updates must be coordinated with downloads

Page 18: by one computer

18

Issues in Downloading Data

• Consistency

– In general, downloaded data should not be updated

– Applications need features to prevent updating

– Users should be made aware of possible problems

Page 19: by one computer

19

Issues in Downloading Data

• Access Control

– Data may be replicated on many computers

– procedures to control data access are more complicated

Page 20: by one computer

20

Issues in Downloading Data

• Computer Crime

– Illegal copying is difficult to prevent

– Diskettes and illegal online access are easy to conceal

– Risk may prevent the development of downloaded data applications

Page 21: by one computer

21

What is OLAP?

• OLAP (online analytical processing, pronounced "oh-lap") is an online system that analyzes data and presents it as multidimensional views.

• It enables analysts, managers, and executives to gain insight through

– fast, consistent, interactive access

– a multidimensional view

Page 22: by one computer

22

Technical Aspects

• Your Express database may reside locally on your PC, anywhere on your LAN, anywhere on your company's intranet, or even anywhere on the internet.

• Olap Table can be used with any development environment supporting ActiveX controls, such as Microsoft Visual Basic, Microsoft Visual C++, Borland Delphi, Borland C++ Builder, or Microsoft FrontPage.

Page 23: by one computer

23

OLAP

• The data categories are called axes or dimensions. Data arranged along these axes is termed an OLAP cube.

• There is no limit on the number of axes. If a large number of axes is used, the result is termed an OLAP hypercube.
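To make the idea of axes concrete, the following is a minimal sketch of a cube built from relational rows. The sample rows, the dimension names, and the functions `build_cube` and `roll_up` are hypothetical illustrations, not part of any OLAP product mentioned in these slides.

```python
from collections import defaultdict

# Hypothetical relational source rows: (product, region, quarter, sales).
ROWS = [
    ("widget", "east", "Q1", 100),
    ("widget", "west", "Q1", 150),
    ("gadget", "east", "Q1", 80),
    ("widget", "east", "Q2", 120),
]
DIMENSIONS = ("product", "region", "quarter")


def build_cube(rows):
    """Map each (product, region, quarter) coordinate to its total sales."""
    cube = defaultdict(int)
    for *coords, sales in rows:
        cube[tuple(coords)] += sales
    return dict(cube)


def roll_up(cube, keep):
    """Sum the measure over every axis that is not named in `keep`."""
    idx = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for coords, sales in cube.items():
        totals[tuple(coords[i] for i in idx)] += sales
    return dict(totals)


cube = build_cube(ROWS)
by_product = roll_up(cube, ["product"])                      # summary view
by_product_quarter = roll_up(cube, ["product", "quarter"])   # drill down
```

Keeping fewer axes gives a summary; adding an axis back corresponds to drilling down into the same data.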

Page 24: by one computer

24

Relational Source Data for an OLAP Cube

Page 25: by one computer

25

An OLAP Cube

Page 26: by one computer

26

Page 27: by one computer

27

Easy to Build

Page 28: by one computer

28

Easy Drill down

Page 29: by one computer

29

PUBLISHING

Page 30: by one computer

30

Easy to Rotate

Page 31: by one computer

31

Data Warehouse

• Downloading data

– Data moves closer to the user

– Problems?

• One or two downloads can be managed, but think about many

– Solution

• Data warehouse

– Makes data more useful

Page 32: by one computer

32

Data Warehouse

Page 33: by one computer

33

Data Warehouse

• A data warehouse is a store of enterprise data (and procedures) that is designed to facilitate management decision making

• A data warehouse includes data, tools, procedures, training, personnel, and other resources that are required or that make decision making easier

• The data comes from many different sources and may output to many different sources

Page 34: by one computer

34

Data Warehouse Components

Page 35: by one computer

35

Categories of Data Warehouse Requirements

Page 36: by one computer

36

Data Warehouse Challenges

• Inconsistent Data

• Tool Integration

• Missing Warehouse Data Management Tools

• Ad Hoc Nature of Requirements

Page 37: by one computer

37

Data Mart

• A data mart is a facility akin to a data warehouse but for a much smaller domain

• The goal of the data mart is to provide the functionality of a data warehouse within a limited domain

Page 39: by one computer

39

Review from previous class

• Database architectures

– Teleprocessing

– Client-server

– File-sharing

– Distributed

Page 40: by one computer

40

Why OLAP and Not an RDBMS?

• An RDBMS is good for

– online transaction processing

– static reporting

• An RDBMS does not handle well

– analytical solutions, because of limitations in SQL and the relational model

– sequential processing (ratios, percents, period-to-period comparisons)

– reporting features such as row breaks, subtotals, numbering, rankings, or moving averages

Page 41: by one computer

41

Data Downloading, Data Mart, and Data Warehouse

• Data Downloading

– Smallest and easiest alternative

– Downloaded data are provided on a regular and recurring basis

– Fewer timing and domain inconsistencies

• Data Mart

– Serves a particular business function

– Serves the same type of user

• Data Warehouse

– Expensive and more difficult

– Provides data on both a recurring and an ad hoc basis

Page 42: by one computer

42

Data Administration

• Data is a critical organizational resource that is expensive to acquire. As such, careful administrative procedures and controls are required.

• Protect data and use it effectively

Page 43: by one computer

43

Data Administration Challenges

Page 44: by one computer

44

Managing Multi-User Databases

• Serving the needs of multiple users and multiple applications adds complexity in…

– design,

– development, and

– migration (future updates)

Page 45: by one computer

45

Multi-User Database Issues include…

• Interdependency

– Changes required by one user may impact others

• Concurrency

– People or applications may try to update the same information at the same time

Page 46: by one computer

46

Multi-User Database Issues include… (continued)

• Record Retention

– When information should be discarded

• Backup/Recovery

– How to protect yourself from losing critical information

Page 47: by one computer

47

Common Multi-User DBMS

• Windows 2000

– Access 2000

– SQL Server

– ORACLE

• UNIX

– ORACLE

– Sybase

– Informix

Page 48: by one computer

48

Role of the Database Administrator

• Organizations typically hire a database administrator (DBA) to handle the issues and complexities associated with multi-user databases.

• A DBA facilitates the development and use of one or more databases.

Page 49: by one computer

49

Data Administrator versus Database Administrator

• Data Administrator

– Handles the database functions and responsibilities for the entire organization.

• Database Administrator (DBA)

– Handles the functions associated with a specific database, including those applications served by the database.

Page 50: by one computer

50

The Characteristics of a DBA

• Technical

– The DBA is responsible for the performance and maintenance of one or more databases.

• Diplomatic

– The DBA must coordinate the efforts, requirements, and sometimes conflicting goals of various user groups to develop community-wide solutions.

Page 51: by one computer

51

Technical Skills of the DBA

• Managing the database structure

• Controlling concurrent processing

• Managing processing rights and responsibilities

• Developing database security

• Providing database recovery

• Managing the database management system (DBMS)

• Maintaining the data repository

Page 52: by one computer

52

Managing the Database Structure

• Managing the database structure includes configuration control and documentation regarding:

– The allocation of space

– Table creation

– Index creation

– Stored procedures

– Trigger creation

Page 53: by one computer

53

Configuration Control

• The database configuration must reflect changes in organizational and user requirements

• Procedures and policies should be included

• Sometimes configuration changes have unanticipated consequences

• DBA must be prepared to debug and repair unforeseen issues.

Page 54: by one computer

54

The Need for Documentation

• When altering a database's structure, unanticipated issues are inevitable.

• Recording the specific changes, dates, and times makes it easier to determine the root cause of issues and to resolve them.

• When historical data is restored, it must be reformatted to reflect all the changes made to the database structure since the data was originally saved.

Page 55: by one computer

55

Documentation

• All structural changes must be carefully documented with the following:

– Reason for the change

– Who made the changes

– Specifically what was changed

– How and when the changes were implemented

– How the changes were tested and what the results were

Page 56: by one computer

56

Documentation Aids

• Version control and Computer-Assisted Software Engineering (CASE) tools automate and/or manage many tedious documentation tasks.

• Printing the data dictionaries after structural changes also helps eliminate many tedious documentation tasks.

Page 57: by one computer

57

Controlling Concurrent Processing

• Measures are taken to ensure that one user's actions do not adversely impact another user's actions.

• At the core of concurrency is accessibility. In one extreme, data becomes inaccessible once a user touches the data. This ensures that data that is being considered for update is not shown. In the other extreme, data is always readable. The data is even readable when it is locked for update.

Page 58: by one computer

58

Aspects of Concurrency Control

• Rollback/Commit: Ensuring all actions are successful before posting to the database

• Multitasking: Simultaneously serving multiple users

• Lost Updates: When one user’s action overwrites another user’s request

Page 59: by one computer

59

Rollback/Commit

• A database operation typically involves several transactions. These transactions are atomic and are sometimes called logical units of work (LUW).

• Before an operation is committed to the database, all LUWs must be successfully completed. If one or more LUW is unsuccessful, a rollback is performed and no changes are saved to the database.
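The all-or-nothing behavior described above can be sketched with Python's built-in `sqlite3` module. The `account` table, its `CHECK` constraint, and the `transfer` function are hypothetical examples chosen for illustration; the two `UPDATE` statements stand in for the units of work that must all succeed before the commit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Apply both updates or neither; return True if committed."""
    try:
        # Credit first, so a failed debit leaves work that must be undone.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.rollback()
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM account"))
```

A transfer that would overdraw the source account fails on the second statement, and the rollback also discards the credit that had already been applied.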

Page 60: by one computer

60

Lost Update Problem

• If two or more users are attempting to update the same piece of data at the same time, it is possible that one update may overwrite another update.

• Resource locking scenarios are designed to address this problem
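The lost update problem can be reproduced with a deliberately simplified, single-threaded simulation. The `database` dictionary and the stock figures are hypothetical; the point is only the order of the reads and writes.

```python
# Hypothetical stock count shared by two users.
database = {"stock": 10}

# Both users read the same starting value before either one writes.
user_a_copy = database["stock"]
user_b_copy = database["stock"]

# User A sells 3 units and writes back; user B sells 4 and writes back.
database["stock"] = user_a_copy - 3
database["stock"] = user_b_copy - 4   # overwrites user A's update

lost_update_result = database["stock"]   # A's sale has vanished
correct_result = 10 - 3 - 4              # what the stock should be
```

User B's write is based on a stale read, so the stored value reflects only one of the two sales.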

Page 61: by one computer

61

Resource Locking

• A resource lock prevents a user from reading and/or writing to a piece of data

• Locks may be applied to:

– a single data item (value)

– an entire row of a table

– a page (a memory segment holding many rows)

– an entire table

– an entire database

• The size of the locked resource is referred to as the lock granularity

Page 62: by one computer

62

Types of Resource Locks

• Implicit versus Explicit

– Implicit locks are issued automatically by the DBMS based on an activity

– Explicit locks are issued by users requesting exclusive rights to the data

• Exclusive versus Shared

– An exclusive lock prevents others from reading or updating the data

– A shared lock allows others to read, but not update the data
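The difference between shared and exclusive locks can be sketched as a toy lock manager. The `LockTable` class and its method names are hypothetical, and the sketch ignores waiting queues and lock upgrades that a real DBMS would handle.

```python
class LockTable:
    """Toy lock manager: one lock mode per resource, many holders if shared."""

    def __init__(self):
        self._locks = {}  # resource -> (mode, set of holders)

    def acquire(self, resource, holder, mode):
        """Grant the lock and return True, or return False if it conflicts."""
        if mode not in ("shared", "exclusive"):
            raise ValueError(mode)
        current = self._locks.get(resource)
        if current is None:
            self._locks[resource] = (mode, {holder})
            return True
        held_mode, holders = current
        # Only shared locks are compatible with each other.
        if mode == "shared" and held_mode == "shared":
            holders.add(holder)
            return True
        return False

    def release(self, resource, holder):
        """Drop one holder; free the resource when nobody holds it."""
        _mode, holders = self._locks[resource]
        holders.discard(holder)
        if not holders:
            del self._locks[resource]
```

Several readers can hold a shared lock on the same row at once, while an exclusive request is refused until every holder has released it.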

Page 63: by one computer

63

Deadlocks

• As a transaction begins to lock resources, it may have to wait for a particular resource to be released by another transaction. On occasion, two transactions may be waiting indefinitely on one another to release resources. This condition is known as a deadlock or a deadly embrace.

Page 64: by one computer

64

Avoiding Deadlocks

• Strategy 1:

– Wait until all resources are available, then lock them all before beginning

• Strategy 2:

– Establish and use clear locking orders/sequences

• Strategy 3:

– Once a deadlock is detected, the DBMS will roll back one transaction
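Strategy 2 can be sketched with Python threads. The resource names and the `run_transaction` function are hypothetical; the assumption is simply that every transaction sorts the locks it needs and acquires them in that one agreed order.

```python
import threading

# Hypothetical lockable resources, identified by name.
LOCKS = {"accounts": threading.Lock(), "orders": threading.Lock()}
completed = []


def run_transaction(name, needed):
    """Acquire the needed locks in sorted-name order, work, then release."""
    ordered = sorted(needed)
    for resource in ordered:
        LOCKS[resource].acquire()
    try:
        completed.append(name)  # stand-in for the real work
    finally:
        for resource in reversed(ordered):
            LOCKS[resource].release()


# The two transactions ask for the same locks in opposite orders.
t1 = threading.Thread(target=run_transaction, args=("T1", ["accounts", "orders"]))
t2 = threading.Thread(target=run_transaction, args=("T2", ["orders", "accounts"]))
t1.start()
t2.start()
t1.join()
t2.join()
```

Because both threads actually lock `accounts` before `orders`, neither can hold one lock while waiting for the other in the opposite order, so the circular wait that defines a deadlock cannot form.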

Page 65: by one computer

65

Resource Locking Strategies

• Optimistic Locking

– Read data

– Process transaction

– Issue update

– Look for conflict

– If conflict occurred, rollback and repeat or else commit

• Pessimistic Locking

– Lock required resources

– Read data

– Process transaction

– Issue update

– Release locks
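The optimistic sequence above can be sketched with `sqlite3`. Detecting the conflict through a `version` column is one common technique and is an assumption here, not something the slides prescribe; the `item` table and the function names are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, qty INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO item VALUES (1, 10, 0)")
conn.commit()


def read_item(conn, item_id):
    """Read data: return (qty, version) without taking any lock."""
    return conn.execute(
        "SELECT qty, version FROM item WHERE id = ?", (item_id,)
    ).fetchone()


def optimistic_update(conn, item_id, new_qty, version_read):
    """Issue the update, then look for a conflict via the version number."""
    cur = conn.execute(
        "UPDATE item SET qty = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_qty, item_id, version_read),
    )
    if cur.rowcount == 1:
        conn.commit()      # no conflict: the row was unchanged since the read
        return True
    conn.rollback()        # conflict: another update got there first
    return False
```

A caller whose update returns `False` re-reads the row and repeats the transaction. Pessimistic locking would instead lock the row before the read, so the conflict could not arise, at the cost of holding the lock for the whole transaction.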

Page 66: by one computer

66

Database Security

• Database security strives to ensure…

– Only authorized users perform authorized activities at authorized times

Page 67: by one computer

67

Managing Processing Rights and Responsibilities

• Processing rights define who is permitted to do what, when

• The individuals performing these activities have full responsibility for the implications of their actions

• Individuals are identified by a username and a password

Page 68: by one computer

68

Granting of Processing Rights

• Database users are known as individuals and as members of one or more roles

• Access and processing rights/privileges may be granted to an individual and/or a role

• Users possess the combined rights granted to them as individuals and to all the roles of which they are members
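The way individual and role grants combine can be sketched as a set union. The user, roles, tables, and actions below are hypothetical sample data, and the function names are illustrative only.

```python
# Hypothetical grants: each right is a (table, action) pair.
ROLE_RIGHTS = {
    "clerk": {("orders", "read"), ("orders", "insert")},
    "manager": {("orders", "read"), ("orders", "update"), ("reports", "read")},
}
USER_RIGHTS = {"pat": {("customers", "read")}}
USER_ROLES = {"pat": {"clerk", "manager"}}


def effective_rights(user):
    """Union of the rights granted to the user and to each of the user's roles."""
    rights = set(USER_RIGHTS.get(user, set()))
    for role in USER_ROLES.get(user, set()):
        rights |= ROLE_RIGHTS[role]
    return rights


def is_permitted(user, table, action):
    """Check one request against the user's effective rights."""
    return (table, action) in effective_rights(user)
```

Here `pat` may update orders only through the `manager` role and may read customers only through the individual grant.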

Page 69: by one computer

69

Granting Privileges

Page 70: by one computer

70

Providing Database Recovery

• Common causes of database failures…

– Hardware failures

– Programming bugs

– Human errors/mistakes

• Since these issues are impossible to completely avoid, recovery procedures are essential

Page 71: by one computer

71

Database Recovery Characteristics

• Continuing business operations (Fall-back procedures/Continuity planning)

• Restore from backup

• Replay database activities since backup was originally made

Page 72: by one computer

72

Fall-back Procedures/Continuity Planning

• The business will continue to operate even when the database is inaccessible

• The fall-back procedure defines how the organization will continue operations

• Careful attention must be paid to…

– saving essential data

– continuing to provide quality service

Page 73: by one computer

73

Restoring from Backup

• In the event that the system must be rebuilt or reloaded, the database is restored from the last full backup.

• Since it is inevitable that activities occurred since the last full backup was made, subsequent activities must be replayed/restored.

Page 74: by one computer

74

Recovery via Reprocessing

The database is periodically backed up (a database save), and all transactions applied since the last save are recorded.

If the system crashes, the latest database save is restored and all of the transactions are re-applied (by users) to bring the database back up to the point just before the crash.

Several shortcomings:

» Time required to re-apply transactions

» Re-applying concurrent transactions is not straightforward.

Page 75: by one computer

75

Recovery via Rollback/Rollforward

We apply a similar technique: make periodic saves of the database (a time-consuming operation). However, maintain a more intelligent log of the transactions that have been applied. This transaction log includes before images and after images.

Page 76: by one computer

76

Rollforward

Before Image: A copy of the table record (or page) of data before it was changed by the transaction.

After Image: A copy of the table record (or page) of data after it was changed by the transaction.

Page 77: by one computer

77

Rollback/Rollforward

Rollback: Undo any partially completed transactions (ones in progress when the crash occurred) by applying the before images to the database.

Rollforward: Redo the transactions by applying the after images to the database. This is done for transactions that were committed before the crash.
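Both operations can be sketched over a small in-memory log. The log layout `(txn, key, before_image, after_image)` and the function names are hypothetical, and the sketch assumes that no two transactions changed the same item, which keeps the ordering questions of a real recovery manager out of the picture.

```python
def roll_back(db_at_crash, log, in_progress):
    """Undo in-progress transactions by applying before images, newest first."""
    db = dict(db_at_crash)
    for txn, key, before, _after in reversed(log):
        if txn in in_progress:
            db[key] = before
    return db


def roll_forward(saved_db, log, committed):
    """Redo committed transactions on a database save, oldest first."""
    db = dict(saved_db)
    for txn, key, _before, after in log:
        if txn in committed:
            db[key] = after
    return db


# Hypothetical scenario: T1 committed before the crash, T2 was in progress.
saved_db = {"x": 1, "y": 2}
log = [("T1", "x", 1, 5), ("T2", "y", 2, 99)]
db_at_crash = {"x": 5, "y": 99}

after_rollback = roll_back(db_at_crash, log, in_progress={"T2"})
after_rollforward = roll_forward(saved_db, log, committed={"T1"})
```

Both paths arrive at the same state: the committed change to `x` survives and the unfinished change to `y` does not.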

Page 78: by one computer

78

Database Recovery

Recovery process uses both rollback and rollforward to restore the database.

In the worst case, we would need to roll back to the last database save and then roll forward to the point just before the crash.

Note: Most database management systems provide a mechanism to record activities into a log file.

Page 79: by one computer

79

Managing the Database Management System (DBMS)

• In addition to controlling and maintaining the users and the data, the DBA must also maintain and monitor the DBMS itself.

– Performance statistics (performance tuning/optimizing)

– System and data integrity

– Establishing, configuring, and maintaining database features and utilities

Page 80: by one computer

80

Maintaining the Data Repository

• The data repository contains metadata. Metadata is data about data.

• The data repository specifies the name, type, size, format, structure, definitions, and relationships among the data. It also contains the details about applications, users, add-on products, etc.

Page 81: by one computer

81

Types of Data Repositories

• Active data repository

– The development and management tools automatically maintain the metadata.

• Passive data repository

– People manually maintain the metadata.