Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of...
-
Upload
clifford-lamb -
Category
Documents
-
view
215 -
download
0
Transcript of Consistent and Efficient Database Replication based on Group Communication Bettina Kemme School of...
Consistent and Efficient Database Replication
based on Group Communication
Bettina KemmeSchool of Computer ScienceMcGill University, Montreal
2Database Replication, Bettina Kemme, Feb. 2001
Outline
State of the Art in Database Replication
Key Points of our Approach Overview of the Algorithms Implementation Issues Performance ResultsOngoing Work
3Database Replication, Bettina Kemme, Feb. 2001
Replication -- Why?
Fault-Tolerance: Take-Over
Scale-Up: cluster instead of bigger mainframe
4Database Replication, Bettina Kemme, Feb. 2001
Replica Control: Research and Reality
Eager
Lazy
Primary Copy
UpdateEverywhere
Globally correctToo expensive
Deadlocks
Inconsistent reads
Configuration Restrictions
Feasible
Inconsistent readsReconciliation
Feasible in someapplications
UPDATEWHEN
UPDATEWHERE
Quorum/ROWAOracle Synchr.
Repl
Placement Strat.Sybase/IBM/Oracle
Placement Strategies
Weak ConsistencySybase/IBM/Oracle
5Database Replication, Bettina Kemme, Feb. 2001
Requirements Develop and apply appropriate techniques
in order to avoid current limitations of eager update everywhere approaches Keep flexibility of update everywhere
no restrictions on type of transactions and where to execute them
Consistency and fault-tolerance of eager replication
Good Performance response time + throughput
Straightforward, Implementable Solution Easy integration in existing systems
6Database Replication, Bettina Kemme, Feb. 2001
Response Time and Message Overhead
Goal: Reduce number of messages per transaction reduce response time reduce message overhead
Solution local execution of transaction bundle writes and send them in a single
message at the end of the transaction (as done in lazy schemes)
transaction with local execution and write set propagation at the end
transaction with individualremote writes
WriteReadcentral transaction
7Database Replication, Bettina Kemme, Feb. 2001
Ordering Transactions Before: uncoordinated
message delivery; danger of deadlocks
Now: pre-order txns by using total order multicast of Group Communication
Before: 2-phase-commit
Now: Independent execution at the different sites
Total Order
8Database Replication, Bettina Kemme, Feb. 2001
Group Communication Systems
Group Communication: Multicast Delivery order (FIFO, causal, total, etc.) Reliable delivery: on all nodes vs. on all
available nodes Membership control ISIS, Totem, Transis, Phoenix, Horus,
Ensemble, ... Goal: Exploit rich semantics of group
communication on a low level of the software hierarchy
9Database Replication, Bettina Kemme, Feb. 2001
Replica Control based on 2Phase LockingNode 1
Local Transaction
Write Phase commitcommit
write setwrite set
Node 2Remote Transaction
Node 3Remote Transaction
Lock Phase
Local Phase
Commit
Send Phase
Phase
Write Phase
Lock Phase
Transaction is first performed locally at a single site. Writes are sent in one message to all sites at the end
of transaction. Write messages are totally ordered. Serialization order obeys total order.
Local Phase
Send PhaseWS
10Database Replication, Bettina Kemme, Feb. 2001
Concurrency Control One possible Solution: Given a transaction T
Local Phase: T acquires standard local read and write locks Send Phase: Send write set using total order multicast Upon reception of write set of T on local node
Commit Phase: multicast commit message Upon reception of write set of T on remote node
Lock Phase: request all write locks in a single step; if there is a local transaction T’ with conflicting lock and T’ is still in local phase or send phase, abort T’. If T’ in send phase, multicast abort
Write Phase: apply updates of T Upon reception of commit/abort message of T on remote node,
terminate T accordingly For two transactions of same node: 2 phase locking. For concurrent transactions of different nodes:
optimistic scheme with early conflict detection: when write set of one transaction is delivered the conflict is detected.
Adjustment to other concurrency control schemes possible
11Database Replication, Bettina Kemme, Feb. 2001
Message Delivery Guarantees Uniform-Reliable Delivery
If a site delivers a message, all non-faulty sites deliver the message
Correctness on all sites (faulty or non-faulty): when a transaction commits at any site then it commits at all non-faulty sites
High message delay Reliable Delivery
If a non-faulty site (non-faulty for a sufficiently long time) delivers a message it is delivered at all non-faulty sites.
Correctness in the failure-free case In case of failures:
all non-faulty sites commit the same set of transactions a transaction might be committed at a faulty site (shortly
before failure) and it is not committed at the other sites. Low message delay
12Database Replication, Bettina Kemme, Feb. 2001
A Suite of Replication Protocols
Uniform Reliable Reliable
Serializability
Cursor Stability
Snapshot Isolation
Hybrid
SER-UR
CS-UR
SI-UR
HYB-UR
SER - R
CS - R
SI - R
HYB - R
The solutions provideflexibilityaccepted correctness criteria
13Database Replication, Bettina Kemme, Feb. 2001
Implementation
Integration of our replica control approach into the database system PostgreSQL
Purpose - We wanted to answer the following questions Can the abstract protocols really be mapped to
concrete algorithms in a relational database? How difficult is it to integrate the replication tool
into a real database system? What is the performance in a real cluster
environment?
14Database Replication, Bettina Kemme, Feb. 2001
Architecture of Postgres-R
GroupCommunication:
Ensemble
ClientClient Client
Client
Client
Postmaster
Local Txn Local Txn
Replication Mgr
Remote Txn
Server Server
Server
originalPostgreSQL
Remote Txn
Remote Txn
15Database Replication, Bettina Kemme, Feb. 2001
Write Set Messages
UPDATE employeeSET salary = salary+100WHERE salary < 2000
Parser/Optimizer
Executor
SQL-Statement
Set of physical records
Send SQL statements and reexecute at all sites Simple Small message size High execution overhead
on all site Problem with locks for
implicit reads in statement Send physical updates
and only apply changes Opposite characteristics
than sending statements
16Database Replication, Bettina Kemme, Feb. 2001
Gain of sending and applying physical changes
Scaleup for different remote update costs and update rate of 1
0
2
4
6
8
1 5 10 15 20
Number of Nodes
wo:0.1wo:0.2wo:0.5wo:1
Sca
leup
Scaleup numberOfNodes
1 updateRate remoteUpdateCostlocalUpdateCost
(numberOfNodes 1)
17Database Replication, Bettina Kemme, Feb. 2001
Comparison with standard distr. locking
1 2 3 4 50
200
400
600
800Traditional
Number Servers
Res
pons
e Ti
me
in m
s
1 2 3 4 50
50
100
150
200Postgres- R
Number Servers
Res
pons
e Ti
me
in m
s
Workload: 10 transactions per second 5 concurrent clients (each submitting a
transaction each 500 milliseconds)
For all experiments: Database: 10 relations a 1000 tuples Transactions: 10 updates per transaction
18Database Replication, Bettina Kemme, Feb. 2001
Response Time vs. Throughput
19Database Replication, Bettina Kemme, Feb. 2001
Scalability with fixed workload
Workload: 1 update transaction per second per server 14 queries per second per server
3 clients per server
Postgres-R
0
50
100
150
200
1 server/15tps
5 server/75tps
10 server/150tps
15 server/225tps
Res
po
nse
Tim
e in
ms
Write TxnRead Txn
20Database Replication, Bettina Kemme, Feb. 2001
Differently loaded Nodes Nodes: 10 nodes 15 clients in total
21Database Replication, Bettina Kemme, Feb. 2001
Conclusions Eager, update everywhere replication is
feasible (at least in clusters) by using adequate techniques As few messages as possible within transaction
boundaries As few synchronization points as possible Complete transaction execution only at one site Simple to adjust to existing concurrency control
mechanisms
22Database Replication, Bettina Kemme, Feb. 2001
Current Work Recovery under various failure models Development of a middleware replication
tool Development of group communication
protocols that better support the needs of the database system Ordering semantics Failure models
Building a complete system Adding other replica control protocols System administration Partial replication functionality