
C-Store: Concurrency Control and Recovery

Jianlin Feng, School of Software, SUN YAT-SEN UNIVERSITY, Jun. 5, 2009

Concurrency Control vs. Recovery

Concurrency Control: Provides correct control over the concurrent execution of multiple transactions so as to maximize system throughput, i.e., the average number of transactions completed in a given time.

Recovery: Ensures that the database is fault tolerant and is not corrupted by software, system, or media failure.

Concurrency Control in C-Store

Uses strict two-phase locking to control the concurrent execution of read-write transactions. Each node (a site in the shared-nothing system architecture) sets locks on the data objects that the runtime system reads or writes.

Resolves deadlocks via timeouts, aborting one of the deadlocked transactions.

Does not use strict two-phase distributed commit, thereby avoiding the PREPARE phase.
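Timeout-based deadlock resolution can be illustrated with a minimal Python sketch. It is not C-Store code: the names `LockTimeout`, `acquire_or_abort`, and `run_transaction`, as well as the 50 ms timeout, are assumptions chosen for illustration. A lock wait that exceeds the timeout is treated as a suspected deadlock, and the waiting transaction aborts and releases whatever it holds.

```python
import threading


class LockTimeout(Exception):
    """Raised when a lock is not granted within the timeout (suspected deadlock)."""


def acquire_or_abort(lock: threading.Lock, timeout_s: float) -> None:
    # Wait at most timeout_s seconds; a timeout is treated as a deadlock signal.
    if not lock.acquire(timeout=timeout_s):
        raise LockTimeout("lock wait timed out; abort this transaction")


def run_transaction(first: threading.Lock, second: threading.Lock,
                    timeout_s: float = 0.05) -> str:
    """Lock two objects in turn; on timeout, give up and report an abort."""
    held = []
    try:
        for lock in (first, second):
            acquire_or_abort(lock, timeout_s)
            held.append(lock)
        return "committed"
    except LockTimeout:
        return "aborted"
    finally:
        # Release every lock this transaction managed to obtain.
        for lock in reversed(held):
            lock.release()
```

If `second` is already held elsewhere, `run_transaction` returns `"aborted"` after the timeout instead of waiting forever, which is the trade-off of this approach: no deadlock detector is needed, but a slow (not deadlocked) transaction can also be aborted.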

Strict Two-Phase Locking (Strict 2PL)

It is the most widely used locking protocol.

Two rules:

(1) If a transaction T wants to read (respectively, modify) a database object, it first requests a shared (respectively, exclusive) lock on the object.

(2) All locks held by a transaction are released when the transaction is completed.
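The two rules can be sketched as a toy lock table in Python. This is an illustrative, single-threaded model only: the class `LockTable` and its methods are hypothetical names, and a refused request simply returns `False` instead of blocking the caller as a real lock manager would.

```python
from collections import defaultdict


class LockTable:
    """Toy lock table: grants or refuses a request, never blocks."""

    def __init__(self):
        # object name -> {transaction id: "S" (shared) or "X" (exclusive)}
        self.holders = defaultdict(dict)

    def request(self, txn: str, obj: str, mode: str) -> bool:
        """Rule 1: a read needs an S lock, a write needs an X lock."""
        others = {t: m for t, m in self.holders[obj].items() if t != txn}
        if mode == "S":
            ok = all(m == "S" for m in others.values())  # S is compatible with S
        else:
            ok = not others                               # X is compatible with nothing
        if ok:
            current = self.holders[obj].get(txn)
            # Keep the stronger mode if the transaction already holds X.
            self.holders[obj][txn] = "X" if "X" in (mode, current) else "S"
        return ok

    def release_all(self, txn: str) -> None:
        """Rule 2: locks are released only when the transaction completes."""
        for owners in self.holders.values():
            owners.pop(txn, None)
```

Because `release_all` is the only way to drop a lock, no transaction in this model can release a lock early, which is what distinguishes strict 2PL from plain 2PL.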

Distributed COMMIT Processing in C-Store (1): Master and Worker

Each transaction T has a master that is responsible for assigning T's sub-transactions to appropriate nodes (workers) and for determining the ultimate commit state of T.

Distributed COMMIT Processing in C-Store (2): The Protocol

1st Phase: When the master receives a COMMIT statement for the transaction T, it waits until all workers have completed all outstanding actions, and then issues a commit (or abort) message to each worker.

2nd Phase: Once a worker has received a commit message, it can release all locks related to the transaction T and delete the UNDO log for T. T is completed, and hence has no need for UNDO in recovery.
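The commit path of this protocol can be sketched in a few lines of Python. Everything here is a simplified assumption for illustration: `Worker`, `finish_outstanding`, `on_commit`, and `master_commit` are hypothetical names, message passing is replaced by direct method calls, and only the commit case is modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    locks: set = field(default_factory=set)        # objects locked on behalf of T
    undo_log: list = field(default_factory=list)   # UNDO records written for T
    pending: int = 0                               # outstanding actions for T

    def finish_outstanding(self) -> None:
        # Stand-in for "the worker completes all outstanding actions".
        self.pending = 0

    def on_commit(self) -> None:
        # 2nd phase: release T's locks and delete T's UNDO log.
        self.locks.clear()
        self.undo_log.clear()


def master_commit(workers: list) -> str:
    # 1st phase: wait until every worker has no outstanding actions ...
    for w in workers:
        w.finish_outstanding()
    # ... then send the commit message. Note there is no PREPARE round.
    for w in workers:
        w.on_commit()
    return "commit"
```

Compared with classic two-phase commit, the master never asks workers to vote or to force log records to stable storage before the decision, which saves a message round at the cost discussed on the next slide.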

Distributed COMMIT Processing in C-Store (3): The Implications

In C-Store, the master does not PREPARE the workers.

It is therefore possible for a worker that the master has told to commit to crash before writing any updates or log records related to the transaction to stable storage.

In that case, the failed worker recovers its state from other projections on other nodes during recovery.

Overview of Recovery in C-Store

Uses the standard write-ahead logging protocol for recovery.

Uses a STEAL, NO-FORCE policy for writing database objects, which may require both UNDO and REDO.

Only logs UNDO records.

Performs REDO by executing updates that have been queued on other nodes.

Write-Ahead Logging Property (WAL)

The Protocol: Each write must be recorded in the log (on disk) before the corresponding change is reflected in the database itself.

To enforce this protocol, the DBMS must be able to selectively force a page in memory to disk, i.e., the page containing the information about the write.
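The ordering constraint can be made concrete with a small in-memory model. This is a sketch under stated assumptions, not a real storage engine: `ToyStore` and its fields are hypothetical, "disk" is just a Python dictionary or list, and log sequence numbers are simple counters.

```python
class ToyStore:
    """In-memory model of a buffer pool and a log that obey WAL."""

    def __init__(self):
        self.log_buffer = []    # log records still in memory
        self.stable_log = []    # log records "on disk"
        self.buffer_pool = {}   # page_id -> (value, LSN of last update)
        self.disk_pages = {}    # page_id -> value "on disk"

    def write(self, page_id, value):
        """Update a page in memory and append a log record for it."""
        old = self.buffer_pool.get(page_id, (self.disk_pages.get(page_id), -1))[0]
        lsn = len(self.stable_log) + len(self.log_buffer)
        self.log_buffer.append((lsn, page_id, old, value))
        self.buffer_pool[page_id] = (value, lsn)
        return lsn

    def flush_log(self, up_to_lsn):
        """Force log records with LSN <= up_to_lsn to stable storage."""
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.stable_log.append(self.log_buffer.pop(0))

    def flush_page(self, page_id):
        """Write a data page to disk, forcing its log records first (WAL)."""
        value, page_lsn = self.buffer_pool[page_id]
        self.flush_log(page_lsn)           # the log reaches disk first ...
        self.disk_pages[page_id] = value   # ... and only then the data page
```

The key line is the call to `flush_log` inside `flush_page`: a data page can never reach disk ahead of the log record that describes its latest change.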

Contents of an Update Log Record

<prevLSN, transID, type, pageID, length, offset, before-image, after-image>

The first 3 fields are common to all log records. The other fields are specific to updates.
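As an illustration, the fields listed above map naturally onto a small record type. The Python names below (`UpdateLogRecord`, `rec_type`, `undo`, `redo`) are assumptions made for this sketch; only the field list itself comes from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UpdateLogRecord:
    # Fields common to all log records
    prev_lsn: Optional[int]   # previous log record of the same transaction
    trans_id: str
    rec_type: str             # e.g. "update"
    # Fields specific to update records
    page_id: int
    length: int               # number of bytes changed
    offset: int               # position of the change within the page
    before_image: bytes       # old bytes, used for UNDO
    after_image: bytes        # new bytes, used for REDO

    def undo(self, page: bytearray) -> None:
        """Restore the bytes that were on the page before the update."""
        page[self.offset:self.offset + self.length] = self.before_image

    def redo(self, page: bytearray) -> None:
        """Reapply the update to the page."""
        page[self.offset:self.offset + self.length] = self.after_image
```

The before-image is what makes UNDO possible and the after-image is what makes REDO possible, so a system that logs only UNDO information could omit the after-image.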

STEAL / NO-FORCE

STEAL: An updated page P of an uncommitted transaction T is allowed to be swapped from memory to disk. T can abort later, so the DBMS must remember the old value of P to support UNDO.

NO-FORCE: When a transaction T commits, the pages in the buffer that were modified by T are not forced to disk. The system can crash before all of these pages are written to disk, so the DBMS must remember the updates of T to support REDO.
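The reasoning above reduces to a two-line decision rule, written out here as a Python function. The function name `recovery_needs` is hypothetical and exists only to restate the slide's argument in executable form.

```python
def recovery_needs(steal: bool, force: bool) -> set:
    """Return which recovery actions a buffer policy makes necessary."""
    needs = set()
    if steal:
        # Uncommitted changes may already be on disk, so they may need undoing.
        needs.add("UNDO")
    if not force:
        # Committed changes may be missing from disk, so they may need redoing.
        needs.add("REDO")
    return needs
```

For the STEAL, NO-FORCE combination, `recovery_needs(True, False)` yields both actions, which matches the statement that this policy possibly results in UNDO and REDO.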

The Recovery Algorithm ARIES: Three Phases

1. Analysis: Identifies dirty pages in the buffer (i.e., changes that have not been written to disk) and the transactions that were active at the time of the crash.

2. REDO: Repeats all actions and restores the database state to what it was at the time of the crash.

3. UNDO: Undoes the actions of transactions that had not committed at the time of the crash.
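The three phases can be walked through on a toy log. The sketch below is heavily simplified and is not ARIES itself: it assumes a key-value "database", omits LSNs, checkpoints, the dirty page table, and compensation log records, and the function name `recover` and the tuple layout of the log are invented for illustration.

```python
def recover(log, disk):
    """Toy three-phase recovery.

    log  : list of ("update", txn, key, before, after) or ("commit", txn)
    disk : dict of key -> value as found on disk after the crash
    """
    # 1. Analysis: transactions with no commit record are the "losers".
    seen, committed = set(), set()
    for rec in log:
        seen.add(rec[1])
        if rec[0] == "commit":
            committed.add(rec[1])
    losers = seen - committed

    # 2. REDO: repeat history, reapplying every logged update in order.
    db = dict(disk)
    for rec in log:
        if rec[0] == "update":
            _, _, key, _, after = rec
            db[key] = after

    # 3. UNDO: roll back the losers' updates, newest first.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] in losers:
            _, _, key, before, _ = rec
            db[key] = before

    return db, losers
```

Note that REDO reapplies the updates of all transactions, including those that will be undone a moment later; restoring the exact pre-crash state first is what lets the UNDO phase use the before-images safely.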

Recovery in C-Store

Basic idea: A crashed node recovers by running a query (copying state) from other projections.

K-Safety: Sufficient projections and join indexes are maintained so that K nodes can fail within time t, the time to recover, and the system will still be able to maintain transactional consistency.
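A rough feel for K-safety can be had from a brute-force check in Python. This sketch makes a strong simplifying assumption that is not in the source: a table counts as recoverable if every one of its columns still has a copy on some surviving node, ignoring join indexes, sort orders, and segment boundaries. The names `survives` and `is_k_safe` are hypothetical.

```python
from itertools import combinations


def survives(placement: dict, columns: set, failed: set) -> bool:
    """True if every column still has a copy on some node outside `failed`.

    placement maps node name -> set of columns stored on that node.
    """
    alive = set()
    for node, cols in placement.items():
        if node not in failed:
            alive |= cols
    return columns <= alive


def is_k_safe(placement: dict, columns: set, k: int) -> bool:
    """Check every possible set of k simultaneous node failures."""
    return all(survives(placement, columns, set(f))
               for f in combinations(placement, k))
```

With three nodes each holding two of three columns in a rotating pattern, the layout tolerates any single failure but not every pair of failures, so it would be 1-safe under this simplified definition.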

Three cases to consider.

Recovery: Case 1

If the failed node suffered no data loss, i.e., no dirty pages are found for aborted transactions, then we can restore it by executing the updates that will be queued for it elsewhere in the system. This assumes that those updates have been successfully saved on other nodes and can be identified by conditions on timestamp, transaction ID, etc.

Pages of committed transactions may not have been written to disk, so we simply need REDO.
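Replaying queued updates can be sketched as follows. The record layout `(timestamp, transaction id, key, value)`, the function name `replay_queued_updates`, and the use of a single "last applied timestamp" as the selection condition are all assumptions for illustration; the slide only states that the updates can be identified by conditions on timestamp, transaction ID, and so on.

```python
def replay_queued_updates(node_state: dict, queued: list, last_applied_ts: int) -> int:
    """REDO for Case 1: apply queued updates the failed node has not yet seen.

    queued is a list of (timestamp, txn_id, key, value) tuples saved elsewhere.
    Returns the timestamp of the last update applied.
    """
    # Apply in timestamp order so later writes win over earlier ones.
    for ts, txn_id, key, value in sorted(queued, key=lambda u: u[0]):
        if ts > last_applied_ts:       # skip updates already reflected locally
            node_state[key] = value
            last_applied_ts = ts
    return last_applied_ts
```

No UNDO step appears here because this case assumes the node holds no effects of aborted transactions; only missing committed work has to be brought in.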

Recovery: Case 2

If both the RS and WS are destroyed in the failed node, then we have to reconstruct both segments from other projections and join indexes in the system.

First, restore the segments by exploiting Insertion Vectors and Deleted Record Vectors from other nodes.

Second, run the queued updates as in Case 1.

Recovery: Case 3

If WS is damaged but RS is good in the failed node, then we can reconstruct the WS from other corresponding WS segments and/or RS segments.

The corresponding WS segments are identified by checking the range of the sort key. The sort keys are used to find storage keys, and the other tuple columns are then found by following the appropriate join indexes.

Queries for Recovering WS

Note that each WS segment, S, contains only tuples with an insertion timestamp later than some time t_lastmove(S).
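The selection implied by this property, combined with the sort-key range check from Case 3, can be sketched as a filter in Python. This is not the recovery query from the C-Store paper, which carries further conditions that this sketch does not attempt to reproduce. The function name `recover_ws_segment` and the dictionary keys `sort_key` and `insert_ts` are hypothetical.

```python
def recover_ws_segment(source_tuples, key_low, key_high, t_lastmove):
    """Select tuples that belong in a damaged WS segment S.

    source_tuples : iterable of dicts with 'sort_key' and 'insert_ts' entries,
                    read from a corresponding segment on another node
    key_low/high  : sort-key range covered by S (inclusive)
    t_lastmove    : only tuples inserted after this time can be in S
    """
    selected = (
        t for t in source_tuples
        if key_low <= t["sort_key"] <= key_high and t["insert_ts"] > t_lastmove
    )
    # Return the tuples in sort-key order, as a segment would store them.
    return sorted(selected, key=lambda t: t["sort_key"])
```

Tuples inserted at or before t_lastmove(S) are skipped because, by the property above, they cannot be in S in the first place.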

References

Mike Stonebraker, Daniel Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran and Stan Zdonik. C-Store: A Column-oriented DBMS. VLDB, pages 553-564, 2005.

Raghu Ramakrishnan and Johannes Gehrke. Database Management Systems (3rd edition).
