When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication
Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues
1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal
Talk Structure
Motivation and related work
The GMU protocol
Experimental results
Motivation and related work
Distributed STMs
STMs are being employed in new scenarios:
- Database caches in three-tier web apps (FénixEDU)
- HPC programming languages (X10)
- In-memory cloud data grids (Coherence, Infinispan)
New challenges:
- Scalability
- Fault tolerance
REPLICATION
Full Replication
All sites store the whole set of data. Full replication in transactional systems is a well-studied problem.
Several solutions in the DBMS world:
- Update anywhere-anytime-anyway solutions [SIGMOD96]
- Deferred-update replication techniques [JDPD03, VLDB00]
- Lazy techniques that relax consistency properties [SOSP07]
Specific solutions for DSTMs:
- Efficient coding of the read-set [PRDC09]
- Communication/computation overlapping [NCA10]
- Lease-based commits [Middleware10]
Partial Replication
Partial replication is a way to increase scalability: each site stores a partial copy of the data.
Genuine partial replication schemes maximize scalability by ensuring that only the sites replicating data items read or written by a transaction T exchange messages to execute/commit T.
Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10], incurring considerable overheads in typical workloads.
Objectives
- Partially replicated DSTM
- Scalability and performance as first-class targets
- Find a sweet spot in the consistency/performance tradeoff
Requirements
- Read-only transactions never abort or block
- Genuine certification mechanism
Issues with Partial Replication
Extending existing local multiversion (MV) STMs is not enough: local MV STMs rely on a single global counter to track version advancement.
Problem: the commit of a transaction would have to involve ALL nodes.
NO GENUINENESS = POOR SCALABILITY
GMU: Genuine Multiversion Update-Serializable replication [ICDCS12]
Key concepts
- In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved.
- GMU maintains multiple versions of each data item.
- It builds visible snapshots, i.e., the freshest consistent snapshots, taking into account:
  1. causal dependencies on transactions committed before the transaction began,
  2. previous reads executed by the same transaction.
- Vector clocks are used to establish visible snapshots.
Main data structures (i)
For each node N:
- VCLog: sequence of vector clocks of "recently" committed transactions on N
- PrepareVC: vector clock greater than or equal to the most recent vector clock in VCLog
Main data structures (ii)
For each transaction T:
- VC: a vector clock that is
  - initialized with the most recent vector clock in the local VCLog,
  - updated upon reads during execution, to ensure that T observes the most recent serializable snapshot,
  - used at commit time to assign the final vector clock to the transaction (and to its write-set).
Main data structures (iii)
A chain of versions per data item id. Each version stores a value, a version number VN, and a pointer to the previous version:
id → (value: 2, VN: 8) → previous: (value: 1, VN: 5) → previous: (value: 0, VN: 2)
[Diagram: when a transaction T commits on node i, the newly created version takes as its VN the entry i of T's vector clock (8 in the example).]
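The three data structures above can be sketched as follows (a minimal, illustrative model in Python; all class and field names are assumptions of this sketch, not the actual GMU implementation):

```python
from dataclasses import dataclass, field
from typing import Optional

N = 3  # number of nodes in this illustrative deployment

@dataclass
class Version:
    # One element of a data item's version chain.
    value: object
    vn: int                               # VN: local entry of the committer's vector clock
    previous: Optional["Version"] = None  # link to the older version

@dataclass
class NodeState:
    # Per-node structures: VCLog and PrepareVC.
    node_id: int
    vc_log: list = field(default_factory=lambda: [[0] * N])    # VCs of recently committed txs
    prepare_vc: list = field(default_factory=lambda: [0] * N)  # >= most recent VC in vc_log
    store: dict = field(default_factory=dict)                  # item id -> head of version chain

    def most_recent_vc(self):
        return self.vc_log[-1]

@dataclass
class Transaction:
    # Per-transaction structure: the vector clock VC.
    vc: list                                      # initialized from the local VCLog
    read_nodes: set = field(default_factory=set)  # nodes T has already read from
```

A transaction started on a node would begin with `vc = list(node.most_recent_vc())`, matching the initialization described on slide (ii).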
T reads id on node i: Rule 1
Informally: it avoids reading remotely "too old" versions.
Formally: if this is the first read of T on node i, wait until VCLog.mostRecVC[i] >= T.VC[i].
This ensures that causal dependencies are enforced.
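Rule 1 amounts to a wait condition on the first read at a node; a minimal sketch (function and argument names are hypothetical):

```python
# Rule 1: on T's first read at node i, T must wait until node i's most
# recent committed vector clock has advanced at least to T's view of
# node i, so that every transaction T causally depends on is locally visible.
def rule1_satisfied(most_recent_vc_at_i: list, t_vc: list, i: int) -> bool:
    return most_recent_vc_at_i[i] >= t_vc[i]
```

A real implementation would block on this condition (e.g., on a condition variable signalled at commit time) rather than polling it.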
Rule 1 in action
[Diagram: three nodes, with Node 1 storing X and Node 2 storing Y; the most recent VC in each VCLog is initially (1,1,1). T0 writes X and Y and commits, creating X(2) and Y(2) and advancing the VCLogs to (1,2,2). T1, with T1.VC = (1,2,2), then reads X on Node 1 and Y on Node 2; Rule 1 makes each read wait until the local VCLog has reached (1,2,2), so T1 observes the versions X(2) and Y(2) written by T0.]
T reads id on node i: Rule 2
Informally: it maximizes freshness by moving T's VC ahead in time "as much as possible" in the commit log.
Formally: if this is the first read of T on node i, select the most recent VC in i's commit log such that VC[j] <= T.VC[j] for each node j on which T has already read; then set T.VC = MAX{VC, T.VC}.
Note: this updates only the entries of T.VC for the nodes from which T has not yet read.
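Rule 2 can be sketched as follows (hypothetical names; `commit_log` stands for node i's VCLog, ordered from oldest to newest):

```python
# Rule 2: on T's first read at node i, pick the most recent VC in i's
# commit log that agrees with T's previous reads, then merge it into
# T.VC with an entry-wise max. For nodes T has already read from,
# vc[j] <= t_vc[j] holds by the selection test, so only the entries for
# the not-yet-read nodes can actually move forward.
def rule2_update_vc(commit_log: list, t_vc: list, read_nodes: set) -> list:
    for vc in reversed(commit_log):
        if all(vc[j] <= t_vc[j] for j in read_nodes):
            return [max(a, b) for a, b in zip(vc, t_vc)]
    return list(t_vc)  # no compatible entry: leave T.VC unchanged
```

On the slide's example, if Node 2's commit log contains (1,1,1), (1,1,11) and (1,21,21) and T0.VC = (1,20,1) after reading from Node 1, the rule skips (1,21,21) and merges (1,1,11), yielding (1,20,11).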
Rule 2 in action
[Diagram: three nodes, with Node 1 storing X and Node 2 storing Y; the most recent VC in each VCLog is initially (1,1,1). T0 reads X(20) on Node 1, setting T0.VC = (1,20,1). Concurrently, T1 writes X and Y and commits at (1,21,21), creating X(21) and Y(21). When T0 later reads Y on Node 2, Rule 2 selects the most recent VC in Node 2's commit log that is compatible with T0's previous read, i.e., (1,1,11) rather than (1,21,21); T0 therefore observes Y(11), not Y(21), and T0.VC becomes (1,20,11).]
T reads id on node i: Rule 3
Informally: observe the most recent consistent version of id, based on T's history (previous reads).
Formally: iterate over the versions of id and return the most recent one such that id.version.VN <= T.VC[i].
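Rule 3 is a walk down the version chain; a minimal sketch where a version is modeled as a (value, vn, previous) tuple (an assumption of this sketch):

```python
# Rule 3: iterate from the newest version to the oldest and return the
# first (i.e. most recent) one whose VN does not exceed T's view of node i.
def rule3_read(head, t_vc: list, i: int):
    version = head
    while version is not None:
        value, vn, previous = version
        if vn <= t_vc[i]:
            return value
        version = previous
    return None  # no version of the item is visible to T
```

For the chain of slide (iii), (2, VN 8) → (1, VN 5) → (0, VN 2), a transaction with T.VC[i] = 6 reads value 1: version 8 is too fresh, and version 5 is the most recent visible one.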
Committing read-only transactions
Read-only transactions commit locally:
- No additional validations
- No possibility of aborts
… and they are never blocked, as in typical multiversion schemes.
Committing update transactions
Run 2PC:
Upon prepare message reception (participant side, node i):
- Acquire read & write locks
- Validate the read-set
- Increase the PrepareVC[i] number and send PrepareVC back
If all replies are positive (coordinator side):
- Build a commit vector clock
- Broadcast back the commit message
Building the commit Vector Clock
A variant of Skeen's algorithm is implemented [SKEEN85].
This allows keeping track of the causal dependencies developed by:
- a transaction T during its execution,
- the most recently committed transactions at the nodes contacted by T.
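The prepare/commit exchange can be sketched as follows, under stated assumptions: locking, read-set validation and failure handling are omitted, and the commit vector clock is built as the entry-wise maximum of the participants' proposals, in the spirit of Skeen's algorithm where the final timestamp is the maximum of the proposed ones (function names are hypothetical):

```python
# Participant side (node i): advance the local PrepareVC entry and
# propose the resulting vector clock back to the coordinator.
def on_prepare(prepare_vc: list, i: int) -> list:
    prepare_vc[i] += 1
    return list(prepare_vc)

# Coordinator side: if all participants voted yes, build the commit
# vector clock as the entry-wise maximum over all proposals.
def build_commit_vc(proposals: list) -> list:
    return [max(entries) for entries in zip(*proposals)]
```

The commit VC is then broadcast back to the participants, which append it to their VCLogs and tag the new versions with their local entry of it.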
Consistency criterion
GMU ensures Extended Update Serializability (EUS).
Update Serializability (US) ensures:
- 1-Copy Serializability (1CS) on the history restricted to committed update transactions
- 1CS on the history restricted to committed update transactions plus any single read-only transaction; however, it can admit non-1CS histories containing at least two read-only transactions.
Extended Update Serializability:
- ensures the US property also for executing transactions
- analogous to opacity in STMs
Experimental Results
Experiments on private cluster
8-core physical nodes
TPC-C:
- 90% read-only xacts
- 10% update xacts
- 4 threads per node
- moderate contention (15% abort rate at 20 nodes)
FutureGrid Experiments
All nodes are 2-core VMs deployed in the same site
TPC-C:
- 90% read-only xacts
- 10% update xacts
- 1 thread per node
- low/moderate contention, even at 40 nodes
Thanks for your attention
References
[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. "When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Replication". Proc. of the 32nd IEEE International Conference on Distributed Computing Systems, June 2012.
[JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. "The Database State Machine Approach". Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July 2003.
[Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. "Asynchronous lease-based replication of software transactional memory". Proc. of the 11th ACM/IFIP/USENIX International Conference on Middleware, 376-396, 2010.
[NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. "AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing". Proc. of the 9th IEEE International Symposium on Network Computing and Applications, 20-27, 2010.
[PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. "D2STM: Dependable Distributed Software Transactional Memory". Proc. of the 15th IEEE Pacific Rim International Symposium on Dependable Computing, 307-313, 2009.
[SIGMOD96] Jim Gray, Pat Helland, Patrick O'Neil, Dennis Shasha. "The dangers of replication and a solution". Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, vol. 25, issue 2, 173-182, June 1996.
[SKEEN85] D. Skeen. "Unpublished communication", 1985. Referenced in K. Birman, T. Joseph. "Reliable Communication in the Presence of Failures". ACM Trans. on Computer Systems, 47-76, 1987.
[SOSP07] G. DeCandia et al. "Dynamo: Amazon's Highly Available Key-value Store". Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. "P-Store: Genuine Partial Replication in Wide Area Networks". Proc. of the 29th IEEE Symposium on Reliable Distributed Systems, 2010.
[VLDB00] Bettina Kemme, Gustavo Alonso. "Don't Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication". Proc. of the 26th International Conference on Very Large Data Bases, 134-143, 2000.