When Scalability Meets Consistency: Genuine Multiversion

When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication Sebastiano Peluso , Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues 1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal


Page 1: When Scalability Meets Consistency: Genuine Multiversion

When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication

Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia and Luís Rodrigues

1st Euro-TM Workshop on Distributed Transactional Memory (WDTM 2012), Lisbon, Portugal

Page 2: When Scalability Meets Consistency: Genuine Multiversion

Talk Structure

- Motivation and related work
- The GMU protocol
- Experimental results

Page 3: When Scalability Meets Consistency: Genuine Multiversion

Motivation and related work


Page 4: When Scalability Meets Consistency: Genuine Multiversion

Distributed STMs

STMs are being employed in new scenarios:
- Database caches in three-tier web apps (FénixEDU)
- HPC programming languages (X10)
- In-memory cloud data grids (Coherence, Infinispan)

New challenges:
- Scalability
- Fault-tolerance

REPLICATION

Page 5: When Scalability Meets Consistency: Genuine Multiversion

Full Replication

All sites store the whole set of data.

Full replication in transactional systems is a thoroughly investigated problem.

Several solutions in the DBMS world:
- Update anywhere-anytime-anyway solutions [SIGMOD96]
- Deferred-update replication techniques [JDPD03, VLDB00]
- Lazy techniques that relax consistency properties [SOSP07]

Specific solutions for DSTMs:
- Efficient coding of the read-set [PRDC09]
- Communication/computation overlapping [NCA10]
- Lease-based commits [Middleware10]

Page 6: When Scalability Meets Consistency: Genuine Multiversion

Partial Replication

- It is a way to increase scalability.
- Each site stores a partial copy of the data.
- Genuine partial replication schemes maximize scalability by ensuring that only the sites that replicate data items read or written by a transaction T exchange messages for executing/committing T.
- Existing 1-Copy Serializable implementations enforce distributed validation of read-only transactions [SRDS10], which causes considerable overhead in typical workloads.
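The genuineness condition can be illustrated with a small sketch. The placement map, the node names, and the `participants` helper are hypothetical; they only show which sites would need to exchange messages for a given transaction.

```python
# Hypothetical placement: which nodes replicate which data item.
PLACEMENT = {
    "X": {"node1"},
    "Y": {"node2"},
    "Z": {"node0", "node2"},
}

def participants(read_set, write_set, placement=PLACEMENT):
    """Return the nodes that replicate at least one item read or written by T."""
    involved = set()
    for item in set(read_set) | set(write_set):
        involved |= placement[item]
    return involved

# A transaction that reads X and writes Y involves only node1 and node2;
# node0 is never contacted on its behalf.
print(sorted(participants({"X"}, {"Y"})))
```

In a non-genuine scheme the returned set would always be every node in the system, regardless of what T accessed.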

Page 7: When Scalability Meets Consistency: Genuine Multiversion

Objectives

- Partially replicated DSTM
- Scalability and performance as first-class targets
- Find a sweet spot in the consistency/performance tradeoff

Requirements

- Read-only transactions never abort or block
- Genuine certification mechanism

Page 8: When Scalability Meets Consistency: Genuine Multiversion

Issues with Partial Replication

- Extending existing local multiversion (MV) STMs is not enough.
- Local MV STMs rely on a single global counter to track version advancement.

Problem:
- With a single global counter, the commit of every transaction would have to involve ALL NODES.

NO GENUINENESS = POOR SCALABILITY

Page 9: When Scalability Meets Consistency: Genuine Multiversion

GMU: Genuine Multiversion Update-serializable replication [ICDCS12]

Page 10: When Scalability Meets Consistency: Genuine Multiversion

Key concepts

- In the execution/commit phase of a transaction T, ONLY the nodes that store data items accessed by T are involved.
- It uses multiple versions for each data item.
- It builds visible snapshots = freshest consistent snapshots, taking into account:
  1. causal dependencies on transactions that had already committed when the transaction began,
  2. previous reads executed by the same transaction.
- Vector clocks are used to establish visible snapshots.

Page 11: When Scalability Meets Consistency: Genuine Multiversion

Main data structures (i)

For each node N:
- VCLog: sequence of vector clocks of “recently” committed transactions on N
- PrepareVC: vector clock greater than or equal to the most recent vector clock in VCLog

Page 12: When Scalability Meets Consistency: Genuine Multiversion

Main data structures (ii)

For each transaction T:
- VC: a vector clock that is
  - initialized with the most recent vector clock in the local VCLog,
  - updated upon reads during execution, to ensure that T observes the most recent serializable snapshot,
  - updated at commit time, to assign the final vector clock to the transaction (and to its write-set).
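The per-node and per-transaction state can be sketched as two small records. The class names, field names, the all-zero initial clock, and the `has_read` flags are assumptions made for illustration; only VCLog, PrepareVC, and T.VC come from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class NodeState:
    """Per-node state: VCLog and PrepareVC."""
    index: int                                     # this node's position in every vector clock
    n_nodes: int
    vc_log: list = field(default_factory=list)     # oldest first, newest last
    prepare_vc: list = field(default_factory=list)

    def __post_init__(self):
        if not self.vc_log:
            self.vc_log.append([0] * self.n_nodes)  # assumed initial clock
        if not self.prepare_vc:
            self.prepare_vc = list(self.vc_log[-1])

    def most_recent_vc(self):
        return self.vc_log[-1]

@dataclass
class Transaction:
    vc: list         # T.VC
    has_read: list   # has_read[j] is True once T has read from node j

def begin(node):
    """Start T on `node`: T.VC is a copy of the most recent clock in the local VCLog."""
    return Transaction(vc=list(node.most_recent_vc()),
                       has_read=[False] * node.n_nodes)

node = NodeState(index=0, n_nodes=3)
t = begin(node)
print(t.vc, t.has_read)
```

Copying the clock at begin time keeps later commits on the node from silently changing T's snapshot.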

Page 13: When Scalability Meets Consistency: Genuine Multiversion

Main data structures (iii)

A chain of versions per data item id. Each version stores a value, a version number (VN), and a pointer to the previous version:

id -> [value: 2, VN: 8] -> [value: 1, VN: 5] -> [value: 0, VN: 2]

[Figure: transaction T commits on node i. T's vector clock has entries 0, 1, ..., i, ..., n-2, n-1; entry i holds 8, the VN of the newest version in the chain.]
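The version chain can be sketched as a singly linked list ordered from newest to oldest. The `Version` class and the `install` and `chain_vns` helpers are hypothetical names; the values and version numbers are the ones shown on the slide.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Version:
    value: Any
    vn: int                               # version number (VN)
    previous: Optional["Version"] = None  # older version, or None at the tail

def install(head, value, vn):
    """Prepend a new version; the chain stays ordered from newest to oldest."""
    return Version(value, vn, previous=head)

def chain_vns(head):
    """Collect the VNs along the chain, newest first."""
    out = []
    while head is not None:
        out.append(head.vn)
        head = head.previous
    return out

head = None
for value, vn in [(0, 2), (1, 5), (2, 8)]:
    head = install(head, value, vn)
print(chain_vns(head))  # [8, 5, 2]
```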

Page 14: When Scalability Meets Consistency: Genuine Multiversion

T reads id on node i: Rule 1

Informally: it avoids reading remotely “too old” versions.

Formally: if it is the first read of T on i, wait until the most recent vector clock in node i's VCLog satisfies VCLog.mostRecVC[i] >= T.VC[i].

This ensures that causal dependencies are enforced.
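Rule 1 amounts to a guarded wait on node i. The sketch below uses a condition variable for the wait; the `NodeLog` class, its method names, and the clock values are assumptions made for illustration, not the authors' implementation.

```python
import threading

class NodeLog:
    """Hypothetical holder for the most recent committed vector clock on node i."""

    def __init__(self, index, most_recent_vc):
        self.index = index
        self.most_recent_vc = list(most_recent_vc)
        self._cond = threading.Condition()

    def apply_commit(self, commit_vc):
        """Record a newly committed clock and wake up any waiting readers."""
        with self._cond:
            self.most_recent_vc = list(commit_vc)
            self._cond.notify_all()

    def rule1_ready(self, t_vc):
        i = self.index
        return self.most_recent_vc[i] >= t_vc[i]

    def wait_rule1(self, t_vc, timeout=None):
        """Block the first read of T on this node until Rule 1 holds.

        Returns False if the timeout expires first.
        """
        with self._cond:
            return self._cond.wait_for(lambda: self.rule1_ready(t_vc), timeout)

node2 = NodeLog(index=2, most_recent_vc=(1, 1, 1))
t1_vc = (1, 2, 2)
print(node2.rule1_ready(t1_vc))   # node 2 is still behind T1's snapshot
node2.apply_commit((1, 2, 2))
print(node2.wait_rule1(t1_vc, timeout=1.0))
```

The wait only delays the read until the node has applied the commits that T's snapshot already depends on; it never forces an abort.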

Page 15: When Scalability Meets Consistency: Genuine Multiversion

Rule 1 in action

[Figure: timeline over Node 0, Node 1 (it stores X), and Node 2 (it stores Y). All vector clocks start at (1,1,1). T0 executes W(X,v) and W(Y,w) and commits with vector clock (1,2,2), creating versions X(2) and Y(2). T1 executes R(X), which returns X(2), and then R(Y), which returns Y(2); T1.VC moves from (1,1,1) to (1,2,2). The figure tracks the most recent VC in each VCLog alongside T1.VC.]

Page 16: When Scalability Meets Consistency: Genuine Multiversion

T reads id on node i: Rule 2

Informally: it maximizes freshness by moving T’s VC ahead in time “as much as possible” in the commit log.

Formally: if it is the first read of T on i, select the most recent VC in i’s commit log s.t. VC[j] <= T.VC[j] for each node j on which T has already read, then set

T.VC = MAX{VC, T.VC}

Note: this updates only the entries of T.VC of the nodes from which T had not read yet.
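Rule 2 can be sketched as a backward scan of node i's commit log followed by an entry-wise maximum. The function name, the `has_read` flags, and the log contents are illustrative assumptions.

```python
def rule2(commit_log, t_vc, has_read):
    """Apply Rule 2 on the first read of T on node i.

    commit_log: vector clocks committed on node i, oldest first.
    t_vc:       T's current vector clock.
    has_read:   has_read[j] is True if T has already read from node j.
    Returns the new T.VC.
    """
    for vc in reversed(commit_log):  # most recent first
        if all(vc[j] <= t_vc[j] for j in range(len(t_vc)) if has_read[j]):
            return [max(a, b) for a, b in zip(vc, t_vc)]
    raise LookupError("no compatible vector clock left in the log")

# T has already read from node 1 at timestamp 20, so the newest log entry,
# whose node-1 component is 21, is skipped in favour of the one before it.
log_node2 = [(1, 1, 1), (1, 1, 11), (1, 21, 21)]
print(rule2(log_node2, t_vc=(1, 20, 1), has_read=[False, True, False]))
```

Because the selected clock is never ahead of T.VC on the nodes T has already read from, the maximum leaves those entries untouched, which is what the note on the slide states.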

Page 17: When Scalability Meets Consistency: Genuine Multiversion

Rule 2 in action

[Figure: timeline over Node 0, Node 1 (it stores X), and Node 2 (it stores Y). T0 executes R(X), which returns X(20), so T0.VC goes from (1,1,1) to (1,20,1). T1 executes W(X,v) and W(Y,w) and commits with vector clock (1,21,21), creating versions X(21) and Y(21). Node 2 also holds the older versions Y(1) and Y(11) and the clock (1,1,11). T0 then executes R(Y), which returns Y(11), and T0.VC becomes (1,20,11) before T0 commits. The figure tracks the most recent VC in each VCLog alongside T0.VC.]

Page 18: When Scalability Meets Consistency: Genuine Multiversion

T reads id on node i: Rule 3

Informally: observe the most recent consistent version of id based on T’s history (previous reads)

Formally: iterate over the versions of id and return the most recent one s.t.

id.version.VN <= T.VC[i]
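Rule 3 can be sketched as a scan over the versions of id from newest to oldest. The function name and the stored values are placeholders chosen for illustration.

```python
def rule3(versions, t_vc, i):
    """Return the most recent (value, vn) of an item on node i visible to T.

    versions: (value, vn) pairs for one data item, newest first.
    t_vc:     T's vector clock after Rules 1 and 2 have been applied.
    """
    for value, vn in versions:
        if vn <= t_vc[i]:
            return value, vn
    raise LookupError("no version old enough for this snapshot")

# Versions of Y on node 2, newest first (values are placeholders).
y_versions = [("y21", 21), ("y11", 11), ("y1", 1)]
print(rule3(y_versions, t_vc=(1, 20, 11), i=2))  # ('y11', 11)
```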


Page 19: When Scalability Meets Consistency: Genuine Multiversion

Committing read-only transactions

Read-only transactions commit locally:
- No additional validations
- No possibility of aborts

… and are never blocked, as in typical multiversion schemes.

Page 20: When Scalability Meets Consistency: Genuine Multiversion

Committing update transactions

Run 2PC:

Upon prepare message reception (participant side, node i):
- Acquire read & write locks
- Validate the read-set
- Increase PrepareVC[i] and send PrepareVC back

If all replies are positive (coordinator side):
- Build a commit vector clock
- Broadcast the commit message back
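The participant side of the prepare phase can be sketched as follows. This is a simplification under stated assumptions: a single lock set stands in for separate read and write locks, a lock conflict produces an immediate negative vote, and validation is assumed to mean "no item read locally has a committed version newer than T's snapshot". The `Participant` class and its fields are hypothetical.

```python
class Participant:
    def __init__(self, index, n_nodes):
        self.index = index
        self.prepare_vc = [0] * n_nodes
        self.latest_vn = {}   # local item -> VN of its newest committed version
        self.locked = set()   # simplified: one lock kind for reads and writes

    def on_prepare(self, t_vc, read_set, write_set):
        """Handle a prepare message; return (vote, PrepareVC copy or None)."""
        local = (set(read_set) | set(write_set)) & set(self.latest_vn)
        if local & self.locked:
            return False, None            # lock conflict: vote no
        self.locked |= local              # 1. acquire locks
        for item in set(read_set) & local:
            # 2. validate: assumed rule, see the lead-in above
            if self.latest_vn[item] > t_vc[self.index]:
                self.locked -= local
                return False, None
        self.prepare_vc[self.index] += 1  # 3. bump own entry, then reply
        return True, list(self.prepare_vc)

p = Participant(index=1, n_nodes=3)
p.latest_vn = {"X": 2}
print(p.on_prepare(t_vc=[1, 2, 2], read_set={"X"}, write_set={"X"}))
```

Releasing the locks on a failed validation keeps an aborting transaction from blocking later ones.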

Page 21: When Scalability Meets Consistency: Genuine Multiversion

Building the commit Vector Clock

A variant of Skeen’s algorithm is implemented [SKEEN85].

This makes it possible to keep track of the causal dependencies developed by:
- a transaction T during its execution,
- the most recently committed transactions at the nodes contacted by T.
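In Skeen-style agreement, each participant proposes a timestamp and the coordinator adopts the maximum. The sketch below applies that idea to vector clocks; the merge rule, the function name, and the proposal values are assumptions for illustration, and [ICDCS12] remains the reference for the exact procedure.

```python
def build_commit_vc(t_vc, proposals):
    """Combine T.VC with the PrepareVCs returned by the participants.

    proposals: {node_index: PrepareVC sent back by that participant}.
    """
    commit_vc = list(t_vc)
    for vc in proposals.values():                 # entry-wise maximum
        commit_vc = [max(a, b) for a, b in zip(commit_vc, vc)]
    final = max(commit_vc[w] for w in proposals)  # Skeen-style agreed number
    for w in proposals:                           # same number on every participant
        commit_vc[w] = final
    return commit_vc

# Hypothetical proposals from nodes 1 and 2.
print(build_commit_vc([1, 1, 1], {1: [1, 2, 1], 2: [1, 1, 2]}))    # [1, 2, 2]
print(build_commit_vc([1, 1, 1], {1: [1, 21, 1], 2: [1, 1, 12]}))  # [1, 21, 21]
```

Starting from T.VC carries over the dependencies T developed while executing, and the maximum over the proposals carries over the latest commits at the contacted nodes.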

Page 22: When Scalability Meets Consistency: Genuine Multiversion

Consistency criterion

GMU ensures Extended Update Serializability.

Update Serializability (US) ensures:
- 1-Copy Serializability (1CS) on the history restricted to committed update transactions
- 1CS on the history restricted to committed update transactions and any single read-only transaction
  - but it can admit non-1CS histories containing at least 2 read-only transactions

Extended Update Serializability:
- ensures the US property also for executing transactions
- analogous to opacity in STMs
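A small worked example shows the kind of history that Update Serializability admits. Two update transactions write unrelated items, and two read-only transactions observe them in opposite orders. The brute-force checker below treats the data as a single logical copy and tries every serial order; the transaction names and the checker are illustrative only.

```python
from itertools import permutations

INITIAL = {"X": 0, "Y": 0}

# name -> (reads: item -> observed value, writes: item -> written value)
HISTORY = {
    "U1": ({}, {"X": 1}),
    "U2": ({}, {"Y": 1}),
    "R1": ({"X": 1, "Y": 0}, {}),  # sees U1 but not U2
    "R2": ({"X": 0, "Y": 1}, {}),  # sees U2 but not U1
}

def serializable(names, history=HISTORY, initial=INITIAL):
    """True if some serial order of `names` explains every observed read."""
    for order in permutations(names):
        state = dict(initial)
        ok = True
        for name in order:
            reads, writes = history[name]
            if any(state[k] != v for k, v in reads.items()):
                ok = False
                break
            state.update(writes)
        if ok:
            return True
    return False

print(serializable(["U1", "U2"]))              # True
print(serializable(["U1", "U2", "R1"]))        # True
print(serializable(["U1", "U2", "R2"]))        # True
print(serializable(["U1", "U2", "R1", "R2"]))  # False
```

R1 forces U1 before U2 and R2 forces U2 before U1, so each read-only transaction is consistent on its own but no single serial order satisfies both.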

Page 23: When Scalability Meets Consistency: Genuine Multiversion

Experimental Results


Page 24: When Scalability Meets Consistency: Genuine Multiversion

Experiments on private cluster

8-core physical nodes

TPC-C:
- 90% read-only xacts
- 10% update xacts
- 4 threads per node
- moderate contention (15% abort rate at 20 nodes)


Page 26: When Scalability Meets Consistency: Genuine Multiversion

FutureGrid Experiments

All nodes are 2-core VMs deployed in the same site

TPC-C:
- 90% read-only xacts
- 10% update xacts
- 1 thread per node
- low/moderate contention, also at 40 nodes

Page 27: When Scalability Meets Consistency: Genuine Multiversion

Thanks for the attention


Page 28: When Scalability Meets Consistency: Genuine Multiversion

References

[ICDCS12] Sebastiano Peluso, Pedro Ruivo, Paolo Romano, Francesco Quaglia, Luís Rodrigues. “When Scalability Meets Consistency: Genuine Multiversion Update-Serializable Partial Data Replication”. The IEEE 32nd International Conference on Distributed Computing Systems, June 2012.

[JDPD03] Fernando Pedone, Rachid Guerraoui, André Schiper. “The Database State Machine Approach”. Journal of Distributed and Parallel Databases, vol. 14, issue 1, 71-98, July 2003.

[Middleware10] Nuno Carvalho, Paolo Romano, Luís Rodrigues. “Asynchronous lease-based replication of software transactional memory”. Proc. of the 11th ACM/IFIP/USENIX International Conference on Middleware, 376-396, 2010.

[NCA10] Roberto Palmieri, Francesco Quaglia, Paolo Romano. “AGGRO: Boosting STM Replication via Aggressively Optimistic Transaction Processing”. Proc. of the 9th IEEE International Symposium on Networking Computing and Applications, 20-27, 2010.

[PRDC09] Maria Couceiro, Paolo Romano, Nuno Carvalho, Luís Rodrigues. “D2STM: Dependable Distributed Software Transactional Memory”. Proc. of the 15th IEEE Pacific Rim International Symposium on Dependable Computing, 307-313, 2009.

[SIGMOD96] Jim Gray, Pat Helland, Patrick O’Neil, Dennis Shasha. “The dangers of replication and solutions”. Proc. of the 1996 ACM SIGMOD International Conference on Management of Data, vol. 25, issue 2, 173-182, June 1996.

[SKEEN85] D. Skeen. “Unpublished communication”, 1985. Referenced in K. Birman, T. Joseph. “Reliable Communication in the Presence of Failures”. ACM Trans. on Computer Systems, 47-76, 1987.

[SOSP07] G. DeCandia et al. “Dynamo: Amazon’s Highly Available Key-value Store”. Proc. of the 21st ACM SIGOPS Symposium on Operating Systems Principles, 2007.

[SRDS10] Nicolas Schiper, Pierre Sutra, Fernando Pedone. “P-Store: Genuine Partial Replication in Wide Area Networks”. Proc. of the 29th Symposium on Reliable Distributed Systems, 2010.

[VLDB00] Bettina Kemme, Gustavo Alonso. “Don’t Be Lazy, Be Consistent: Postgres-R, A New Way to Implement Database Replication”. Proc. of the 26th International Conference on Very Large Data Bases, 134-143, 2000.