Dangers of Replication

CS 600.419 Storage Systems

Materials taken from “J. Gray, P. Helland, P. O’Neil, and D. Shasha. The Dangers of Replication and a Solution. SIGMOD, 1996.”

http://research.microsoft.com/~gray/replicas.ps

What’s the danger?

• Replication of transactional data makes system performance unstable as the system scales

• For consistent (eager) replication:

– Waits and deadlocks

• For update-anywhere-anytime (lazy group) replication:

– Reconciliations

• Both grow polynomially with large exponents: cubically in the number of nodes

– Based on simple lower bounds derived from mean-value analysis

What’s the point?

• This theme rests on the observation that globally consistent replication does not scale

Replication Policies

• Eager replication:

– All copies are updated as part of the original transaction.

• Lazy replication:

– One replica is updated by the original transaction; the other copies are updated asynchronously after it commits.

• Update policy:

– Group: any node holding a replica can update it.

– Master: only the object's master node can update it; all other replicas are read-only.
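
The four combinations of propagation (eager or lazy) and ownership (group or master) can be illustrated with a toy in-memory model. This is a minimal sketch, not code from the paper: `ReplicatedStore`, `update`, and `propagate` are hypothetical names, there are no real transactions, locks, failures, or network messages, and `propagate()` merely stands in for asynchronous delivery.

```python
from typing import Dict, List, Optional, Tuple


class ReplicatedStore:
    """Toy model of one key-value store fully replicated on n_nodes nodes.

    eager=True  -> every copy is written inside the updating transaction.
    eager=False -> only the local copy is written; the rest are queued.
    master=None -> group policy (any node may update its replica).
    master=k    -> master policy (only node k may update; others read-only).
    """

    def __init__(self, n_nodes: int, eager: bool, master: Optional[int] = None):
        self.eager = eager
        self.master = master
        self.replicas: List[Dict[str, int]] = [{} for _ in range(n_nodes)]
        self.pending: List[Tuple[int, str, int]] = []  # (target node, key, value)

    def update(self, node: int, key: str, value: int) -> None:
        # Master policy: non-master replicas are read-only.
        if self.master is not None and node != self.master:
            raise PermissionError(f"node {node} holds a read-only replica")
        if self.eager:
            # Eager: all copies change as part of the original transaction.
            for replica in self.replicas:
                replica[key] = value
        else:
            # Lazy: change the local copy now, queue the others for later.
            self.replicas[node][key] = value
            for other in range(len(self.replicas)):
                if other != node:
                    self.pending.append((other, key, value))

    def propagate(self) -> None:
        """Deliver queued lazy updates (stands in for asynchronous messages)."""
        for other, key, value in self.pending:
            self.replicas[other][key] = value
        self.pending.clear()


lazy_master = ReplicatedStore(3, eager=False, master=0)
lazy_master.update(0, "x", 7)   # only replica 0 holds x=7 so far
lazy_master.propagate()         # the read-only copies catch up
```

In the lazy case, the window between `update` and `propagate` is exactly where other nodes read stale data and, under the group policy, where concurrent updates can collide.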

Representing Writes

Mastered and Group Replication

The Scale-up Pitfall

• Replication works well on small prototype systems

– But at deployment scale, replication becomes unstable

• At larger scales:

– Message propagation delays increase

– Transaction rates are higher

• For eager replication:

– More concurrent transactions, each taking longer to complete

• For lazy replication:

– Delayed reconciliation leads to system delusion (the replicas become inconsistent, with no obvious way to repair them)
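
The eager point ("more transactions, each taking longer") can be made concrete with Little's law: the average number of transactions in flight equals their arrival rate times their duration. The sketch below rests on a hypothetical model, not the paper's exact derivation: every node originates `tps_per_node` transactions per second, and an eager transaction applies each of its `actions` updates at every node in turn.

```python
def eager_concurrency(tps_per_node: float, nodes: int,
                      actions: int, action_time: float) -> float:
    """Average number of eager transactions in flight, via Little's law.

    Hypothetical model: each node originates tps_per_node transactions per
    second, and an eager transaction performs actions * nodes updates
    (every action is applied at every replica), one after another.
    """
    arrival_rate = tps_per_node * nodes          # more nodes, more transactions
    duration = actions * nodes * action_time     # each transaction runs longer
    return arrival_rate * duration               # L = arrival rate * time in system


one = eager_concurrency(tps_per_node=10, nodes=1, actions=5, action_time=0.01)
ten = eager_concurrency(tps_per_node=10, nodes=10, actions=5, action_time=0.01)
print(f"1 node: {one:.2f} in flight, 10 nodes: {ten:.2f} in flight ({ten / one:.0f}x)")
```

Under these assumptions, ten times as many nodes means about a hundred times as many concurrent transactions, each holding its locks longer, which is why waits and deadlocks climb so steeply.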

Analysis of Eager Group Replication

• Scaling laws (deadlock rate):

– Third power of the number of nodes

– Fifth power of the number of operations per transaction

• Problems with eager replication:

– Cannot be used by disconnected (mobile) nodes

– The probability of deadlocks (failed transactions) increases with system size
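
Because the eager group result is stated only as exponents, its practical meaning is easiest to see as a ratio. Writing R for the deadlock rate, N for the number of nodes, and A for the operations per transaction, and holding everything else fixed (this notation is ours, not the slide's):

```latex
R_{\text{eager}} \propto N^{3} A^{5}
\;\Longrightarrow\;
\frac{R_{\text{eager}}(10N,\,A)}{R_{\text{eager}}(N,\,A)} = 10^{3} = 1000,
\qquad
\frac{R_{\text{eager}}(N,\,2A)}{R_{\text{eager}}(N,\,A)} = 2^{5} = 32
```

So a tenfold larger system sees roughly a thousandfold more deadlocks, and doubling transaction size alone multiplies them by 32.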

Analysis of Lazy Group Replication

• Scaling laws (reconciliation rate):

– Third power of the number of nodes

– Third power of the number of operations per transaction

• Better than eager, but still poor
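
In lazy group replication, the reconciliations being counted come from concurrent updates to the same object at different nodes. The sketch below detects them in the spirit of the timestamp check described in the Gray et al. paper: each object carries the timestamp of its last committed update, and each propagated update carries the timestamp of the version it replaced. It assumes a single object and integer timestamps; `Replica`, `ReplicaUpdate`, and `apply_lazy_update` are hypothetical names, and how a rejected update is actually reconciled is left out.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    value: int
    ts: int          # timestamp of the most recent committed update


@dataclass
class ReplicaUpdate:
    old_ts: int      # timestamp of the version the originating node overwrote
    new_value: int
    new_ts: int


def apply_lazy_update(local: Replica, upd: ReplicaUpdate) -> bool:
    """Install a propagated update if it is safe; False means reconcile it."""
    if local.ts == upd.old_ts:
        # The sender overwrote exactly the version held here: safe to apply.
        local.value, local.ts = upd.new_value, upd.new_ts
        return True
    # Another node changed this object concurrently. Applying the update would
    # silently lose that committed write, so it goes to reconciliation instead.
    return False


# Nodes A and B both start from (value=100, ts=1) and update concurrently.
node_c = Replica(value=100, ts=1)
from_a = ReplicaUpdate(old_ts=1, new_value=90, new_ts=2)
from_b = ReplicaUpdate(old_ts=1, new_value=80, new_ts=3)
print(apply_lazy_update(node_c, from_a))   # safe
print(apply_lazy_update(node_c, from_b))   # conflict -> reconciliation
```

Every such rejected update is one reconciliation; the scaling law above says how fast their number grows with nodes and transaction size.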

Analysis of Lazy Master Replication

• Scaling laws (deadlock rate):

– Second power of the number of nodes

– Fifth power of the number of operations per transaction
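
With only exponents given, the constant factors cancel when asking how much worse each scheme gets as a system grows. The sketch below compares the three analyses using just the exponents listed on these slides; the `SCALING` table and `growth` function are hypothetical names, and the results are relative growth factors, not absolute rates.

```python
# (exponent on number of nodes, exponent on operations per transaction),
# as listed on these slides.
SCALING = {
    "eager group": (3, 5),
    "lazy group": (3, 3),
    "lazy master": (2, 5),
}


def growth(scheme: str, node_factor: float, action_factor: float = 1.0) -> float:
    """Factor by which deadlocks/reconciliations grow when the system scales."""
    n_exp, a_exp = SCALING[scheme]
    return node_factor ** n_exp * action_factor ** a_exp


for scheme in SCALING:
    print(f"{scheme:12s} 10x nodes: {growth(scheme, 10):>6.0f}x   "
          f"2x actions: {growth(scheme, 1, 2):>3.0f}x")
```

Ten times as many nodes makes both group schemes about a thousand times worse but lazy master only about a hundred times worse; doubling transaction size costs 32x for the eager group and lazy master analyses and 8x for lazy group.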

Status of Replication

• The negative scaling results:

– Ignore message delays, so reality is worse

– Cannot be escaped by choosing lazy over eager propagation

• No reason to use group replication:

– Master replication is the same (eager) or better (lazy)

• So, what do we do?

– Avoid scale: keep systems small

Two-Tier Replication

• Two node types:

– Base nodes: always connected, store a replica, master most objects

– Mobile nodes: often disconnected, store a replica, issue tentative transactions

• Two version types:

– Master version:

• Exists at the object's owner; other nodes may hold older versions

– Tentative version:

• The local version, updated by tentative transactions
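
In the paper's two-tier scheme, a mobile node's tentative transactions are re-run as base transactions when it reconnects, and each is kept only if its result passes an acceptance test. The sketch below assumes a single base node that masters every object, transactions written as Python functions, and no concurrency control; `BaseNode`, `MobileNode`, `withdraw`, and `non_negative` are hypothetical names.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Txn = Callable[[State], None]          # mutates the given state in place
Accept = Callable[[State], bool]       # acceptance criterion for the result


class BaseNode:
    """Always connected; holds the master versions of the objects it owns."""

    def __init__(self, master: State):
        self.master = dict(master)


class MobileNode:
    """Often disconnected; runs tentative transactions on a local copy."""

    def __init__(self, base: BaseNode):
        self.base = base
        self.tentative = dict(base.master)           # local tentative versions
        self.log: List[Tuple[Txn, Accept]] = []      # tentative txns to replay

    def run_tentative(self, txn: Txn, accept: Accept) -> None:
        txn(self.tentative)                          # visible locally only
        self.log.append((txn, accept))

    def reconnect(self) -> List[bool]:
        """Replay each tentative txn as a base txn against the master versions."""
        results = []
        for txn, accept in self.log:
            trial = dict(self.base.master)
            txn(trial)
            ok = accept(trial)
            if ok:
                self.base.master = trial             # commit at the base node
            results.append(ok)                       # False: report rejection
        self.log.clear()
        self.tentative = dict(self.base.master)      # discard tentative versions
        return results


def withdraw(amount: int) -> Txn:
    def txn(state: State) -> None:
        state["balance"] -= amount
    return txn


def non_negative(state: State) -> bool:
    return state["balance"] >= 0


base = BaseNode({"balance": 100})
phone = MobileNode(base)
phone.run_tentative(withdraw(30), non_negative)     # tentative balance 70
phone.run_tentative(withdraw(60), non_negative)     # tentative balance 10
base.master["balance"] -= 50                        # another base txn meanwhile
print(phone.reconnect(), base.master)
```

Here the second withdrawal looked fine against the phone's tentative version but fails the acceptance test when replayed against the master version, which changed while the phone was disconnected.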

Pictures to Entertain

System Principles

• Hierarchies to reduce scale:

– Nodes (base/master and mobile, often disconnected)

– Transactions (tentative and eager/consistent)

• Techniques:

– Convergence (Bayou-like eventual consistency)

– Idempotence: encode writes in non-conflicting ways

• Does it fix any of Bayou’s semantic problems?
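
One way to encode writes in non-conflicting ways is to record each write as a uniquely identified delta instead of an overwrite. The sketch below assumes every operation carries a globally unique id and that updates can be expressed as additive deltas (as with an account balance); `materialize` and the op tuple format are hypothetical. Deduplicating by id makes redelivery harmless (idempotence), and because addition commutes, replicas that receive the same operations in any order converge.

```python
from typing import Dict, Iterable, Tuple

# A write is (unique op id, key, delta) rather than "set key = value".
Op = Tuple[str, str, int]


def materialize(ops: Iterable[Op]) -> Dict[str, int]:
    """Rebuild state from delta ops, ignoring duplicates and delivery order."""
    unique = {op_id: (key, delta) for op_id, key, delta in ops}  # dedupe by id
    state: Dict[str, int] = {}
    for key, delta in unique.values():
        state[key] = state.get(key, 0) + delta                   # order-free sum
    return state


replica_a = [("t1", "acct", 100), ("t2", "acct", -30), ("t3", "acct", -20)]
replica_b = [("t3", "acct", -20), ("t1", "acct", 100), ("t2", "acct", -30),
             ("t2", "acct", -30)]  # different order, and t2 delivered twice
print(materialize(replica_a) == materialize(replica_b))
```

By contrast, two "set balance = X" writes applied in different orders leave replicas with different final values, which is exactly the kind of conflict that needs reconciliation.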

Conclusions

• Eager: waits and deadlocks

• Lazy converts waits and deadlocks into reconciliations

• Neither scales.

• Two-tier replication:

– Supports mobile nodes

– Combines eager master replication with local (tentative) updates