Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counters in Cassandra Friday, August 13, 2010

description

Distributed Counters in Cassandra, #1072: https://issues.apache.org/jira/browse/CASSANDRA-1072

Transcript of Distributed Counters in Cassandra (Cassandra Summit 2010)

Page 1: Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counters in Cassandra

Friday, August 13, 2010

Page 2: Distributed Counters in Cassandra (Cassandra Summit 2010)

I: Goal
II: Design
III: Implementation

Page 3: Distributed Counters in Cassandra (Cassandra Summit 2010)

I: Goal


Page 4: Distributed Counters in Cassandra (Cassandra Summit 2010)

Goal

Low Latency, Highly Available Counters

Page 5: Distributed Counters in Cassandra (Cassandra Summit 2010)

II: Design


Page 6: Distributed Counters in Cassandra (Cassandra Summit 2010)

I: Traditional Counter Design
II: Abstract Strategy
III: Distributed Counter Design

Page 7: Distributed Counters in Cassandra (Cassandra Summit 2010)

Design
I: Traditional Counter Design

Page 8: Distributed Counters in Cassandra (Cassandra Summit 2010)

Traditional Counter Design
Atomic Counters

1. single machine
2. one order of execution
3. strongly consistent

Page 9: Distributed Counters in Cassandra (Cassandra Summit 2010)

Traditional Counter Design
Problems

1. SPOF / single master
2. high latency
3. manually sharded

Page 10: Distributed Counters in Cassandra (Cassandra Summit 2010)

Traditional Counter Design
Question

What constraints can we relax?

Page 11: Distributed Counters in Cassandra (Cassandra Summit 2010)

Design
II: Abstract Strategy

Page 12: Distributed Counters in Cassandra (Cassandra Summit 2010)

Abstract Strategy
Constraints to Relax

1. one order of execution
2. strong consistency

Page 13: Distributed Counters in Cassandra (Cassandra Summit 2010)

Abstract Strategy
Relax: One Order of Execution

commutative operation:
- operations must be re-orderable

Page 14: Distributed Counters in Cassandra (Cassandra Summit 2010)

Abstract Strategy
Relax: Strong Consistency

partitioned work:
- each op must occur once
- unique partition identifier

idempotent repair:
- recognize ops from other partitions

Page 15: Distributed Counters in Cassandra (Cassandra Summit 2010)

Design
III: Distributed Counter Design

Page 16: Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counter Design
Requirements

1. commutative operation
2. partitioned work
3. idempotent repair

Page 17: Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counter Design
Commutative Operation

addition:
- commutative operation
- sum ops performed by all replicas
- a + b = b + a
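As a minimal illustration of why addition fits the re-ordering requirement, the Python sketch below applies the same increments in several shuffled orders and collects the totals. The deltas and the helper name apply_in_order are made up for this example and do not come from the talk.

```python
import random

def apply_in_order(deltas):
    """Apply increments one at a time, in the order given."""
    total = 0
    for delta in deltas:
        total += delta
    return total

deltas = [5, -2, 7, 1, 3]          # illustrative increments, not from the talk
expected = apply_in_order(deltas)

rng = random.Random(0)             # fixed seed so the run is repeatable
totals = set()
for _ in range(10):
    shuffled = deltas[:]
    rng.shuffle(shuffled)
    totals.add(apply_in_order(shuffled))
```

Every shuffled order lands on the same total, so the set of totals holds a single value.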

Page 18: Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counter Design
Partitioned Work

each op assigned to a replica:
- every replica sums all of its ops
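The partitioning idea can be sketched as follows, assuming each operation arrives already tagged with the id of the replica it was assigned to. The function name partition_ops and the sample operations are hypothetical.

```python
from collections import defaultdict

def partition_ops(ops):
    """ops: iterable of (replica_id, delta). Each replica sums only its own ops."""
    per_replica = defaultdict(int)
    for replica_id, delta in ops:
        per_replica[replica_id] += delta
    return dict(per_replica)

# Every op is assigned to exactly one replica, so it is counted exactly once.
ops = [("A", 1), ("B", 4), ("A", 2), ("C", -1)]
per_replica = partition_ops(ops)

# The counter's value is the sum of the per-replica partial sums.
counter_value = sum(per_replica.values())
```

Because no op belongs to two partitions, adding the partial sums cannot double-count.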

Page 19: Distributed Counters in Cassandra (Cassandra Summit 2010)

Distributed Counter Design
Idempotent Repair

save counts from remote replicas:
- keep highest count seen

prevent multiple execution:
- do not transfer the target replica’s count
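A rough sketch of the repair rule, assuming counts are held in a dict keyed by replica id. The name repair is hypothetical, and in this sketch the receiver skips its own entry, which is assumed to have the same effect as the sender not transferring it.

```python
def repair(local_id, known, incoming):
    """Merge counts learned from another replica into `known`.

    known / incoming: dict mapping replica id -> count.
    """
    merged = dict(known)
    for replica_id, count in incoming.items():
        if replica_id == local_id:
            continue  # never re-apply the receiving replica's own count
        if replica_id not in merged or count > merged[replica_id]:
            merged[replica_id] = count  # keep the highest count seen
    return merged

state_a = {"A": 3, "B": 1}
from_b = {"A": 2, "B": 4, "C": 5}

once = repair("A", state_a, from_b)
twice = repair("A", once, from_b)   # replaying the same repair changes nothing
```

Applying the same repair a second time returns an identical state, which is the idempotence the slide asks for.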

Page 20: Distributed Counters in Cassandra (Cassandra Summit 2010)

III: Implementation


Page 21: Distributed Counters in Cassandra (Cassandra Summit 2010)

I: Data Structure
II: Single Node
III: Eventual Consistency

Page 22: Distributed Counters in Cassandra (Cassandra Summit 2010)

I: Data Structure


Page 23: Distributed Counters in Cassandra (Cassandra Summit 2010)

Data Structure
Requirements

local counts:
- incrementally update

remote counts:
- independently track partitions

Page 24: Distributed Counters in Cassandra (Cassandra Summit 2010)

Data Structure
Context Format

list of (replica id, count) tuples:
[(replica A, count), (replica B, count), ...]

Page 25: Distributed Counters in Cassandra (Cassandra Summit 2010)

Data Structure
Context Mutations

local write:
sum local count and write delta
note: memtable
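A local write against the context format could look roughly like this. The sketch models a context as a Python list of (replica id, count) tuples; the function name apply_local_write is an assumption, not taken from the patch.

```python
def apply_local_write(context, local_id, delta):
    """Return a new context with `delta` added to this replica's own count.

    context: list of (replica_id, count) tuples.
    """
    updated = []
    found = False
    for replica_id, count in context:
        if replica_id == local_id:
            updated.append((replica_id, count + delta))  # sum local count and delta
            found = True
        else:
            updated.append((replica_id, count))          # remote counts are untouched
    if not found:
        updated.append((local_id, delta))                # first write seen by this replica
    return updated

existing = [("A", 3), ("B", 4)]
after_write = apply_local_write(existing, "A", 2)
first_write = apply_local_write([("B", 4)], "A", 2)
```

Only the local replica's tuple changes; entries for other replicas pass through unchanged.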

Page 26: Distributed Counters in Cassandra (Cassandra Summit 2010)

Data Structure
Context Mutations

remote repair:
for each replica, keep highest count seen (local or from repair)

Page 27: Distributed Counters in Cassandra (Cassandra Summit 2010)

II: Single Node


Page 28: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Write Path

client
1. construct column
   - value: delta (big-endian long)
   - clock: empty
2. thrift: insert / batch_mutate
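The value encoding on the client side can be sketched with Python's struct module, where ">q" is an 8-byte big-endian signed integer. The dict below only stands in for a Thrift column; its field names and the column name are illustrative, and the actual insert / batch_mutate call is left out.

```python
import struct

def encode_delta(delta):
    """Pack a counter delta as an 8-byte big-endian signed long."""
    return struct.pack(">q", delta)

def decode_value(raw):
    """Unpack an 8-byte big-endian signed long."""
    return struct.unpack(">q", raw)[0]

# Hypothetical column layout: the value carries the delta, the clock is left empty.
column = {"name": b"page_views", "value": encode_delta(1), "clock": b""}
```

The client sends only the delta; the context (clock) is filled in later on the server side.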

Page 29: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Write Path

coordinator
1. choose partition
   - choose target replica
   - requirement: ConsistencyLevel.ONE
2. construct clock
   - context format: [(target replica id, count delta)]
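The coordinator step might be sketched as below. The slide does not say how the target replica is chosen, so picking one at random is purely an assumption here, as is the function name build_counter_clock.

```python
import random

def build_counter_clock(replicas, delta, rng=random):
    """Pick one replica to own this write and wrap the delta in a context.

    Random selection is an assumption; the talk only says a target is chosen.
    """
    target = rng.choice(replicas)
    context = [(target, delta)]     # context format: [(target replica id, count delta)]
    return target, context

target, context = build_counter_clock(["A", "B", "C"], 2, random.Random(1))
```

The resulting context holds a single tuple, attributing the whole delta to the chosen replica's partition.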

Page 30: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Write Path

target replica
insert:
1. memtable does not contain column
2. insert column into memtable

Page 31: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Write Path

target replica
update:
1. memtable contains column
2. retrieve existing column
3. create new column
   - context: sum local count w/ delta from write
4. replace column in ConcurrentSkipListMap
5. if failed to replace column, go to step 2.
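The retry loop in steps 2 to 5 is an optimistic compare-and-swap. The Python sketch below imitates it with a small lock-protected map; TinyConcurrentMap and add_to_memtable are hypothetical stand-ins for Java's ConcurrentSkipListMap and the memtable code, and a one-element tuple stands in for a column.

```python
import threading

class TinyConcurrentMap:
    """Stand-in for a concurrent map with an atomic replace(key, old, new)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def put_if_absent(self, key, value):
        """Insert and return None if the key is new, else return the current value."""
        with self._lock:
            if key in self._data:
                return self._data[key]
            self._data[key] = value
            return None

    def replace(self, key, expected, new):
        """Swap in `new` only if the stored object is still `expected`."""
        with self._lock:
            if self._data.get(key) is expected:
                self._data[key] = new
                return True
            return False

def add_to_memtable(memtable, name, delta):
    """Insert the column if absent, otherwise retry until the swap succeeds."""
    if memtable.put_if_absent(name, (delta,)) is None:
        return                                   # insert case: column was not present
    while True:
        existing = memtable.get(name)            # step 2: retrieve existing column
        updated = (existing[0] + delta,)         # step 3: new column, summed local count
        if memtable.replace(name, existing, updated):   # step 4: attempt the replace
            return
        # step 5: another writer got in first, so loop back to step 2
```

A failed replace means a concurrent writer changed the column in between, so the update is recomputed from the fresh value rather than overwriting it.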

Page 32: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Write Path

Interesting Note:
MTs (memtables) are serialized to SSTs (SSTables), as-is
- each SST encapsulates the updates made while it was an MT
- local count total must be aggregated across the MT and all SSTs

Page 33: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Read Path

target replica
read:
1. construct collating iterator over:
   - frozen snapshot of MT
   - all relevant SSTs
2. resolve column
   - local counts: sum
   - remote counts: keep max
3. construct value
   - sum local and remote counts (big-endian long)
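The resolve and value-construction steps can be sketched as follows, assuming each fragment (the memtable snapshot or one SST) contributes a context as a list of (replica id, count) tuples. The names resolve and counter_value and the sample data are illustrative.

```python
import struct

def resolve(fragments, local_id):
    """Merge context fragments from the memtable snapshot and the SSTs.

    fragments: list of contexts, each a list of (replica_id, count) tuples.
    """
    resolved = {}
    for context in fragments:
        for replica_id, count in context:
            if replica_id == local_id:
                # Each local fragment holds part of the total, so add them up.
                resolved[replica_id] = resolved.get(replica_id, 0) + count
            elif replica_id not in resolved or count > resolved[replica_id]:
                # Remote entries are totals learned via repair: keep the highest.
                resolved[replica_id] = count
    return resolved

def counter_value(resolved):
    """Sum every replica's count and pack it as a big-endian long."""
    return struct.pack(">q", sum(resolved.values()))

memtable_snapshot = [("A", 2)]
sstables = [[("A", 5), ("B", 3)], [("A", 1), ("B", 7), ("C", 4)]]
resolved = resolve([memtable_snapshot] + sstables, local_id="A")
value = counter_value(resolved)
```

With replica A as the local replica, its three fragments add up to 8, B resolves to its highest seen count of 7, and C contributes 4, for a total of 19.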

Page 34: Distributed Counters in Cassandra (Cassandra Summit 2010)

Single Node
Compaction

replica
compaction:
1. construct collating iterator over all SSTs
2. resolve every column in the CF
   - local counts: sum
   - remote counts: keep max
3. write out resolved CF

Page 35: Distributed Counters in Cassandra (Cassandra Summit 2010)

III: Eventual Consistency


Page 36: Distributed Counters in Cassandra (Cassandra Summit 2010)

Eventual Consistency
Read Repair

coordinator / replica
read repair:
1. calculate resolved (superset) CF
   - resolve every column (local: sum, remote: max)
2. return resolved CF to client

Page 37: Distributed Counters in Cassandra (Cassandra Summit 2010)

Eventual Consistency
Read Repair

coordinator / replica
read repair:
1. calculate repair CF for each replica
   - calculate diff CF between resolved and received
   - modify columns to remove target replica’s counts
2. send repair CF to each replica
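Building the per-replica repair could be sketched like this for a single column, assuming contexts are held as dicts keyed by replica id. The name build_repair and the sample counts are hypothetical.

```python
def build_repair(resolved, received, target_id):
    """Return the entries `target_id` is missing or holds a different value for.

    resolved / received: dict mapping replica id -> count for one column.
    """
    repair = {}
    for replica_id, count in resolved.items():
        if replica_id == target_id:
            continue  # remove the target replica's own count from the repair
        if received.get(replica_id) != count:
            repair[replica_id] = count  # diff between resolved and received
    return repair

resolved = {"A": 8, "B": 7, "C": 4}
received_from_b = {"A": 6, "B": 7}

repair_for_b = build_repair(resolved, received_from_b, "B")
no_op = build_repair(resolved, resolved, "B")
```

Replica B is sent the newer count for A and the missing count for C, but never its own count, so its local total cannot be applied twice.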

Page 38: Distributed Counters in Cassandra (Cassandra Summit 2010)

Eventual Consistency
Anti-Entropy Service

sending replica
AES:
1. follow normal AES code path
   - calculate repair SST based on shared ranges
   - send repair SST

Page 39: Distributed Counters in Cassandra (Cassandra Summit 2010)

Eventual Consistency
Anti-Entropy Service

receiving replica
AES:
1. post-process streamed SST
   - re-build streamed SST
   - note: strip out local replica’s counts
2. remove temporary descriptor
3. add to SSTableTracker

Page 40: Distributed Counters in Cassandra (Cassandra Summit 2010)

Questions?

Page 41: Distributed Counters in Cassandra (Cassandra Summit 2010)

More Information

Issues:
- #580: Vector Clocks
- #1072: Distributed Counters

Related Work:
Helland and Campbell, Building on Quicksand, CIDR (2009), Sections 5 & 6.

My email address: [email protected]