
1

"Do you want a database that goes down or one that serves wrong data?"

Why distributed databases suck, and what to do about it

- Regaining consistency

2

■ NoSQL team lead at Trifork, Aarhus, Denmark

■ Working with databases since '97

■ NoSQL since 2008

■ Danish Shared Medication Record

■ Migrating data from MySQL to Riak

■ Developing Riak clients

■ NoSQL architect on various international projects

About the speaker

Rune Skou Larsen

3

■ Part 1: Working with eventual consistency

■ NoSQL persistence landscape

■ What is consistency

■ Eventual vs. sequential consistency

■ Conflicts and how to handle them

■ CRDTs

■ Consistency models of current OLTP databases

■ Part 2: Stronger consistency in distributed, fault tolerant systems

■ Consensus

■ Delta consistency

■ Dynamic delta consistency

Agenda

4

Polyglot persistence landscape

■ In-memory: Neo4j, VoltDB, Redis

■ OLTP: Riak, Cassandra, Voldemort, CouchBase

■ Analytics: Hadoop

■ EasyDB: MongoDB, CouchDB

5

■ Redundancy

■ Availability

■ Scaling

■ Getting closer to your users

Why distributed databases?

6

■ Consistency: All nodes see the same data at the same time

■ Eventual consistency → Autonomous consistency

■ Sequential consistency → Bureaucratic consistency

What is consistency

7

■ Eventual consistency

Support disconnected operations

– Better to read a stale value than nothing

– Better to save writes somewhere than nothing

Potentially anomalous application behavior

– Stale reads and conflicting writes…

■ Sequential consistency

Requires highly available connections

Not suitable for certain scenarios:

– Disconnected clients (e.g. your phone)

– Apps might prefer potential inconsistency to loss of availability

When to be Consistent with what

8

Conflicting updates

[Diagram: User A holds value A and User B holds value B on separate replicas, linked by asynchronous synchronization.]

9

■ Assign a timestamp to all objects

■ Simple but fragile: depends on precise synchronization of clocks

■ Data is lost

Last Write Wins (LWW)


[Diagram: User A writes A at t=t0 and User B writes B at t=t1 on separate replicas.]
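The LWW rule above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular database implements it; `Stamped` and `lww_merge` are hypothetical names, and the timestamps are made-up numbers standing in for t0 and t1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    value: str
    timestamp: float  # wall-clock time reported by the writing node


def lww_merge(left: Stamped, right: Stamped) -> Stamped:
    """Keep the write with the larger timestamp; the other write is discarded."""
    # Ties go to `left` here; a real store would need a deterministic tie-breaker.
    return right if right.timestamp > left.timestamp else left


# User A writes at t0 and User B writes at t1 (t0 < t1): B survives, A is lost.
a = Stamped("A", timestamp=100.0)
b = Stamped("B", timestamp=101.0)
winner = lww_merge(a, b)

# If B's clock runs 5 seconds slow, the later write B loses to the earlier write A.
b_skewed = Stamped("B", timestamp=101.0 - 5.0)
winner_with_skew = lww_merge(a, b_skewed)
```

The second case shows the fragility named on the slide: the outcome depends on how well the writers' clocks agree, not on the order in which the writes actually happened.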

10

Google Spanner

‘As a distributed-systems developer, you’re taught from — I want to say childhood — not to trust time. What we did is find a way that we could trust time — and understand what it meant to trust time.’

— Andrew Fikes

11

■ Assign a vector clock to objects

■ Ancestors are removed, descendants remain

Detecting conflicts using Vector Clocks (1)


[Diagram: User A holds A with vclock=a:1; User B holds B with vclock=a:1,b:1, which descends from A's clock.]

12

■ Spawn siblings when the causality chain is broken

Detecting conflicts using Vector Clocks (2)


[Diagram: User A holds A with vclock=a:1; User B holds B with vclock=b:1. Neither clock descends from the other.]
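The two vector-clock cases (ancestor replaced, siblings spawned) can be sketched as follows. This is a simplified model under stated assumptions: clocks are plain dicts from node name to counter, and `descends` and `put` are hypothetical helper names, not the API of Riak or any other store.

```python
from typing import Dict, List, Tuple

VClock = Dict[str, int]
Version = Tuple[str, VClock]  # (value, vector clock)


def descends(later: VClock, earlier: VClock) -> bool:
    """True if `later` has seen every event recorded in `earlier`."""
    return all(later.get(node, 0) >= count for node, count in earlier.items())


def put(siblings: List[Version], incoming: Version) -> List[Version]:
    """Return the versions that remain after storing `incoming`."""
    _, new_clock = incoming
    # If a stored version already descends from the incoming one, the write is stale.
    if any(descends(clock, new_clock) for _, clock in siblings):
        return siblings
    # Drop ancestors of the incoming version; concurrent versions stay as siblings.
    survivors = [(v, c) for v, c in siblings if not descends(new_clock, c)]
    return survivors + [incoming]


# Case (1): B was written after reading A, so A is an ancestor and is removed.
replaced = put([("A", {"a": 1})], ("B", {"a": 1, "b": 1}))

# Case (2): B was written without seeing A, so both are kept as siblings.
conflict = put([("A", {"a": 1})], ("B", {"b": 1}))
```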

13

■ Keep both values as siblings

■ User does the merging

■ The only solution if you need to do "intelligent" merging or start outside processes.

Semantic resolution


[Diagram: values A and B from User A and User B are kept as siblings and then merged into C.]

14

■ Data structure intrinsically merges objects

■ Limited applicability

Conflict-free Replicated Data Types


[Diagram: values A and B from User A and User B are merged into AB on both replicas.]

15

■ Convergent (CvRDT)

■ State is replicated

■ Moves towards one value

■ Commutative (CmRDT)

■ Operations to the state are replicated

■ The order of operations is insignificant

a*b = b*a

■ CvRDT and CmRDT can emulate each other

Conflict-free Replicated Data Types
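As an illustration of a convergent (state-based) type, here is a sketch of a grow-only counter. The G-Counter is not mentioned on the slides; it is chosen here only because it is small. Each replica increments its own slot, and merging takes the per-replica maximum, so the result does not depend on merge order or on how often states are exchanged.

```python
from typing import Dict

Counter = Dict[str, int]  # replica name -> increments made on that replica


def increment(state: Counter, replica: str, amount: int = 1) -> Counter:
    """Return a new state in which `replica` has counted `amount` more."""
    new_state = dict(state)
    new_state[replica] = new_state.get(replica, 0) + amount
    return new_state


def merge(x: Counter, y: Counter) -> Counter:
    """Per-replica maximum: commutative, associative, and idempotent."""
    return {r: max(x.get(r, 0), y.get(r, 0)) for r in x.keys() | y.keys()}


def value(state: Counter) -> int:
    return sum(state.values())


# Two replicas count independently, then exchange state in either order.
replica_a = increment(increment({}, "a"), "a")
replica_b = increment({}, "b")
merged = merge(replica_a, replica_b)
```

Merging `replica_b` into `replica_a` or the other way round yields the same state, which is the `a*b = b*a` property from the slide applied to state merges.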

16

CRDT examples: G-set and 2P-Set

[Image: a tombstone marked RIP, illustrating the tombstones a 2P-Set keeps for removed elements.]
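A 2P-Set can be sketched as two grow-only sets (G-Sets): one for additions and one for tombstones. The class and method names below are hypothetical and the sketch is state-based, so `merge` is a plain union of both halves.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TwoPSet:
    added: FrozenSet[str] = frozenset()    # a G-Set: elements are only ever added
    removed: FrozenSet[str] = frozenset()  # tombstones, also a G-Set

    def add(self, item: str) -> "TwoPSet":
        return TwoPSet(self.added | {item}, self.removed)

    def remove(self, item: str) -> "TwoPSet":
        # Only an element this replica has seen added can be tombstoned.
        if item not in self.added:
            return self
        return TwoPSet(self.added, self.removed | {item})

    def merge(self, other: "TwoPSet") -> "TwoPSet":
        return TwoPSet(self.added | other.added, self.removed | other.removed)

    def members(self) -> FrozenSet[str]:
        # A tombstone always wins, so a removed element can never come back.
        return self.added - self.removed


base = TwoPSet().add("x").add("y")
replica_1 = base.add("x")      # re-adds x concurrently
replica_2 = base.remove("x")   # removes x
merged = replica_1.merge(replica_2)
```

The limitation is visible in the example: once `x` has a tombstone, re-adding it has no effect, which is one reason the slides call the applicability of CRDTs limited.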

17

■ CRDTs: Consistency without concurrency control (2009), Institut National de Recherche en Informatique et en Automatique

■ A comprehensive study of Convergent and Commutative Replicated Data Types (2011), Institut National de Recherche en Informatique et en Automatique

■ Sean Cribbs - Eventually Consistent Data Structures, http://vimeo.com/43903960

CRDT References

18

■ Last Write Wins

■ Easy

■ Data is lost

■ Depends on timestamps

■ Semantic resolution

■ Requires application/user involvement

■ Generic solution

■ Conflict-free Data Types

■ Data structure has built-in convergence

■ Limited ability to model real-world problems

Methods for handling conflicts

19

■ Last write wins

Riak

CouchDB/CouchBase

Cassandra

■ User resolvable conflicts

Riak

Voldemort

CouchDB/CouchBase (but unreliable)

■ Active anti-entropy

Riak (Soon)

Consistency models of OLTP databases

■ Hinted handoff with sloppy quorums (highest write-availability)

Riak

Cassandra

■ Strong consistency (read your own writes + strict quorums)

Riak

Voldemort

Cassandra

CouchBase

MongoDB

Traditional SQL databases (Oracle, MySQL, etc.)

20

"Consistency pH"

Atomic, Consistent, Isolated, Durable (ACID)

vs

Basically Available, Soft state, Eventual consistency (BASE)

Scale: availability ↔ consistency

21

Consensus


■ Protocol for agreeing on a decision

■ More than half the nodes must be in agreement (⌊n/2⌋ + 1)

■ Tolerates the remaining nodes being down, slow, or out of date.


22

Example: Ensuring idempotence using consensus

■ Communication protocols are unreliable and requests can be resent even when they have already completed.

■ Clients assign requestID.

■ If a request is resent, we should return the first answer instead of processing it again.

■ vnodes serialize writes in Riak.

■ We use Riak with N=3 and PW=quorum to ensure strict quorums. (*)

(*) Riak has a bug in the P checks, but we have deemed it insignificant to our use.
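The request-ID scheme above can be sketched in a single process. This is only an illustration of the idempotence rule: the dict stands in for the replicated store (Riak written with PW=quorum in the talk), `IdempotentProxy` and `handler` are hypothetical names, and the sketch does not address concurrent duplicates, which is the part the quorum write is there to solve.

```python
from typing import Callable, Dict, List


class IdempotentProxy:
    """Returns the first answer for a request ID instead of reprocessing it."""

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        # Assumption: an in-memory dict replaces the replicated request log.
        self._responses: Dict[str, str] = {}

    def handle(self, request_id: str, payload: str) -> str:
        if request_id in self._responses:
            # Resent request: replay the stored answer, do not run the handler.
            return self._responses[request_id]
        response = self._handler(payload)
        self._responses[request_id] = response
        return response


processed: List[str] = []


def dispense(payload: str) -> str:
    processed.append(payload)
    return f"dispensed #{len(processed)}: {payload}"


proxy = IdempotentProxy(dispense)
first = proxy.handle("xyz", "prescription 1")
resent = proxy.handle("xyz", "prescription 1")
```

The client chooses `reqid` once and reuses it on every retry, so a retry after a lost response gets the original answer and the side effect happens only once.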

23

[Diagram labels: Doctor system, Pharmacy system, Requests, Request idempotence, Proxy instance.]

24


[Diagram labels: Doctor system, Pharmacy system, Request idempotence, Proxy instance, reqid=xyz (three times), Down (one node).]

We tolerate one node down at a time.

Assuming n <= nodes:
n=3: quorum=2, maxdown=1
n=4: quorum=3, maxdown=1
n=5: quorum=3, maxdown=2
n=6: quorum=4, maxdown=2
n=7: quorum=4, maxdown=3
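The quorum figures follow from the majority rule, quorum = ⌊n/2⌋ + 1, with the remaining n - quorum nodes allowed to be down. A short sketch (function names are illustrative) that reproduces the figures:

```python
def quorum(n: int) -> int:
    """Smallest number of replicas that is more than half of n."""
    return n // 2 + 1


def max_down(n: int) -> int:
    """Replicas that may be unavailable while a quorum can still be formed."""
    return n - quorum(n)


# n -> (quorum, maxdown) for the replica counts listed on the slide.
table = {n: (quorum(n), max_down(n)) for n in range(3, 8)}
```

Going from an odd n to the next even n raises the quorum without raising the number of tolerated failures, which is why odd replica counts are the usual choice.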

25


[Diagram labels: Doctor system, Pharmacy system, Request idempotence, Proxy instance, reqid=xyz (three times).]

26

Delta consistency


■ An update will propagate through the system and all replicas will be consistent after a fixed time period δ

■ Easy for customers to understand



27

Example: Delta Consistency with prescription replication

We guarantee that prescriptions are replicated from Oracle to Riak within 20 minutes.

[Diagram labels: Oracle Master, Oracle MView, Riak (three nodes), Drug medication server, Prescription server, Max 20 minutes.]
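What a fixed δ buys the reader of the data can be shown with a small sketch. It assumes the 20-minute bound from the example; the function names are hypothetical, and the timestamps are arbitrary illustration values.

```python
from datetime import datetime, timedelta

DELTA = timedelta(minutes=20)  # the replication bound promised to the customer


def guaranteed_visible_at(written_at: datetime, delta: timedelta = DELTA) -> datetime:
    """Latest moment at which every replica must contain the update."""
    return written_at + delta


def replica_must_have(written_at: datetime, read_at: datetime,
                      delta: timedelta = DELTA) -> bool:
    """True if a read at `read_at` is guaranteed to see a write from `written_at`."""
    return read_at >= guaranteed_visible_at(written_at, delta)


written = datetime(2013, 1, 1, 12, 0)
too_early = replica_must_have(written, datetime(2013, 1, 1, 12, 19))
safe = replica_must_have(written, datetime(2013, 1, 1, 12, 20))
```

Before the bound has passed, a read may or may not see the update; after it, the guarantee says it must.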

28

Dynamic Delta consistency


■ Same as Delta Consistency, but users can monitor directly how far behind we are

■ Define one or more authorities, and track how far behind they are.

■ Every response includes information on how up to date the data is for each authority.

■ Useful when delay is normally low (sub-second), but can be high in times of degraded service.

■ Useful for CQRS or temporarily offline systems

■ Pro/Con: Users have to understand what data delay means.
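One way to attach freshness information to every response is sketched below. The `Response` shape, the field name `updated_until`, and the authority names in the example are assumptions made for illustration; the slides only say that each response carries the information per authority.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict


@dataclass(frozen=True)
class Response:
    body: str
    # authority name -> the data reflects that authority up to this time
    updated_until: Dict[str, datetime]

    def lag(self, now: datetime) -> Dict[str, timedelta]:
        """How far behind each authority this response may be."""
        return {name: now - ts for name, ts in self.updated_until.items()}


now = datetime(2013, 1, 1, 12, 0, 0)
response = Response(
    body="...",
    updated_until={
        "prescriptions": now - timedelta(milliseconds=300),  # normal operation
        "medication": now - timedelta(minutes=5),            # degraded service
    },
)
lags = response.lag(now)
```

The client, not the server, decides whether a five-minute lag is acceptable for the operation at hand, which is the pro and the con named in the last bullet.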



29

■ When beginning a sync, note the time on the authority

■ After completing a sync, store the time of the last sync on one or both sides.

■ Expose updatedness of data.

Example: Dynamic Delta Consistency using mobile device

[Diagram labels: Mobile device, Riak nodes, Relay server, Sync.]
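The three sync steps can be sketched as below. `Replica`, `authority_clock`, and `fetch_changes` are hypothetical names; the two callables stand in for whatever the device uses to ask the authority for its time and for its data.

```python
from datetime import datetime
from typing import Callable, Dict, Optional


class Replica:
    """A device-side copy that tracks how up to date it is."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.updated_until: Optional[datetime] = None  # None: never synced

    def sync(self, authority_clock: Callable[[], datetime],
             fetch_changes: Callable[[], Dict[str, str]]) -> None:
        # Step 1: note the authority's time before copying anything.
        started_at = authority_clock()
        # Step 2: copy the data; this may take a while or fail midway.
        self.data.update(fetch_changes())
        # Step 3: only after a complete sync, record the start time.
        self.updated_until = started_at
```

The start time is used rather than the end time because writes that reach the authority while the sync is running may be missed; the start time is the latest point up to which the copy is known to be complete. If the sync fails, `updated_until` keeps its previous value.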

30

■ Commands trigger async events

■ Events update views

■ Expose the timestamp of the oldest waiting event as updated_until on the view, or the current time if no events are in the queue.

Example: Dynamic Delta Consistency using CQRS

[Diagram labels: View, Event log.]
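The `updated_until` rule for a CQRS view fits in one function. The sketch assumes the pending events are held in arrival order with their creation time; the queue layout and event names are illustrative.

```python
from collections import deque
from datetime import datetime
from typing import Deque, Tuple

Event = Tuple[datetime, str]  # (created_at, description)


def updated_until(pending: Deque[Event], now: datetime) -> datetime:
    """The view reflects everything that happened before the returned time."""
    # Oldest unapplied event bounds the view; an empty queue means fully current.
    return pending[0][0] if pending else now


now = datetime(2013, 1, 1, 12, 0, 10)
queue: Deque[Event] = deque([
    (datetime(2013, 1, 1, 12, 0, 7), "PrescriptionCreated"),
    (datetime(2013, 1, 1, 12, 0, 9), "PrescriptionDispensed"),
])
behind = updated_until(queue, now)

queue.clear()  # all events applied to the view
caught_up = updated_until(queue, now)
```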

31

■ The setup is multiple datacenters: every datacenter replicates with every other datacenter at intervals ("full sync").

■ When a full sync is done, save the sync data in each data center

Example: DC1 done syncing with DC2

– sync started at time t.

■ When a datacenter is internally consistent (no pending handoffs for instance), it can expose the time of sync with the other authorities as updated_until timestamp.

Example: Dynamic Delta Consistency using multiple authorities

[Diagram: three datacenters, DC1, DC2 and DC3.]

32

Thank you!

Rune Skou Larsen
