1
"Do you want a database that goes
down or one that serves wrong data?"
Why distributed databases suck, and what to do about it
- Regaining consistency
2
■ NoSQL team lead at Trifork, Aarhus, Denmark
■ Working with databases since '97
■ NoSQL since 2008
■ Danish Shared Medication Record
■ Migrating data from MySQL to Riak
■ Developing Riak clients
■ NoSQL architect on various international projects
About the speaker
Rune Skou Larsen
3
■ Part 1: Working with eventual consistency
■ NoSQL persistence landscape
■ What is consistency
■ Eventual vs. sequential consistency
■ Conflicts and how to handle them
■ CRDTs
■ Consistency models of current OLTP databases
■ Part 2: Stronger consistency in distributed, fault tolerant systems
■ Consensus
■ Delta consistency
■ Dynamic delta consistency
Agenda
4
Polyglot persistence landscape
In-memory
Neo4j, VoltDB, Redis
OLTP
Riak, Cassandra, Voldemort, CouchBase
Analytics
Hadoop
EasyDB
MongoDB, CouchDB
5
■ Redundancy
■ Availability
■ Scaling
■ Getting closer to your users
Why distributed databases?
6
■Consistency:
All nodes see the same
data at the same time
■Eventual consistency → Autonomous consistency
■Sequential consistency → Bureaucratic consistency
What is consistency?
7
■ Eventual consistency
Support disconnected operations
– Better to read a stale value than nothing
– Better to save writes somewhere than nothing
Potentially anomalous application behavior
– Stale reads and conflicting writes…
■ Sequential consistency
Requires highly available connections
Not suitable for certain scenarios:
– Disconnected clients (e.g. your phone)
– Apps might prefer potential inconsistency to loss of availability
When to be Consistent with what
8
Conflicting updates
Diagram: User A writes value A, User B writes value B; the replicas synchronize asynchronously.
9
■Assign timestamp to all objects
■Simple but fragile – depends on precise synchronization of timers
■Data is lost
Last Write Wins (LWW)
Diagram: User A writes A (t=t0), User B writes B (t=t1); the replicas synchronize asynchronously.
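The Last Write Wins rule above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the `Versioned` type and `lww_merge` function are hypothetical names, and timestamps are assumed to be plain numbers assigned by each writer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # assumed: wall-clock time assigned by the writer


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Keep the write with the larger timestamp; the other one is discarded.

    Ties are broken on the value so that both replicas pick the same winner.
    """
    return max(a, b, key=lambda v: (v.timestamp, v.value))


write_a = Versioned("A", timestamp=0.0)  # User A writes at t0
write_b = Versioned("B", timestamp=1.0)  # User B writes at t1
merged = lww_merge(write_a, write_b)
print(merged.value)  # B; User A's update is lost
```

If User B's clock runs behind User A's, the same rule would keep A even though B was written later, which is why the approach depends on synchronized timers.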
10
Google Spanner
‘As a distributed-systems developer, you’re taught from — I want to say childhood — not to trust time. What we did is find a way that we could trust time — and understand what it meant to trust time.’
— Andrew Fikes
11
■Assign vector clock to objects
■Ancestors are removed – descendants remain
Detecting conflicts using Vector Clocks (1)
Diagram: A (vclock = a:1) and its descendant B (vclock = a:1, b:1) are synchronized asynchronously between User A and User B.
12
■Spawn siblings when causality chain is broken
Detecting conflicts using Vector Clocks (2)
Diagram: User A writes A (vclock = a:1), User B writes B (vclock = b:1); after asynchronous synchronization both values are present.
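The two vector clock cases on the slides (descendant vs. broken causality chain) can be told apart with a small comparison. The sketch below assumes a vector clock is a plain dict from node name to counter; `descends` and `compare` are illustrative names, not a Riak API.

```python
from typing import Dict

VClock = Dict[str, int]


def descends(a: VClock, b: VClock) -> bool:
    """True if a has seen every event that b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a descends from b"  # b is an ancestor and can be removed
    if descends(b, a):
        return "b descends from a"  # a is an ancestor and can be removed
    return "concurrent"  # causality chain broken: keep both as siblings


print(compare({"a": 1, "b": 1}, {"a": 1}))  # a descends from b
print(compare({"a": 1}, {"b": 1}))          # concurrent
```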
13
■Keep both values as siblings
■User does the merging
■The only solution if you need to do "intelligent" merging or start outside processes.
Semantic resolution
Diagram: User A writes A, User B writes B; after asynchronous synchronization the siblings A and B are merged into a new value C.
14
■Data structure intrinsically merges objects
■Limited applicability
Conflict-free Replicated Data Types
Diagram: User A writes A, User B writes B; after asynchronous synchronization both replicas hold the merged value AB.
15
■ Convergent (CvRDT)
■ State is replicated
■ Moves towards one value
■ Commutative (CmRDT)
■ Operations to the state are replicated
■ The order of operations is insignificant
a*b = b*a
■ CvRDT and CmRDT can emulate each other
Conflict-free Replicated Data Types
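As an illustration of the convergent (state-based) style, here is a grow-only counter. It is a standard textbook CvRDT rather than an example from the talk, and the class and method names are chosen for this sketch.

```python
from typing import Dict


class GCounter:
    """Grow-only counter: each node only increments its own slot."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-node maximum: commutative, associative and idempotent,
        # so replicas move towards one value regardless of merge order.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # 5 5
```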
16
CRDT examples: G-set and 2P-Set
RIP
Tombstone
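A minimal sketch of the two set types named on the slide. The G-Set only grows; the 2P-Set pairs an add set with a tombstone set, so an element that has been removed can never come back. Class and method names are assumptions made for this example.

```python
class GSet:
    """Grow-only set: elements can be added but never removed."""

    def __init__(self) -> None:
        self.items = set()

    def add(self, item) -> None:
        self.items.add(item)

    def contains(self, item) -> bool:
        return item in self.items

    def merge(self, other: "GSet") -> None:
        self.items |= other.items  # set union always converges


class TwoPSet:
    """Two-phase set: an add set plus a tombstone set."""

    def __init__(self) -> None:
        self.added = GSet()
        self.tombstones = GSet()

    def add(self, item) -> None:
        self.added.add(item)

    def remove(self, item) -> None:
        if self.added.contains(item):
            self.tombstones.add(item)

    def contains(self, item) -> bool:
        return self.added.contains(item) and not self.tombstones.contains(item)

    def merge(self, other: "TwoPSet") -> None:
        self.added.merge(other.added)
        self.tombstones.merge(other.tombstones)


r1, r2 = TwoPSet(), TwoPSet()
r1.add("x")
r2.merge(r1)
r2.remove("x")
r1.merge(r2)
r1.add("x")  # re-adding has no effect: the tombstone wins
print(r1.contains("x"))  # False
```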
17
■ CRDTs: Consistency without concurrency control. INRIA, 2009.
■ A comprehensive study of Convergent and Commutative Replicated Data Types. INRIA, 2011.
■ Sean Cribbs - Eventually Consistent Data Structures. http://vimeo.com/43903960
CRDT References
18
■ Last Write Wins
■ Easy
■ Data is lost
■ Depends on timestamps
■ Semantic resolution
■ Requires application/user involvement
■ Generic solution
■ Conflict-free Data Types
■ Data structure has built-in convergence
■ Limited ability to model real-world problems
Methods for handling conflicts
19
■ Last write wins
Riak
CouchDB/CouchBase
Cassandra
■ User resolvable conflicts
Riak
Voldemort
CouchDB/CouchBase (but unreliable)
■ Active anti-entropy
Riak (Soon)
Consistency models of OLTP databases
■ Hinted handoff with sloppy quorums (highest write-availability)
Riak
Cassandra
■ Strong consistency (read your own writes + strict quorums)
Riak
Voldemort
Cassandra
CouchBase
MongoDB
Traditional SQL databases (Oracle, MySQL, etc.)
20
"Consistency pH"
Atomic, Consistent, Isolated, Durable (ACID)
vs
Basically Available, Soft state, Eventual consistency (BASE)
Scale: availability ↔ consistency
21
Consensus
■ Protocol for agreeing on a decision
■ More than half the nodes must be in agreement (⌊n/2⌋ + 1)
■ Tolerates the remaining nodes being down, slow, or out of date.
Scale (availability ↔ consistency): Consensus
22
Example: Ensuring idempotence using consensus
■ Communication protocols are unreliable and requests can be resent even when they have already completed.
■ Clients assign requestID.
■ If a request is resent, we should return the first answer instead of processing it again.
■ vnodes serialize writes in Riak.
■ We use Riak with N=3 and PW=quorum to ensure strict quorums. (*)
(*) Riak has a bug in the P checks, but we have deemed it insignificant to our use.
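The idea can be sketched with an in-memory stand-in for the replicas. This is a toy under stated assumptions, not the Riak client API: `Replica` and `IdempotenceProxy` are hypothetical names, the replicas are plain dicts in one process, and concurrent duplicates are ignored (the slide relies on vnodes serializing writes for that).

```python
from typing import Callable, Dict, List


class Replica:
    def __init__(self) -> None:
        self.up = True
        self.answers: Dict[str, str] = {}  # requestID -> first answer


class IdempotenceProxy:
    def __init__(self, n: int = 3) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]
        self.quorum = n // 2 + 1  # strict majority

    def _reachable(self) -> List[Replica]:
        up = [r for r in self.replicas if r.up]
        if len(up) < self.quorum:
            # Prefer going down over serving wrong data.
            raise RuntimeError("quorum not reachable")
        return up

    def handle(self, request_id: str, process: Callable[[], str]) -> str:
        reachable = self._reachable()
        # Any two majorities overlap, so a stored answer is always visible.
        for replica in reachable:
            if request_id in replica.answers:
                return replica.answers[request_id]
        answer = process()
        for replica in reachable:
            replica.answers[request_id] = answer
        return answer


calls = []


def create_prescription() -> str:
    calls.append(1)
    return "prescription created"


proxy = IdempotenceProxy(n=3)
first = proxy.handle("xyz", create_prescription)
proxy.replicas[0].up = False                       # one node goes down
second = proxy.handle("xyz", create_prescription)  # the request is resent
print(first == second, len(calls))  # True 1
```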
23
Example: Ensuring idempotence using consensus
Diagram: the Doctor system and the Pharmacy system send requests to request idempotence proxy instances.
24
Example: Ensuring idempotence using consensus
Diagram: Doctor system, Pharmacy system, request idempotence proxy instance; nodes labeled reqid=xyz, one node marked Down.
← We tolerate one node down at a time
Assuming n <= nodes:
n=3: quorum=2, maxdown=1
n=4: quorum=3, maxdown=1
n=5: quorum=3, maxdown=2
n=6: quorum=4, maxdown=2
n=7: quorum=4, maxdown=3
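The quorum figures follow from the majority rule (more than half of n). A short sketch that recomputes them; the function names are chosen for this example.

```python
def quorum(n: int) -> int:
    """Smallest strict majority of n replicas."""
    return n // 2 + 1


def max_down(n: int) -> int:
    """Replicas that may be unavailable while a quorum is still reachable."""
    return n - quorum(n)


for n in range(3, 8):
    print(f"n={n}: quorum={quorum(n)}, maxdown={max_down(n)}")
```

Note that going from an odd n to the next even n raises the quorum without raising the number of tolerated failures.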
25
Example: Ensuring idempotence using consensus
Diagram: Doctor system, Pharmacy system, request idempotence proxy instance; nodes labeled reqid=xyz.
26
Delta consistency
■ An update will propagate through the system and all replicas will be consistent after a fixed time period δ
■ Easy to understand for customer
Scale (availability ↔ consistency): Consensus, Delta consistency
27
Example: Delta Consistency with prescription replication
We guarantee that prescriptions are replicated from Oracle to Riak within 20 minutes.
Diagram: Oracle master, Oracle MView, Riak nodes, prescription server, drug medication server; replication takes max 20 minutes.
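A fixed δ is easy to monitor. The sketch below flags writes that have waited longer than δ to replicate; the `overdue` function, the `rx-` identifiers and the dates are made up for illustration, and only the 20-minute bound comes from the slide.

```python
from datetime import datetime, timedelta
from typing import Dict, List

DELTA = timedelta(minutes=20)  # the replication guarantee from the slide


def overdue(pending: Dict[str, datetime], now: datetime,
            delta: timedelta = DELTA) -> List[str]:
    """Return ids of writes that have waited longer than delta to replicate."""
    return sorted(pid for pid, written_at in pending.items()
                  if now - written_at > delta)


now = datetime(2013, 1, 1, 12, 0)
pending = {
    "rx-1": datetime(2013, 1, 1, 11, 30),  # 30 minutes old: guarantee broken
    "rx-2": datetime(2013, 1, 1, 11, 50),  # 10 minutes old: within delta
}
print(overdue(pending, now))  # ['rx-1']
```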
28
Dynamic Delta consistency
■ Same as Delta Consistency, but users can monitor directly how far behind we are
■ Define one or more authorities, and track how far behind they are.
■ Every response carries information on how up to date the data is for each authority.
■ Useful when delay is normally low (sub-second), but can be high in times of degraded service.
■ Useful for CQRS or temporarily offline systems
■ Pro/Con: Users have to understand what data delay means.
Scale (availability ↔ consistency): Consensus, Delta consistency, Dynamic delta consistency
29
■ When beginning a sync, note the time on the authority
■ After completing a sync, store the time of last sync on one or both sides.
■ Expose updatedness of data.
Example: Dynamic Delta Consistency using mobile device
Diagram: mobile device, relay server, RiakSync, Riak nodes.
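The three steps above can be sketched as follows. `DeviceReplica`, `authority_now` and `fetch_all` are hypothetical names; the authority's clock and data are passed in as callables so the example stays self-contained.

```python
from typing import Callable, Dict, Optional


class DeviceReplica:
    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        # Authority time at the start of the last completed sync.
        self.updated_until: Optional[float] = None

    def sync(self, authority_now: Callable[[], float],
             fetch_all: Callable[[], Dict[str, str]]) -> None:
        started = authority_now()      # 1. note the authority's time first
        self.data = dict(fetch_all())  # 2. copy the data (may take a while)
        self.updated_until = started   # 3. stored only if the copy succeeded

    def read(self, key: str) -> Dict[str, object]:
        # Every response exposes how up to date the data is.
        return {"value": self.data.get(key),
                "updated_until": self.updated_until}


ticks = iter([100.0, 105.0])  # fake authority clock for the example
replica = DeviceReplica()
replica.sync(lambda: next(ticks), lambda: {"drug": "aspirin"})
print(replica.read("drug"))
```

Using the start time rather than the completion time is the conservative choice: writes that reach the authority while the sync is running may or may not be included.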
30
■ Commands trigger async events
■ Events update views
■ Expose the timestamp of the oldest waiting event as updated_until on the view, or the current time if no events are in the queue.
Example: Dynamic Delta Consistency using CQRS
View
Event log
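A minimal sketch of a view that reports `updated_until` this way. The `View` class and the `(timestamp, key, value)` event shape are assumptions for the example, and events are assumed to be queued in timestamp order.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Event = Tuple[float, str, str]  # (timestamp, key, value)


class View:
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}
        self.queue: Deque[Event] = deque()  # events not yet applied

    def enqueue(self, event: Event) -> None:
        self.queue.append(event)

    def apply_next(self) -> None:
        _, key, value = self.queue.popleft()
        self.state[key] = value

    def updated_until(self, now: float) -> float:
        # Oldest waiting event, or "now" when the view has caught up.
        return self.queue[0][0] if self.queue else now


view = View()
view.enqueue((10.0, "dose", "1 tablet"))
view.enqueue((12.0, "dose", "2 tablets"))
print(view.updated_until(now=15.0))  # 10.0
view.apply_next()
view.apply_next()
print(view.updated_until(now=15.0))  # 15.0
```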
31
■ Setup: multiple datacenters; every datacenter replicates with every other at intervals ("full sync").
■ When a full sync is done, save the sync data in each datacenter.
Example: DC1 is done syncing with DC2; the sync started at time t.
■ When a datacenter is internally consistent (no pending handoffs for instance), it can expose the time of sync with the other authorities as updated_until timestamp.
Example: Dynamic Delta Consistency using multiple authorities
DC1
DC2
DC3
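One way to picture the bookkeeping, as a sketch only: each datacenter records the start time of its last full sync with every other authority and exposes those times only while it is internally consistent. The `Datacenter` class and its fields are hypothetical, and `pending_handoffs` stands in for whatever internal-consistency check the real system uses.

```python
from typing import Dict, Optional


class Datacenter:
    def __init__(self, name: str) -> None:
        self.name = name
        self.pending_handoffs = 0
        self.sync_started: Dict[str, float] = {}  # authority -> sync start time

    def full_sync_done(self, other: str, started_at: float) -> None:
        """Record that a full sync with `other`, begun at started_at, finished."""
        self.sync_started[other] = started_at

    def updated_until(self) -> Optional[Dict[str, float]]:
        if self.pending_handoffs > 0:
            return None  # not internally consistent: make no claim
        return dict(self.sync_started)


dc1 = Datacenter("DC1")
dc1.full_sync_done("DC2", started_at=100.0)
dc1.full_sync_done("DC3", started_at=90.0)
print(dc1.updated_until())  # {'DC2': 100.0, 'DC3': 90.0}
```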
32
Thank you!
Rune Skou Larsen