Dependable Distributed Applications (uni-potsdam.de)

Page 1:

Dependable Distributed Applications

Dependable Systems 2014

Lena Herscheid, Dr. Peter Tröger

Dependable Distributed Applications | Dependable Systems 2014 1

Page 2:

Frameworks + Programming Models

Hanmer, Robert. Patterns for Fault Tolerant Software. John Wiley & Sons, 2013.

Introduction to Fault Tolerant CORBA. http://cnb.ociweb.com/cnb/CORBANewsBrief-200301.html

Erlang/OTP http://www.erlang.org/doc/

Dependable Distributed Applications | Dependable Systems 2014 2

Page 3:

Dependable Distributed Applications | Dependable Systems 2014 3

Page 4:

FT-CORBA

• Extension of the CORBA standard with commonly used fault tolerance patterns

• Fault model: node crash faults

• Replication
  • Object level: ReplicationManager + ReplicaFactory
  • Logical singletons: a group of replicas (object group) appears as a single object
  • warm / cold passive → high recovery time
  • active / active_with_voting → high multicast time

• Fault detection
  • FaultDetector + FaultNotifier
  • Assumed to be inherently fault tolerant themselves

• Failure recovery
  • Apply the log of updates to the replica, depending on the replica type (see the warm-passive sketch below)

• Implementations
  • Replication in the ORB: Electra, TAO, Orbix+Isis, …
  • Replication through CORBA objects: DOORS, AQuA, OGS, …
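As a minimal sketch of the warm-passive pattern above (generic Erlang processes, not the FT-CORBA interfaces; the module and message names are made up for illustration): the primary applies each update and ships it to a backup, and recovery replays the shipped log — which is why passive replication trades low multicast cost for a higher recovery time.

```erlang
%% Minimal sketch of warm-passive replication (generic pattern, not the
%% FT-CORBA API). The primary applies updates and forwards them to the
%% backup; on takeover, the backup replays the forwarded log.
-module(warm_passive).
-export([primary/2, backup/1]).

primary(State, Backup) ->
    receive
        {update, From, Fun} ->
            NewState = Fun(State),       % apply the update locally
            Backup ! {log, Fun},         % ship the update to the warm backup
            From ! {ok, NewState},
            primary(NewState, Backup)
    end.

backup(Log) ->
    receive
        {log, Fun} ->
            backup([Fun | Log]);         % record only; the backup serves no requests
        {takeover, InitialState} ->
            %% Recovery: replay the logged updates in order, then act as primary.
            State = lists:foldl(fun(F, S) -> F(S) end,
                                InitialState, lists:reverse(Log)),
            primary(State, spawn(?MODULE, backup, [[]]))
    end.
```

A FaultDetector in this picture would be a separate process that monitors the primary and sends the {takeover, Checkpoint} message to the backup.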

Dependable Distributed Applications | Dependable Systems 2014 4

Page 5:

Erlang/OTP

• Erlang programming language: fault tolerance as a design principle
  • Isolated lightweight processes (managed by the VM)

• Programming model: asynchronous message passing

• “Let it crash” policy
  • Failing processes simply terminate with an exit reason (error code)
  • Monitoring processes are expected to perform the recovery (see the sketch below)

• Transparent distribution of processes (by the VM)

• Open Telecom Platform (OTP) framework
  • Common patterns in concurrent, distributed Erlang programs

• Modules can instantiate behaviours (server, fsm, supervisor…)
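A rough illustration of "let it crash" using only plain process primitives (the module and function names are made up for the sketch): the worker does no defensive error handling; a monitoring process observes its termination and restarts it.

```erlang
%% Sketch: "let it crash" with a monitoring process that performs recovery.
-module(watchdog).
-export([start/0]).

start() ->
    spawn(fun() -> supervise(fun worker/0) end).

supervise(WorkerFun) ->
    {Pid, Ref} = spawn_monitor(WorkerFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;                                            % clean exit
        {'DOWN', Ref, process, Pid, Reason} ->
            io:format("worker died (~p), restarting~n", [Reason]),
            supervise(WorkerFun)                           % recovery = restart
    end.

worker() ->
    receive
        crash -> exit(boom);                               % simulate a fault
        stop  -> ok
    end.
```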

Dependable Distributed Applications | Dependable Systems 2014 5

Page 6:

Erlang/OTP

(Figure: example supervision tree)

(Figure: one_for_one restart)
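A minimal supervisor with a one_for_one restart strategy looks roughly as follows (my_worker stands for a hypothetical gen_server child; the tuple is the classic OTP child specification):

```erlang
%% Minimal one_for_one supervisor; with this strategy only the crashed
%% child is restarted, leaving its siblings untouched.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Give up if more than 3 restarts happen within 10 seconds.
    RestartStrategy = {one_for_one, 3, 10},
    Child = {my_worker,                    % child id
             {my_worker, start_link, []},  % {Module, Function, Args}
             permanent,                    % always restart this child
             5000,                         % shutdown timeout in ms
             worker,                       % child type (worker | supervisor)
             [my_worker]},                 % modules implementing the child
    {ok, {RestartStrategy, [Child]}}.
```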

Dependable Distributed Applications | Dependable Systems 2014 6

Page 7:

Fault Tolerant Coordination Services

Burrows, Mike. "The Chubby lock service for loosely-coupled distributed systems." Proceedings of the 7th Symposium on Operating Systems Design and Implementation. USENIX Association, 2006.

Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.

Dependable Distributed Applications | Dependable Systems 2014 7

Page 8:

Motivation

• Distributed algorithms are notoriously hard to implement correctly

• Leader election / consensus need to be inherently fault tolerant

• Decoupling of algorithmic and data redundancy
  • Storage nodes usually need a higher degree of replication
  • Consistency constraints
  • High recovery costs

• Decision making should be lightweight
  • Fast recovery
  • Low latency requirement

Dependable Distributed Applications | Dependable Systems 2014 8

In Search of an Understandable Consensus Algorithm

Page 9:

Chubby

• Google’s distributed lock service

• Goal: easily add consensus / leader election to existing application

• Lock service: simple interface for distributed decision making (pattern sketched below)
  • “a generic electorate that allows a client system to make decisions correctly when less than a majority of its own members are up”

• Serves small files so elected primaries can easily distribute parameters

• Client notification on events (such as lock expiry or a new leader election)

• A Chubby cell consists of 5 replicas, implementing Paxos
  • Automatic failover within a configured machine pool
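The usage pattern behind this can be sketched generically (this is not the Chubby API; TryLock and Publish stand for a coordination client's exclusive-lock and small-file primitives): whoever acquires the lock becomes primary and advertises its address through a small file that the other members read and watch.

```erlang
%% Sketch of lock-service based leader election (generic pattern, not the
%% Chubby API). TryLock/0 and Publish/1 are assumed to be supplied by a
%% coordination-service client.
-module(lock_election).
-export([maybe_lead/3]).

maybe_lead(TryLock, Publish, MyAddr) ->
    case TryLock() of
        {ok, Lock} ->
            Publish(MyAddr),      % elected primary distributes its parameters
            {leader, Lock};
        busy ->
            follower              % someone else holds the lock; watch for events
    end.
```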

Dependable Distributed Applications | Dependable Systems 2014 9

Page 10:

Zookeeper

• “Because Coordinating Distributed Systems is a Zoo”

• Distributed configuration + coordination service
  • Used for leader election, message queuing, synchronization

• Provides a file-system-like namespace for coordination data (<= 1 MB per node)
  • Kept in memory
  • State-based service: no change history

• Guaranteed absolute order of updates
  • Client watch events are triggered in the same order as ZooKeeper sees the updates

• Throughput of read requests scales with #servers

• Throughput of write requests decreases with #servers
  • Consensus required on all updates
  • ~50k updates per second

Dependable Distributed Applications | Dependable Systems 2014 10

Page 11:

Distributed Storage

Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.

Corbett, James C., et al. "Spanner: Google's globally distributed database." ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 8.

HDFS Architecture Guide. http://hadoop.apache.org/common/docs/current/hdfs_design.pdf (2008).

DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS Operating Systems Review. Vol. 41. No. 6. ACM, 2007.

Dependable Distributed Applications | Dependable Systems 2014 11

Page 12:

Design Choices

• When to resolve conflicts?
  • On read
  • On write

• Who resolves conflicts?
  • Application: data-model-aware resolution policies possible
  • Storage system: application transparency, but less powerful

• ACID vs BASE

• PACELC trade-offs

• Data partitioning algorithm

Dependable Distributed Applications | Dependable Systems 2014 12

Page 13:

ACID vs BASE (Brewer. PODC keynote. 2000)

Atomic, Consistent, Isolated, Durable

• Transactions

• Strong consistency

• Pessimistic/conservative replication

Basically Available, Soft-state, Eventual consistency

• Best Effort

• Weak consistency

• Optimistic replication

Dependable Distributed Applications | Dependable Systems 2014 13

Page 14:

Modern distributed storage systems

• Geo-replication
  • Latency issues
  • Consistency models need to take locality into account

• Shift towards tuneable, relaxed consistency models
  • Application-specific configuration
  • Fault tolerance increasingly also a DevOps problem

• Always available, low latency, partition tolerance, scalability (ALPS)
  • Availability before consistency
  • Most ALPS systems offer eventual consistency

• NoSQL movement
  • Relational DBMS are hard to make consistent and available
  • Denormalized data is easier to replicate

Dependable Distributed Applications | Dependable Systems 2014 14

Page 15:

Self-Healing

• How (and when) to handle diverging replicas with eventual consistency?

• Read repair
  • Quorum met, but not all replicas agreed → inconsistency detected!
  • Force the minority to update their copy

• Active Anti-Entropy (AAE)
  • Continuously running background process
  • Difference detection using hash trees (see the sketch below)
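A flat sketch of the idea behind hash-based difference detection (real systems such as Riak use proper Merkle hash trees; erlang:phash2 stands in for a real hash function, and the segment count is an arbitrary assumption): both replicas summarise their key space per segment, and only segments whose summaries differ need a key-by-key exchange.

```erlang
%% Sketch of anti-entropy difference detection: each replica hashes its
%% keys into fixed segments; only segments whose combined hashes differ
%% need a key-by-key exchange (a flat stand-in for Riak's hash trees).
-module(anti_entropy).
-export([diff_segments/2]).

-define(SEGMENTS, 64).

segment(Key) -> erlang:phash2(Key, ?SEGMENTS).

%% Store :: #{Key => Value}; returns #{Segment => CombinedHash}.
segment_hashes(Store) ->
    maps:fold(fun(K, V, Acc) ->
                      Seg = segment(K),
                      Old = maps:get(Seg, Acc, 0),
                      Acc#{Seg => Old bxor erlang:phash2({K, V})}
              end, #{}, Store).

diff_segments(StoreA, StoreB) ->
    HA = segment_hashes(StoreA),
    HB = segment_hashes(StoreB),
    [Seg || Seg <- lists:seq(0, ?SEGMENTS - 1),
            maps:get(Seg, HA, 0) =/= maps:get(Seg, HB, 0)].
```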

Dependable Distributed Applications | Dependable Systems 2014 15

Page 16:

BigTable

• Google’s distributed database

• Designed to handle petabytes of distributed data

• Non-relational data model: “multi-dimensional sparse maps”

• GQL: subset of SQL

• Building blocks
  • Google File System (GFS) for raw storage
  • Chubby for master election
  • Custom MapReduce implementation for writing data

Dependable Distributed Applications | Dependable Systems 2014 16

Page 17:

Google Spanner

• Spanservers are spread across different data centres

• Data model: semi-relational

• “Externally consistent” transactions (linearizable consistency for R/W transactions)

• Timestamped transactions, using Paxos

Dependable Distributed Applications | Dependable Systems 2014 17

(Figure: effect of killing the Paxos leader)

Page 18:

Google Spanner / TrueTime

• Instead of relying on NTP, data centres have their own atomic clocks

• GPS-based time negotiation
  • Periodic consensus on time → reliable, uncertain global clock

• Interval-based time (uncertainty representation)
  • The longer ago the last synchronization point, the higher the uncertainty (see the sketch below)
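A small sketch of interval-based time in the style of TrueTime (the fixed 4 ms uncertainty is an assumed placeholder; in reality the uncertainty grows until the next clock synchronisation): now_interval/0 returns bounds that are guaranteed to contain real time, and commit_wait/0 shows the resulting "wait until the chosen timestamp is definitely in the past" rule that makes timestamp order match real-time order.

```erlang
%% Sketch of interval-based time. now_interval/0 returns {Earliest, Latest}
%% bounds assumed to contain real time; the uncertainty would grow with
%% the time since the last clock synchronisation (fixed here for brevity).
-module(tt_sketch).
-export([now_interval/0, commit_wait/0]).

now_interval() ->
    Now = erlang:system_time(microsecond),
    Epsilon = 4000,                       % assumed uncertainty: 4 ms
    {Now - Epsilon, Now + Epsilon}.

%% Pick a commit timestamp and wait until it is guaranteed to be in the
%% past before releasing the result.
commit_wait() ->
    {_, Latest} = now_interval(),
    wait_until_past(Latest),
    Latest.

wait_until_past(T) ->
    {Earliest, _} = now_interval(),
    case Earliest > T of
        true  -> ok;
        false -> timer:sleep(1), wait_until_past(T)
    end.
```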

Dependable Distributed Applications | Dependable Systems 2014 18

Page 19:

HDFS

Dependable Distributed Applications | Dependable Systems 2014 19

• Standard storage system behind Hadoop

• Replication of equal size file blocks on DataNodes

• Central coordinating NameNode
  • Maintains metadata: namespace tree, mapping of blocks to DataNodes
  • Metadata kept in memory
  • Monitors DataNodes by receiving heartbeats

• DataNode failure → NameNode detects it, replicates the blocks on another node

• NameNode is a single point of failure (before 2.0.0 → High Availability HDFS)

• HBase runs on top of HDFS: open source BigTable implementation

Page 20:

Dynamo

• Amazon’s distributed key-value store

• Designed for scalability and high availability

• Assumptions
  • Most operations do not span multiple data items → no need for a fully relational DBMS
  • Poor write availability is worse than inconsistency

• Always writeable
  • Conflict resolution upon reads (see the merge sketch below)
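A minimal sketch of application-level conflict resolution on read, in the spirit of Dynamo's shopping-cart example (the merge policy here is simply set union; real deployments pick a policy that fits their data model):

```erlang
%% Sketch of read-time conflict resolution: divergent sibling versions
%% returned by different replicas are merged by the application; with a
%% set-union merge no write ever has to be rejected.
-module(read_merge).
-export([resolve/1]).

%% Siblings :: [[Item]]  -- each sibling is the item list one replica saw.
resolve(Siblings) ->
    Merged = lists:foldl(fun(Items, Acc) ->
                                 sets:union(sets:from_list(Items), Acc)
                         end, sets:new(), Siblings),
    sets:to_list(Merged).
```

For example, resolve([[milk, eggs], [eggs, bread]]) yields a list containing milk, eggs, and bread.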

Dependable Distributed Applications | Dependable Systems 2014 21

Page 21:

Riak

• Distributed key-value store programmed in Erlang

• Design based on the Dynamo paper

Dependable Distributed Applications | Dependable Systems 2014 22

(Figures: replication configuration; ring-based consistent hashing; Erlang supervision tree)
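A compact sketch of ring-based consistent hashing with an N-replica preference list (the partition count and the round-robin partition-to-node assignment are simplifying assumptions, not Riak's actual ring-claim algorithm):

```erlang
%% Sketch of ring-based consistent hashing: a key is hashed onto a fixed
%% ring of partitions, and the object lives on the owner of that
%% partition plus the next N-1 partitions (its "preference list").
-module(ring_sketch).
-export([preference_list/3]).

-define(RING_SIZE, 64).                    % partitions on the ring

%% Nodes is the ordered node list; partitions are assigned round-robin.
preference_list(Key, Nodes, N) ->
    First = erlang:phash2(Key, ?RING_SIZE),
    Partitions = [(First + I) rem ?RING_SIZE || I <- lists:seq(0, N - 1)],
    Owners = [lists:nth((P rem length(Nodes)) + 1, Nodes) || P <- Partitions],
    lists:usort(Owners).                   % distinct nodes for the key
```

For example, preference_list(<<"user:42">>, [node_a, node_b, node_c, node_d], 3) returns up to three distinct nodes responsible for that key.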

Page 22:

Cassandra

• Distributed NoSQL DBMS

• Designed for performance and scalability

• Configurable eventual consistency
  • Hinted handoff for availability (see the sketch below)

• Gossip protocol for failure detection

• Configurable replication + partitioning
  • NetworkTopologyStrategy: data-centre aware
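A sketch of the hinted handoff idea (generic, not Cassandra's implementation; SendFun is an assumed caller-supplied transport returning ok or {error, down}): writes to unreachable replicas are remembered as hints and replayed later, trading temporary inconsistency for write availability.

```erlang
%% Sketch of hinted handoff: the coordinator writes to every replica it
%% can reach and stores a "hint" for each replica that is down.
-module(hints).
-export([write/3, replay/2]).

%% SendFun(Node, Key, Value) -> ok | {error, down}  (assumed transport).
write({Key, Value}, Replicas, SendFun) ->
    lists:foldl(fun(Node, Hints) ->
                        case SendFun(Node, Key, Value) of
                            ok            -> Hints;
                            {error, down} -> [{Node, Key, Value} | Hints]
                        end
                end, [], Replicas).

%% Try the stored hints again; keep only those that still fail.
replay(Hints, SendFun) ->
    [{Node, Key, Value} || {Node, Key, Value} <- Hints,
                           SendFun(Node, Key, Value) =/= ok].
```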

Dependable Distributed Applications | Dependable Systems 2014 23

http://www.ecyrd.com/cassandracalculator/

Page 23:

The Reality of Distributed Failures…

human operation mistakes

data corruption is rarely part of the failure model

unforeseen (hence unmodelled) error propagation chains

dynamically changing failure probabilities

nested failures during recovery routines

Dependable Distributed Applications | Dependable Systems 2014 24