Dependable Distributed Applications (uni-potsdam.de)

Page 1:

Dependable Distributed Applications

Dependable Systems 2014

Lena Herscheid, Dr. Peter Tröger

Dependable Distributed Applications | Dependable Systems 2014 1

Page 2:

Frameworks + Programming Models

Hanmer, Robert. Patterns for Fault Tolerant Software. John Wiley & Sons, 2013.

Introduction to Fault Tolerant CORBA. http://cnb.ociweb.com/cnb/CORBANewsBrief-200301.html

Erlang/OTP http://www.erlang.org/doc/

Dependable Distributed Applications | Dependable Systems 2014 2

Page 3:

Dependable Distributed Applications | Dependable Systems 2014 3

Page 4:

FT-CORBA

• Extension of the CORBA standard with commonly used fault tolerance patterns

• Fault model: node crash faults

• Replication
  • Object level: ReplicationManager + ReplicaFactory
  • Logical singletons: a group of replicas (object group) appears as a single object
  • warm / cold passive → high recovery time
  • active / active_with_voting → high multicast time

• Fault detection
  • FaultDetector + FaultNotifier
  • Assumed to be inherently fault tolerant themselves

• Failure recovery
  • Apply the log of updates to the replica, depending on the replica type (see the warm-passive sketch below)

• Implementations
  • Replication in the ORB: Electra, TAO, Orbix+Isis, …
  • Replication through CORBA objects: DOORS, AQuA, OGS, …
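As a minimal sketch of the warm-passive pattern above (generic Erlang processes, not the FT-CORBA interfaces; the module and message names are made up for illustration): the primary applies each update and ships it to a backup, and recovery replays the shipped log — which is why passive replication trades low multicast cost for a higher recovery time.

```erlang
%% Minimal sketch of warm-passive replication (generic pattern, not the
%% FT-CORBA API). The primary applies updates and forwards them to the
%% backup; on takeover, the backup replays the forwarded log.
-module(warm_passive).
-export([primary/2, backup/1]).

primary(State, Backup) ->
    receive
        {update, From, Fun} ->
            NewState = Fun(State),       % apply the update locally
            Backup ! {log, Fun},         % ship the update to the warm backup
            From ! {ok, NewState},
            primary(NewState, Backup)
    end.

backup(Log) ->
    receive
        {log, Fun} ->
            backup([Fun | Log]);         % record only; the backup serves no requests
        {takeover, InitialState} ->
            %% Recovery: replay the logged updates in order, then act as primary.
            State = lists:foldl(fun(F, S) -> F(S) end,
                                InitialState, lists:reverse(Log)),
            primary(State, spawn(?MODULE, backup, [[]]))
    end.
```

A FaultDetector in this picture would be a separate process that monitors the primary and sends the {takeover, Checkpoint} message to the backup.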

Dependable Distributed Applications | Dependable Systems 2014 4

Page 5:

Erlang/OTP

• Erlang programming language: fault tolerance as a design principle
  • Isolated lightweight processes (managed by the VM)

• Programming model: asynchronous message passing

• “Let it crash” policy
  • Failing processes simply terminate with an exit reason (error code)
  • Monitoring processes are expected to perform the recovery (see the sketch below)

• Transparent distribution of processes (by the VM)

• Open Telecom Platform (OTP) framework
  • Common patterns in concurrent, distributed Erlang programs

• Modules can instantiate behaviours (server, fsm, supervisor…)
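A rough illustration of "let it crash" using only plain process primitives (the module and function names are made up for the sketch): the worker does no defensive error handling; a monitoring process observes its termination and restarts it.

```erlang
%% Sketch: "let it crash" with a monitoring process that performs recovery.
-module(watchdog).
-export([start/0]).

start() ->
    spawn(fun() -> supervise(fun worker/0) end).

supervise(WorkerFun) ->
    {Pid, Ref} = spawn_monitor(WorkerFun),
    receive
        {'DOWN', Ref, process, Pid, normal} ->
            ok;                                            % clean exit
        {'DOWN', Ref, process, Pid, Reason} ->
            io:format("worker died (~p), restarting~n", [Reason]),
            supervise(WorkerFun)                           % recovery = restart
    end.

worker() ->
    receive
        crash -> exit(boom);                               % simulate a fault
        stop  -> ok
    end.
```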

Dependable Distributed Applications | Dependable Systems 2014 5

Page 6:

Erlang/OTP

(Figure: example supervision tree)

(Figure: one_for_one restart)
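A minimal supervisor with a one_for_one restart strategy looks roughly as follows (my_worker stands for a hypothetical gen_server child; the tuple is the classic OTP child specification):

```erlang
%% Minimal one_for_one supervisor; with this strategy only the crashed
%% child is restarted, leaving its siblings untouched.
-module(my_sup).
-behaviour(supervisor).
-export([start_link/0, init/1]).

start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

init([]) ->
    %% Give up if more than 3 restarts happen within 10 seconds.
    RestartStrategy = {one_for_one, 3, 10},
    Child = {my_worker,                    % child id
             {my_worker, start_link, []},  % {Module, Function, Args}
             permanent,                    % always restart this child
             5000,                         % shutdown timeout in ms
             worker,                       % child type (worker | supervisor)
             [my_worker]},                 % modules implementing the child
    {ok, {RestartStrategy, [Child]}}.
```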

Dependable Distributed Applications | Dependable Systems 2014 6

Page 7:

Fault Tolerant Coordination Services

Burrows, Mike. "The Chubby lock service for loosely-coupled distributed systems." Proceedings of the 7th Symposium on Operating Systems Design and Implementation. USENIX Association, 2006.

Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.

Dependable Distributed Applications | Dependable Systems 2014 7

Page 8:

Motivation

• Distributed algorithms are notoriously hard to implement correctly

• Leader election / consensus need to be inherently fault tolerant

• Decoupling of algorithmic and data redundancy
  • Storage nodes usually need a higher degree of replication
  • Consistency constraints
  • High recovery costs

• Decision making should be lightweight
  • Fast recovery
  • Low latency requirement

Dependable Distributed Applications | Dependable Systems 2014 8

In Search of an Understandable Consensus Algorithm

Page 9:

Chubby

• Google’s distributed lock service

• Goal: easily add consensus / leader election to existing application

• Lock service: simple interface for distributed decision making (pattern sketched below)
  • “a generic electorate that allows a client system to make decisions correctly when less than a majority of its own members are up”

• Serves small files so elected primaries can easily distribute parameters

• Client notification on events (such as lock expiry or a new leader election)

• A Chubby cell consists of 5 replicas, implementing Paxos
  • Automatic failover within a configured machine pool
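The usage pattern behind this can be sketched generically (this is not the Chubby API; TryLock and Publish stand for a coordination client's exclusive-lock and small-file primitives): whoever acquires the lock becomes primary and advertises its address through a small file that the other members read and watch.

```erlang
%% Sketch of lock-service based leader election (generic pattern, not the
%% Chubby API). TryLock/0 and Publish/1 are assumed to be supplied by a
%% coordination-service client.
-module(lock_election).
-export([maybe_lead/3]).

maybe_lead(TryLock, Publish, MyAddr) ->
    case TryLock() of
        {ok, Lock} ->
            Publish(MyAddr),      % elected primary distributes its parameters
            {leader, Lock};
        busy ->
            follower              % someone else holds the lock; watch for events
    end.
```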

Dependable Distributed Applications | Dependable Systems 2014 9

Page 10:

Zookeeper

• “Because Coordinating Distributed Systems is a Zoo”

• Distributed configuration + coordination service
  • Used for leader election, message queuing, synchronization

• Provides a file-system-like namespace for coordination data (<= 1 MB per node)
  • Kept in memory
  • State-based service: no change history

• Guaranteed absolute order of updates
  • Client watch events are triggered in the same order as ZooKeeper sees the updates

• Throughput of read requests scales with #servers

• Throughput of write requests decreases with #servers
  • Consensus required on all updates
  • ~50k updates per second

Dependable Distributed Applications | Dependable Systems 2014 10

Page 11:

Distributed Storage

Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.

Corbett, James C., et al. "Spanner: Google's globally distributed database." ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 8.

HDFS Architecture Guide. http://hadoop.apache.org/common/docs/current/hdfs_design.pdf (2008).

DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS Operating Systems Review. Vol. 41. No. 6. ACM, 2007.

Dependable Distributed Applications | Dependable Systems 2014 11

Page 12:

Design Choices

• When to resolve conflicts?
  • On read
  • On write

• Who resolves conflicts?
  • Application: data-model-aware resolution policies possible
  • Storage system: application transparency, but less powerful

• ACID vs BASE

• PACELC trade-offs

• Data partitioning algorithm

Dependable Distributed Applications | Dependable Systems 2014 12

Page 13:

ACID vs BASE (Brewer. PODC keynote. 2000)

Atomic, Consistent, Isolated, Durable

• Transactions

• Strong consistency

• Pessimistic/conservative replication

Basically Available, Soft-state, Eventual consistency

• Best Effort

• Weak consistency

• Optimistic replication

Dependable Distributed Applications | Dependable Systems 2014 13

Page 14:

Modern distributed storage systems

• Geo-replication
  • Latency issues
  • Consistency models need to take locality into account

• Shift towards tuneable, relaxed consistency models
  • Application-specific configuration
  • Fault tolerance increasingly also a DevOps problem

• Always available, low latency, partition tolerance, scalability (ALPS)
  • Availability before consistency
  • Most ALPS systems offer eventual consistency

• NoSQL movement
  • Relational DBMS are hard to make consistent and available
  • Denormalized data is easier to replicate

Dependable Distributed Applications | Dependable Systems 2014 14

Page 15:

Self-Healing

• How (and when) to handle diverging replicas with eventual consistency?

• Read repair
  • Quorum met, but not all replicas agreed → inconsistency detected!
  • Force the minority to update their copy

• Active Anti-Entropy (AAE)
  • Continuously running background process
  • Difference detection using hash trees (see the sketch below)
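A flat sketch of the idea behind hash-based difference detection (real systems such as Riak use proper Merkle hash trees; erlang:phash2 stands in for a real hash function, and the segment count is an arbitrary assumption): both replicas summarise their key space per segment, and only segments whose summaries differ need a key-by-key exchange.

```erlang
%% Sketch of anti-entropy difference detection: each replica hashes its
%% keys into fixed segments; only segments whose combined hashes differ
%% need a key-by-key exchange (a flat stand-in for Riak's hash trees).
-module(anti_entropy).
-export([diff_segments/2]).

-define(SEGMENTS, 64).

segment(Key) -> erlang:phash2(Key, ?SEGMENTS).

%% Store :: #{Key => Value}; returns #{Segment => CombinedHash}.
segment_hashes(Store) ->
    maps:fold(fun(K, V, Acc) ->
                      Seg = segment(K),
                      Old = maps:get(Seg, Acc, 0),
                      Acc#{Seg => Old bxor erlang:phash2({K, V})}
              end, #{}, Store).

diff_segments(StoreA, StoreB) ->
    HA = segment_hashes(StoreA),
    HB = segment_hashes(StoreB),
    [Seg || Seg <- lists:seq(0, ?SEGMENTS - 1),
            maps:get(Seg, HA, 0) =/= maps:get(Seg, HB, 0)].
```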

Dependable Distributed Applications | Dependable Systems 2014 15

Page 16:

BigTable

• Google’s distributed database

• Designed to handle petabytes of distributed data

• Non-relational data model: “multi-dimensional sparse maps”

• GQL: subset of SQL

• Building blocks
  • Google File System (GFS) for raw storage
  • Chubby for master election
  • Custom MapReduce implementation for writing data

Dependable Distributed Applications | Dependable Systems 2014 16

Page 17:

Google Spanner

• Spanservers are spread across different data centres

• Data model: semi-relational

• “Externally consistent” transactions (linearizable consistency for R/W transactions)

• Timestamped transactions, using Paxos

Dependable Distributed Applications | Dependable Systems 2014 17

(Figure: effect of killing the Paxos leader)

Page 18:

Google Spanner / TrueTime

• Instead of relying on NTP, data centres have their own atomic clocks

• GPS-based time negotiation
  • Periodic consensus on time → reliable, uncertain global clock

• Interval-based time (uncertainty representation)
  • The longer ago the last synchronization point, the higher the uncertainty (see the sketch below)
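A small sketch of interval-based time in the style of TrueTime (the fixed 4 ms uncertainty is an assumed placeholder; in reality the uncertainty grows until the next clock synchronisation): now_interval/0 returns bounds that are guaranteed to contain real time, and commit_wait/0 shows the resulting "wait until the chosen timestamp is definitely in the past" rule that makes timestamp order match real-time order.

```erlang
%% Sketch of interval-based time. now_interval/0 returns {Earliest, Latest}
%% bounds assumed to contain real time; the uncertainty would grow with
%% the time since the last clock synchronisation (fixed here for brevity).
-module(tt_sketch).
-export([now_interval/0, commit_wait/0]).

now_interval() ->
    Now = erlang:system_time(microsecond),
    Epsilon = 4000,                       % assumed uncertainty: 4 ms
    {Now - Epsilon, Now + Epsilon}.

%% Pick a commit timestamp and wait until it is guaranteed to be in the
%% past before releasing the result.
commit_wait() ->
    {_, Latest} = now_interval(),
    wait_until_past(Latest),
    Latest.

wait_until_past(T) ->
    {Earliest, _} = now_interval(),
    case Earliest > T of
        true  -> ok;
        false -> timer:sleep(1), wait_until_past(T)
    end.
```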

Dependable Distributed Applications | Dependable Systems 2014 18

Page 19:

HDFS

Dependable Distributed Applications | Dependable Systems 2014 19

• Standard storage system behind Hadoop

• Replication of equal size file blocks on DataNodes

• Central coordinating NameNode
  • Maintains metadata: namespace tree, mapping of blocks to DataNodes
  • Metadata kept in memory
  • Monitors DataNodes by receiving heartbeats

• DataNode failure → NameNode detects it, replicates the blocks on another node

• NameNode is a single point of failure (before 2.0.0 → High Availability HDFS)

• HBase runs on top of HDFS: open source BigTable implementation

Page 20:

Dynamo

• Amazon’s distributed key-value store

• Designed for scalability and high availability

• Assumptions
  • Most operations do not span multiple data items → no need for a fully relational DBMS
  • Poor write availability is worse than inconsistency

• Always writeable
  • Conflict resolution upon reads (see the merge sketch below)
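A minimal sketch of application-level conflict resolution on read, in the spirit of Dynamo's shopping-cart example (the merge policy here is simply set union; real deployments pick a policy that fits their data model):

```erlang
%% Sketch of read-time conflict resolution: divergent sibling versions
%% returned by different replicas are merged by the application; with a
%% set-union merge no write ever has to be rejected.
-module(read_merge).
-export([resolve/1]).

%% Siblings :: [[Item]]  -- each sibling is the item list one replica saw.
resolve(Siblings) ->
    Merged = lists:foldl(fun(Items, Acc) ->
                                 sets:union(sets:from_list(Items), Acc)
                         end, sets:new(), Siblings),
    sets:to_list(Merged).
```

For example, resolve([[milk, eggs], [eggs, bread]]) yields a list containing milk, eggs, and bread.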

Dependable Distributed Applications | Dependable Systems 2014 21

Page 21:

Riak

• Distributed key-value store programmed in Erlang

• Design based on the Dynamo paper

Dependable Distributed Applications | Dependable Systems 2014 22

(Figures: replication configuration; ring-based consistent hashing; Erlang supervision tree)
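A compact sketch of ring-based consistent hashing with an N-replica preference list (the partition count and the round-robin partition-to-node assignment are simplifying assumptions, not Riak's actual ring-claim algorithm):

```erlang
%% Sketch of ring-based consistent hashing: a key is hashed onto a fixed
%% ring of partitions, and the object lives on the owner of that
%% partition plus the next N-1 partitions (its "preference list").
-module(ring_sketch).
-export([preference_list/3]).

-define(RING_SIZE, 64).                    % partitions on the ring

%% Nodes is the ordered node list; partitions are assigned round-robin.
preference_list(Key, Nodes, N) ->
    First = erlang:phash2(Key, ?RING_SIZE),
    Partitions = [(First + I) rem ?RING_SIZE || I <- lists:seq(0, N - 1)],
    Owners = [lists:nth((P rem length(Nodes)) + 1, Nodes) || P <- Partitions],
    lists:usort(Owners).                   % distinct nodes for the key
```

For example, preference_list(<<"user:42">>, [node_a, node_b, node_c, node_d], 3) returns up to three distinct nodes responsible for that key.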

Page 22:

Cassandra

• Distributed NoSQL DBMS

• Designed for performance and scalability

• Configurable eventual consistency
  • Hinted handoff for availability (see the sketch below)

• Gossip protocol for failure detection

• Configurable replication + partitioning
  • NetworkTopologyStrategy: data-centre aware
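A sketch of the hinted handoff idea (generic, not Cassandra's implementation; SendFun is an assumed caller-supplied transport returning ok or {error, down}): writes to unreachable replicas are remembered as hints and replayed later, trading temporary inconsistency for write availability.

```erlang
%% Sketch of hinted handoff: the coordinator writes to every replica it
%% can reach and stores a "hint" for each replica that is down.
-module(hints).
-export([write/3, replay/2]).

%% SendFun(Node, Key, Value) -> ok | {error, down}  (assumed transport).
write({Key, Value}, Replicas, SendFun) ->
    lists:foldl(fun(Node, Hints) ->
                        case SendFun(Node, Key, Value) of
                            ok            -> Hints;
                            {error, down} -> [{Node, Key, Value} | Hints]
                        end
                end, [], Replicas).

%% Try the stored hints again; keep only those that still fail.
replay(Hints, SendFun) ->
    [{Node, Key, Value} || {Node, Key, Value} <- Hints,
                           SendFun(Node, Key, Value) =/= ok].
```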

Dependable Distributed Applications | Dependable Systems 2014 23

http://www.ecyrd.com/cassandracalculator/

Page 23:

The Reality of Distributed Failures…

human operation mistakes

data corruption is rarely part of the failure model

unforeseen (hence unmodelled) error propagation chains

dynamically changing failure probabilities

nested failures during recovery routines

Dependable Distributed Applications | Dependable Systems 2014 24