Programming Distributed SystemsProgramming Models for Distributed Systems
Annette Bieniusa
FB InformatikTU Kaiserslautern
Annette Bieniusa Programming Distributed Systems 1/ 26
What is a Programming Model? [3]
A programming model is some form of abstract machineProvides operations to the level aboveRequires implementations for these operations on the level(s) below
Simplification through abstractionStandard interface that remains stable even if underlyingarchitecture changesProvide different levels of abstractionOften starting point for language development
⇒ Separation of concern between software developers and frameworkimplementors (runtime system, compiler, etc.)
Annette Bieniusa Programming Distributed Systems 2/ 26
Properties of good programming models
Meaningful abstractionsSystem-architecture independentEfficiently implementableEasy to understand
Annette Bieniusa Programming Distributed Systems 3/ 26
What kind of abstractions should a programmingmodel for distributed systems provide?
Annette Bieniusa Programming Distributed Systems 4/ 26
Remote Procedure Call
Annette Bieniusa Programming Distributed Systems 5/ 26
Remote Procedure Call (RPC) [2]
Rather broad classifying term with changing meaning over timeFrom client-server design to interconnected services
Two entities (caller/callee) with different address spacescommunicate over some channel in a request-response mechanismExamples: CORBA (Common Object Request BrokerArchitecture), Java RMI (Remote Method Invocation), SOAP(Simple Object Access Protocol), gRPC (Protocol Buffers),Twitter Finagle . . .
Annette Bieniusa Programming Distributed Systems 6/ 26
Annette Bieniusa Programming Distributed Systems 7/ 26
Flaws of RPC
Location transparency (i.e. request to remote service looks likelocal function call) masks the potential of distribution-relatedfailuresRPCs might timeout, requires usually special handling such asretryingLocal functions do not need to deal with the problem ofidempotenceExecution time is unpredictablePassing of objects is complex (e.g. might need to serializereferenced objects)Translating data types between languages might rely onsemantical approximation
Annette Bieniusa Programming Distributed Systems 8/ 26
Aspects of modern RPCLanguage-agnosticSerialization (aka marshalling or pickling)
JSON, XML, Protocol Buffers, . . .Load-balancing
SOA (Service-oriented architecture) ⇒ Microservice architectures!Asynchronous
⇒ RPC as term gets more and more diffuse
Annette Bieniusa Programming Distributed Systems 9/ 26
Futures and Promises
“Asynchronous RPC”A future is a value that will eventually become availableTwo states:
completed: value is availableincomplete: computation for value is not yet complete
Strategies: Eager vs. lazy evaluationTypical application: Web development and user interfaces
Annette Bieniusa Programming Distributed Systems 10/ 26
Exampleinterface ArchiveSearcher { String search(String target); }
class App {ExecutorService executor = ...ArchiveSearcher searcher = ...void showSearch(final String target)
throws InterruptedException {Future<String> future= executor.submit(new Callable<String>() {
public String call() {return searcher.search(target);
}});displayOtherThings(); // do other things while searchingtry {displayText(future.get()); // use future
} catch (ExecutionException ex) { cleanup(); return; }}
}
From Oracle’s Java Documentation
Annette Bieniusa Programming Distributed Systems 11/ 26
Actors and Message Passing
Annette Bieniusa Programming Distributed Systems 12/ 26
Characteristics of Actor Model [Hewitt]
Actors are isolated units of computation + state that can sendmessages asynchronously to each otherMessages are queued in mailbox and processed sequentially whenthey match against some pattern/ruleNo assumptions on message delivery guarantees(Potential) State + behavior changes upon message processing[1]Very close to Alan Kay’s definition of Object-OrientedProgramming
Annette Bieniusa Programming Distributed Systems 13/ 26
Actors in the WildErlang
Process-basedPure message passingmonitor and link for notification of process failure/shutdownOTP (Open Telecom Platform) for generic reusable patterns
AkkaActor model for the JVMPurges non-matching messagesEnforces parental supervisionIncluded in Scala standard library
OrleansActors for Cloud computingScalability by replicationFine-grain reconciliation of state with transactions
Annette Bieniusa Programming Distributed Systems 14/ 26
Message brokers
Message-oriented middleware which stores messages temporarilyand forwards them to registered recipientsPatterns: Publish-subscribe, point-to-pointActs as buffer for unavailable and overloaded recipientsDecoupling of sender and receiver(s)Efficient 1-to-n multicastAdvanced Message Queuing Protocol (AMQP) standardizesqueuing, routing, reliability and securityDelivery guarantees (at-most-once, at-least-once, exactly-once)
Annette Bieniusa Programming Distributed Systems 15/ 26
Example: RabbitMQSupports (amongst others) publish-subscribe patternTypical usage: Topics as routing keys
Q1 is interested in all the orange animalsQ2 wants to hear everything about rabbits, and everything aboutlazy animalsMessages that don’t map any binding get lostMessages are maintained in the queue in publication order
Annette Bieniusa Programming Distributed Systems 16/ 26
Stream processing
(Infinite) Sequence of data that is incrementally made availableExample: Sensor data, audio / video delivery, filesystem APIs, etc.Producers vs. ConsumersNotions of window and time: Consumers will receive onlymessages after subscribingHere: Event stream where data item is atypically associated withtimestamp
Annette Bieniusa Programming Distributed Systems 17/ 26
Classification of stream processing systems
1 What happens if producer sends messages faster than theconsumer can handle?
Drop messagesBuffer messagesApply backpressure (i.e. prevent producer from sending more)
2 What happens if nodes become unreachable?Loose messagesUse replication and persistence to preserve non-acknowledgedmessages
Annette Bieniusa Programming Distributed Systems 18/ 26
Log-based message brokers
Example: Kafka [https://kafka.apache.org]Message buffers are typically transient: Once the message isdelivered, the message is deletedIdea: Combine durable storage with low-latency notification!
Annette Bieniusa Programming Distributed Systems 19/ 26
Scalability and fault-tolerance for replicated logsFor scalability, partitioning of log on different machinesFor fault-tolerance, replication on different machines
Need to ensure same ordering on all replicas (⇒ Total-orderbroadcast)Can easily add consumers for debugging, testing, etc.Ideas: Event-sourcing, immutability and audits
Annette Bieniusa Programming Distributed Systems 20/ 26
Batch-processing
Static data sets that has known/finite sizeNeed to artificially batch data into by day, month, minute, . . .Typically large latencies
Annette Bieniusa Programming Distributed Systems 21/ 26
The Future: Distributed Programming Languages
Annette Bieniusa Programming Distributed Systems 22/ 26
From Model to Language
Challenges: Partial failure, concurrency and consistency, latency,. . .
1 Distributed Shared MemoryRuntime maps virtual addresses to physical ones“Single-system” illusion
2 ActorsExplicit communicationLocation of processes is transparent
3 DataflowData transformations expressed as DAGProcesses are transparentExample: MapReduce (Google), Dryad (Microsoft), Spark
Annette Bieniusa Programming Distributed Systems 23/ 26
Example: WordCount in MapReduce
Annette Bieniusa Programming Distributed Systems 24/ 26
Further reading
Material collection by Northeastern University, CS7680 SpecialTopics in Computing Systems: Programming Models forDistributed Computing
Annette Bieniusa Programming Distributed Systems 25/ 26
Further reading I
[1] Gul Agha. ”Concurrent Object-Oriented Programming“. In:Commun. ACM 33.9 (1990), S. 125–141. doi:10.1145/83880.84528. url:http://doi.acm.org/10.1145/83880.84528.
[2] Andrew Birrell und Bruce Jay Nelson. ”Implementing RemoteProcedure Calls“. In: ACM Trans. Comput. Syst. 2.1 (1984),S. 39–59. url: https://doi.org/10.1145/2080.357392.
[3] David B. Skillicorn und Domenico Talia. ”Models and Languagesfor Parallel Computation“. In: ACM Comput. Surv. 30.2 (1998),S. 123–169. doi: 10.1145/280277.280278. url:http://doi.acm.org/10.1145/280277.280278.
Annette Bieniusa Programming Distributed Systems 26/ 26
Top Related