Lecture 2 Introduction to Principles of Distributed Computing
![Page 1: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/1.jpg)
Sergio Rajsbaum 2006
Lecture 2
Introduction to Principles of Distributed Computing
Sergio Rajsbaum
Math Institute, UNAM, Mexico
![Page 2: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/2.jpg)
Lecture 2
• Part I: Refresh from Lecture I. What is a distributed system and its parameters. Problems solved in such a system. The need for a theoretical foundation. Two-phase commit
• Part II: Coordinated attack, consensus
![Page 3: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/3.jpg)
Part I: What is a distributed system
The need for a theoretical foundation. Two-phase commit
![Page 4: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/4.jpg)
Principles of Distributed Computing
• Distributed computing studies systems where components interact and collaborate
• Principles of distributed computing tries to understand the fundamental possibilities and limitations of such systems, with a precise, scientific approach
• Goal: to design efficient and reliable systems, and techniques to design them, analyze them and prove them correct, or to prove impossibility results when no protocol exists
![Page 5: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/5.jpg)
What is distributed computing?
• Any system where several independent computing components interact
• This broad definition encompasses
– VLSI chips, and any modern PC
– tightly-coupled shared memory multiprocessor
– local area cluster of workstations
– internet, WEB, Web services
– wireless networks, sensor networks, ad-hoc networks
– cooperating robots, mobile agents, P2P systems
![Page 6: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/6.jpg)
Computing components
• Referred to as processors or processes in the literature
• Can represent a
– microprocessor
– process in a multiprocessing operating system
– Java thread
– mobile agent, mobile node (e.g. laptop), robot
– computing element in a VLSI chip
![Page 7: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/7.jpg)
Interaction – message passing vs. shared memory
• Processors need to communicate with each other to collaborate, via
• Message passing
– Point-to-point channels, defining an interconnection graph
– All-to-all using an underlying infrastructure (e.g. TCP/IP)
– Broadcast; wireless, satellite
• Shared memory
– Shared objects: read/write, test&set, compare&swap, etc.
– Usually harder to implement, easier to program
![Page 8: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/8.jpg)
A distributed system
[Figure: processors collaborate through communication media]
![Page 9: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/9.jpg)
Failures
• Any system that includes many components running over a long period of time must consider the possibility of failures
• of processors and communication media
• of different severity
– from processor crashes or message losses, to
– malicious Byzantine behavior
![Page 10: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/10.jpg)
Many kinds of problems
• Clock synchronization
• Routing
• Broadcasting
• Naming
• P2P, how to share and find resources
• Sharing resources, mutual exclusion
• Increasing fault-tolerance, failure detection
• Security, authentication, cryptography
• Database transactions, atomic commitment
• Backups, reliable storage, file systems
• Applications, airline reservation, banking, electronic commerce, publish/subscribe systems, web search, web caching, …
![Page 11: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/11.jpg)
Multi-layered, complex interactions
An example
• A fault-tolerant broadcast service is useful to build a higher level database transaction module
• Naming and authentication are required
• And it may work more efficiently if clocks are tightly synchronized
• And good routing schemes should exist
• If the clock synchronization is attacked, the whole system may be compromised
![Page 12: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/12.jpg)
Chaos
We need a good foundation,
principles of distributed computing
![Page 13: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/13.jpg)
Chaos
• Too many models, problems and orthogonal, interacting issues
• Very hard to get things right, to reproduce operating scenarios
• Sometimes it is easy to adapt a solution to a different model, sometimes a small change in the model makes a problem unsolvable
![Page 14: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/14.jpg)
Distributed computing theory
• Models
– Good models [Schneider Ch.2 in Distributed Systems, Mullender (Ed.)]
– Relation between models: solve a problem only once; solve it in the strongest possible model
• Problems
– Search for paradigms that represent fundamental distributed computing issues
– Relations between problems: hierarchies of solvable and unsolvable problems; reductions
• Solutions
– Design algorithms, verification techniques, programming abstractions
– Impossibility results and lower bounds
• Efficiency measures
– Time, communication, failures, recovery time, bottlenecks, congestion
![Page 15: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/15.jpg)
Distributed Commit
An example of a distributed protocol
Fundamental part of distributed DBMS
![Page 16: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/16.jpg)
Distributed Commit
• A distributed transaction with components at several sites should execute atomically
• Example: A manager of a chain of stores wants to query all the stores, find the inventory of toothbrushes at each, and issue instructions to move toothbrushes from store to store in order to balance the inventory.
• The operation is done by a single global transaction T that has a component Ti at the i-th store and a component T0 at the office where the manager is located.
![Page 17: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/17.jpg)
Sequence of activities performed by T
1. Component T0 is created at the site of the manager
2. T0 sends messages to all the stores instructing them to create components Ti
3. Each Ti executes a query at store i to discover the number of toothbrushes in inventory and reports this number to T0
4. T0 takes these numbers and determines, by some algorithm we shall not discuss, what shipments of toothbrushes are desired. T0 then sends messages such as “store 10 should ship 500 toothbrushes to store 7” to the appropriate stores
5. Stores receiving instructions update their inventory and perform the shipments
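The five steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: everything runs in one process, nothing fails, and the balancing rule `even_out` is a hypothetical stand-in for the algorithm the slide deliberately leaves out. The names `run_transaction` and `even_out` are not from the lecture.

```python
def even_out(reports):
    """Hypothetical balancing rule (NOT from the lecture): move surplus
    stock from stores above the average to stores below it."""
    target = sum(reports.values()) // len(reports)
    surplus = [[i, c - target] for i, c in sorted(reports.items()) if c > target]
    deficit = [[i, target - c] for i, c in sorted(reports.items()) if c < target]
    shipments = []
    while surplus and deficit:
        qty = min(surplus[0][1], deficit[0][1])
        shipments.append((surplus[0][0], deficit[0][0], qty))
        surplus[0][1] -= qty
        deficit[0][1] -= qty
        if surplus[0][1] == 0:
            surplus.pop(0)
        if deficit[0][1] == 0:
            deficit.pop(0)
    return shipments


def run_transaction(stores, plan):
    """Failure-free run of the global transaction T.
    stores: dict store_id -> toothbrush count (updated in place)."""
    # Steps 1-3: T0 creates the components Ti; each Ti reports its inventory.
    reports = dict(stores)
    # Step 4: T0 decides which shipments are desired.
    shipments = plan(reports)
    # Step 5: the stores update their inventory and ship.
    for src, dst, qty in shipments:
        stores[src] -= qty
        stores[dst] += qty
    return shipments


stores = {7: 100, 10: 1100, 12: 600}
print(run_transaction(stores, even_out))  # [(10, 7, 500)]
print(stores)
```

With these made-up inventories the plan reproduces the slide's instruction "store 10 should ship 500 toothbrushes to store 7". The sketch says nothing about what happens if a store crashes between steps 3 and 5.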
![Page 18: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/18.jpg)
Atomicity
• Make sure this does not happen: some of the actions of T get executed, but others do not
• We do assume atomicity of each Ti, through mechanisms such as logging and recovery
• Failures make atomicity of T difficult to achieve
– A site fails or is disconnected from the network
– A bug in the algorithm to redistribute toothbrushes instructs store 10 to ship more than it has
![Page 19: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/19.jpg)
Example of failures
• Suppose T10 replies to T0’s 1st message with its inventory.
• The machine at store 10 then crashes, so the instructions from T0 are never received by T10
• However, T7 sees no problem, and receives the instructions from T0
• Can distributed transaction T ever commit?
![Page 20: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/20.jpg)
Agreement Paradigms
Coordinated attack
Consensus
![Page 21: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/21.jpg)
Coordinated Attack
An important abstraction
• A pair of allied generals, A and B, have agreed to attack simultaneously or not at all.
• They can only communicate via carrier pigeon; message loss is possible
[Figure: generals A and B]
![Page 22: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/22.jpg)
Difficulty: uncertainty
• Suppose general A sends the message “attack at dawn” to B
• General A won’t attack alone, and A doesn’t know whether B has received the message. B understands A’s predicament, so B sends an acknowledgment “agreed”
![Page 23: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/23.jpg)
Impossible
Theorem: Assume that communication is unreliable. Any protocol that guarantees that if one of the generals attacks, then the other does so at the same time, is a protocol in which necessarily neither general attacks.
A → B: “attack at dawn”. Did B get it?
B → A: “ack”. Did A get it?
![Page 24: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/24.jpg)
It never ends
• There is always uncertainty about whether the last message was delivered or not
• Corollary: If a decision must be made within a fixed time period, then unreliable communication prevents database commitment protocols
A → B: “ack your ack”. Did B get it?
B → A: “ack your ack to my ack”. Did A get it?
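The uncertainty about the last message can be made concrete with a small sketch. It assumes a simplified model that is not spelled out on the slides: the generals strictly alternate messages (A first), the first `delivered` messages arrive, and the next one is lost. A general's view is reduced to the number of messages it has received. The names `views` and `sender_of` are illustrative.

```python
def views(delivered):
    """Messages 1, 2, 3, ... alternate A->B, B->A, A->B, ...
    The first `delivered` of them arrive; message delivered+1 is lost.
    Returns how many messages each general has received."""
    received = {"A": 0, "B": 0}
    for m in range(1, delivered + 1):
        receiver = "B" if m % 2 == 1 else "A"
        received[receiver] += 1
    return received


def sender_of(m):
    """Odd-numbered messages are sent by A, even-numbered ones by B."""
    return "A" if m % 2 == 1 else "B"


# The sender of message k+1 sees exactly the same thing whether that
# message was lost (k delivered) or arrived (k+1 delivered).
for k in range(6):
    s = sender_of(k + 1)
    assert views(k)[s] == views(k + 1)[s]
    print(f"message {k + 1} from {s}: lost or delivered, {s} has received {views(k)[s]}")
```

Because the two runs look identical to the sender, any rule it uses to decide must give the same answer in both. This is the step that, applied repeatedly, underlies the impossibility argument.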
![Page 25: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/25.jpg)
Agreement Problems in Distributed Computing are common
Because processes have different views of the system’s state and history
![Page 26: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/26.jpg)
Agreement Problems in Distributed Computing are common…
Because processes have different views of the system’s state and history, due to:
• Delays
• Failures
NASA plunged the Galileo spacecraft into Jupiter’s turbulent atmosphere today. The unmanned spacecraft dived into the atmosphere at 2:57 p.m. Eastern time. The last of Galileo’s data arrived on Earth today after the spacecraft was destroyed, taking 52 minutes to cross half a billion miles of space
The New York Times, 21 Sept. 2003
![Page 27: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/27.jpg)
… and Agreement Problems are Important
• In a replicated data system: to execute the same sequence of operations on the replicated data
• In a replicated sensor system: to agree on the values of the sensors
• In a timed system: to synchronize a set of clocks
• In a broadcast system: to deliver the same messages in the same order
• In a database system: to commit or abort a transaction
Etc.
![Page 28: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/28.jpg)
Consensus
The king of agreement problems
![Page 29: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/29.jpg)
Consensus
A fundamental abstraction
Each process has an input and should decide an output such that
Agreement: correct processes’ decisions are the same
Validity: decision is input of one process
Termination: eventually all correct processes decide
There are at least two possible input values 0 and 1
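The three requirements can be written as a checker over one finished execution. This is a sketch under assumptions: an execution is summarized by three hypothetical arguments (`inputs`, `decisions`, `correct`), and "eventually" is approximated by "by the end of the recorded run".

```python
def check_consensus(inputs, decisions, correct):
    """inputs:    dict process id -> input value
    decisions: dict process id -> decided value (absent if undecided)
    correct:   set of process ids that did not crash
    Returns which of the three requirements hold in this run."""
    decided_by_correct = {decisions[p] for p in correct if p in decisions}
    return {
        # Agreement: correct processes' decisions are the same.
        "agreement": len(decided_by_correct) <= 1,
        # Validity: every decision is the input of some process.
        "validity": all(v in inputs.values() for v in decisions.values()),
        # Termination: every correct process has decided.
        "termination": all(p in decisions for p in correct),
    }


inputs = {1: 0, 2: 1, 3: 1}
print(check_consensus(inputs, {1: 1, 2: 1}, correct={1, 2}))
print(check_consensus(inputs, {1: 0, 2: 1}, correct={1, 2}))
```

The first call describes a run in which process 3 crashed and the other two decided 1; the second describes a run in which the two correct processes decided differently.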
![Page 30: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/30.jpg)
A Solution to Consensus For a group of people sitting in a room
![Page 31: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/31.jpg)
A Solution to Consensus
Each one raises a card with its input
[Figure: five people raise cards showing 2, 0, 0, 1, 0]
![Page 32: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/32.jpg)
A Solution to Consensus
Follow a coordinator
[Figure: cards showing 2, 0, 0, 1, 0; all five decide 1]
![Page 33: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/33.jpg)
A Solution to Consensus
Majority wins (breaking ties with the largest)
[Figure: cards showing 2, 0, 0, 1, 0; all five decide 0]
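For people in a room, where everyone sees every card, both rules are one-liners. The sketch below assumes that "majority" means the most frequent value, with ties among the most frequent values broken by taking the largest; the function names are illustrative.

```python
from collections import Counter


def follow_coordinator(cards, coordinator):
    """Everyone decides the card raised by the agreed coordinator."""
    return cards[coordinator]


def majority_wins(cards):
    """Most frequent card wins; ties are broken by the largest value."""
    counts = Counter(cards)
    top = max(counts.values())
    return max(value for value, count in counts.items() if count == top)


cards = [2, 0, 0, 1, 0]
print(follow_coordinator(cards, coordinator=3))  # 1
print(majority_wins(cards))                      # 0
print(majority_wins([0, 1]))                     # 1 (tie, largest wins)
```

Both rules satisfy agreement only because every person applies the rule to the same list of cards.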
![Page 34: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/34.jpg)
A Solution to Consensus
Failures are no problem (choose another coordinator, or majority of non-failed)
[Figure: cards showing 2, 0, 1, 0; one person has failed (%!#)]
![Page 35: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/35.jpg)
A Solution to Consensus
… because this cannot happen!!
[Figure: cards showing 2, 0, 1, 0, 1 and one failed person (%!#)]
![Page 36: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/36.jpg)
Consensus in Distributed Systems
This can happen: delays
[Figure: one value, 1, is known; three others are still unknown (?)]
![Page 37: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/37.jpg)
Consensus in Distributed Systems
and then there are different views
[Figure: the processes’ views differ: one view reads 01020, three views read 1020?, and one process (†) has crashed]
![Page 38: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/38.jpg)
Consensus in Distributed Systems
so we try to reconcile views: another round
[Figure: views 01020 and 1020? (three copies), a crashed process (†), and one view updated to 10201]
![Page 39: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/39.jpg)
Consensus in Distributed Systems
but we could have the same problem!!
[Figure: views 01020 and 1020? (three copies), a crashed process (†), and two views updated to 10201]
![Page 40: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/40.jpg)
So, is consensus solvable? If so, how long does it take to solve it?
• It depends on what exactly the model is
• But what is a realistic model?
• And what are the common scenarios within the model? The nature of a distributed system is to include complex combinations of failures and delays
![Page 41: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/41.jpg)
Basic Model – asynchronous crash failure model
• Message passing (another option would be a shared memory model)
• Channels between every pair of processes
• Crash failures, with a bound t, t < n, on potential failures out of n > 1 processes
• No message loss among correct processes
• Unbounded message delays, unpredictable processor speeds
![Page 42: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/42.jpg)
Distributed algorithms (protocols)
• A set of algorithms, each one runs on a different processor (or as a thread in the same computer)
• The code includes instructions to communicate with other processors:
– Send (M) to p
– Upon receiving a message from q do
![Page 43: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/43.jpg)
A consensus protocol
1. val ← input
2. send val to all
3. wait until at least n - t messages have been received
4. let V[j] be the val received from process j, else ‘-’
5. return h (V) = largest value in V
- This same code is executed by every process
- Each one receives the value input from some application
- h is a predefined function that all processors know
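The protocol can be simulated for a single process as follows. This is a sketch under assumptions, not the lecture's code: asynchrony is modeled by handing each process the list of messages in the order they happen to arrive, the process is assumed to stop waiting after exactly n - t messages, and the names `decide`, `largest` and `MISSING` are illustrative.

```python
MISSING = "-"  # placeholder for "no value received from process j"


def largest(V):
    """h(V): the largest value actually received."""
    return max(v for v in V if v != MISSING)


def decide(n, t, arrivals, h=largest):
    """Steps 3-5 of the protocol for one process.
    arrivals: list of (sender index j, val) in arrival order.
    Returns the decision, or None if fewer than n - t messages
    have arrived and the process is still waiting."""
    if len(arrivals) < n - t:
        return None
    V = [MISSING] * n
    for j, val in arrivals[: n - t]:  # assumption: use exactly n - t messages
        V[j] = val
    return h(V)


# Three processes with inputs 2, 0, 1 and at most t = 1 crash.
# Two processes whose messages arrive in different orders:
print(decide(3, 1, [(0, 2), (1, 0), (2, 1)]))
print(decide(3, 1, [(1, 0), (2, 1), (0, 2)]))
# A process that has heard from only one sender keeps waiting:
print(decide(3, 1, [(0, 2)]))
```

Which decisions come out depends on both the input vector and the arrival order, which is why correctness has to be stated relative to a set of allowed input vectors.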
![Page 44: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/44.jpg)
Is this protocol correct ?
• It depends on what the set C of possible inputs is
• An input to the protocol is a vector I, where I[j] contains the local input of the j-th process
• The local input of pj is known only to pj
• And is taken from some universe of possible values V not including ‘-’
• Let C be the set of possible input vectors to the protocol
![Page 45: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/45.jpg)
Exercise 1
1. Define a set C as large as possible for which the protocol is correct
2. Prove that the protocol is correct for this C
3. Do you need to assume t < n / 2 ?
Correct means that for every I in C, in every execution with input I where at most t processes crash, the consensus requirements are satisfied:
Termination: eventually all correct processes decide
Agreement: correct processes’ decisions are the same
Validity: decision is input of one process
![Page 46: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/46.jpg)
Exercise 2
The protocol uses h (V) = largest value in V
1. Define another such function h’
2. Repeat the previous exercise with respect to your h’
![Page 47: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/47.jpg)
Exercise 3
Consider the set C that includes every possible input vector formed with values from V, where | V | is at least 2
1. Is there a function h for which the protocol is correct ?
If so, give one such h and prove the protocol is correct, otherwise, give a brief intuitive argument of why there is no such h
![Page 48: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/48.jpg)
Bibliography
Theory of distributed computing textbooks
• Attiya, Welch, Distributed Computing, Wiley-Interscience, 2 ed., 2004
• Garg, Elements of Distributed Computing, Wiley-IEEE, 2002
• Lynch, Distributed Algorithms, Morgan Kaufmann, 1997
• Tel, Introduction to Distributed Algorithms, Cambridge U., 2 ed. 2001
![Page 49: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/49.jpg)
Bibliography
Others
• Distributed Algorithms and Systems http://www.md.chalmers.se/~tsigas/DISAS/index.html
• Conferences: DISC, PODC,…
• Journals: Distributed Computing,…
– Special issue PODC 20th anniversary, Sept. 2003
• ACM SIGACT News Distributed Computing Column. Also one in EATCS Bulletin
![Page 50: Lecture 2 Introduction to Principles of Distributed Computing](https://reader036.fdocuments.net/reader036/viewer/2022062803/56814748550346895db48629/html5/thumbnails/50.jpg)