15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 •...

122
MidTerm-I Review 15-440 Distributed Systems Tuesday Oct 15 th 2019 Midterm: Thursday, Oct 17 th 2019 Time: 10:25am – 11:55am (Please arrive on time) in class, GHC 4401

Transcript of 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 •...

Page 1: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

MidTerm-I Review

15-440 Distributed Systems

Tuesday Oct 15th 2019

Midterm: Thursday, Oct 17th 2019Time: 10:25am – 11:55am (Please arrive on time)in class, GHC 4401

Page 2: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

15-440/15-640 Topics: Midterm 1

• 1) Distribution Systems Intro (Yuvraj) • 2) Communication - Internet in a Day (Yuvraj) • 3) Classical Consistency/Synchronization (Yuvraj) • 4) Time Synchronization (Dave) • 5) Distributed Mutual Exclusion (Dave) • 6) Remote Procedure Calls (Heather => Yuvraj) • 7) Distributed Filesystems – 1 (Yuvraj) • 8) Distributed Filesystems - 2 (Yuvraj) • 9) Logging and Crash Recovery (Yuvraj) • 10) Distributed Concurrency Control (Dave) • 11) Distributed Replication (Dave) • 12) Fault Tolerance & RAID (TAs => Dave) • 13) Distributed Databases Case Study (Dave)

2

Page 3: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distribution Systems Intro

15-440 Distributed Systems

Page 4: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

What Is A Distributed System?

“A collection of independent computers that appears to its users as a single coherent system.” •Features:

• No shared memory – message-based communication• Each runs its own local OS• Heterogeneity• Expandability

•Ideal: to present a single-system image:• The distributed system “looks like” a single computer

rather than a collection of separate computers.

Page 5: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Definition of a Distributed System

Figure 1-1. A distributed system organized as middleware. The middleware layer runs on all machines, and offers a uniform interface to the system

Page 6: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Systems: Goals

• Resource Availability: remote access to resources• Distribution Transparency: single system image

• Access, Location, Migration, Replication, Failure,…• Openness: services according to standards (RPC)• Scalability: size, geographic, admin domains, …

• Example of a Distributed System? • Web search on google • DNS: decentralized, scalable, robust to failures, ... • ...

* 6

Page 7: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Communication - Internet in a Day

15-440 Distributed Systems

Page 8: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Packet Switching – Statistical Multiplexing

• Switches arbitrate between inputs• Can send from any input that’s ready

• Links never idle when traffic to send• (Efficiency!)

Packets

8

Page 9: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Model of a communication channel

• Latency - how long does it take for the first bit to reach destination

• Capacity - how many bits/sec can we push through? (often termed “bandwidth”)

• Jitter - how much variation in latency?

• Loss / Reliability - can the channel drop packets?

• Reordering

9

Page 10: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Packet Switching

• Source sends information as self-contained packets that have an address.

• Source may have to break up single message in multiple

• Each packet travels independently to the destination host.• Switches use the address in the packet to determine how to

forward the packets• Store and forward

• Analogy: a letter in surface mail.

10

Page 11: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Internet

• An inter-net: a network of networks.

• Networks are connected using routers that support communication in a hierarchical fashion

• Often need other special devices at the boundaries for security, accounting, ..

• The Internet: the interconnected set of networks of the Internet Service Providers (ISPs)

• About 17,000 different networks make up the Internet

Internet

11

Page 12: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Network Service Model

• What is the service model for inter-network?• Defines what promises that the network gives for any

transmission• Defines what type of failures to expect

• Ethernet/Internet: best-effort – packets can get lost, etc.

12

Page 13: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Possible Failure models

• Fail-stop:• When something goes wrong, the process stops / crashes /

etc.• Fail-slow or fail-stutter:

• Performance may vary on failures as well• Byzantine:

• Anything that can go wrong, will.• Including malicious entities taking over your computers and

making them do whatever they want.• These models are useful for proving things;• The real world typically has a bit of everything.

• Deciding which model to use is important!

13

Page 14: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

IP Layering

• Relatively simple

Bridge/Switch Router/GatewayHost Host

Application

Transport

Network

Link

Physical

14

Page 15: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Protocol Demultiplexing

• Multiple choices at each layer

FTP HTTP TFTPNV

TCP UDP

IP

NET1 NET2 NETn…

TCP/UDPIP

IPX

Port Number

Network

Protocol Field

Type Field

15

Page 16: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

User Datagram Protocol (UDP): An Analogy

Postal Mail• Single mailbox to receive

messages• Unreliable ☺ • Not necessarily in-order

delivery• Each letter is independent• Must address each reply

Example UDP applicationsMultimedia, voice over IP

UDP• Single socket to receive

messages• No guarantee of delivery• Not necessarily in-order

delivery• Datagram – independent

packets• Must address each packet

Postal Mail• Single mailbox to receive

letters• Unreliable ☺• Not necessarily in-order

delivery• Letters sent independently • Must address each letter

16

Page 17: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Transmission Control Protocol (TCP): An Analogy

TCP• Reliable – guarantee

delivery• Byte stream – in-order

delivery• Connection-oriented –

single socket per connection

• Setup connection followed by data transfer

Telephone Call• Guaranteed delivery• In-order delivery• Connection-oriented • Setup connection

followed by conversation

Example TCP applicationsWeb, Email, Telnet

17

Page 18: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary: Internet Architecture

• Packet-switched datagram network

• IP is the “compatibility layer” • Hourglass architecture• All hosts and routers run IP

• Stateless architecture• no per flow state inside

network

IP

TCP UDP

ATM

Satellite

Ethernet

18

Page 19: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary: Minimalist Approach

• Dumb network• IP provide minimal functionalities to support connectivity

• Addressing, forwarding, routing

• Smart end system• Transport layer or application performs more sophisticated

functionalities• Flow control, error control, congestion control

• Advantages• Accommodate heterogeneous technologies (Ethernet,

modem, satellite, wireless)• Support diverse applications (telnet, ftp, Web, X windows)• Decentralized network administration

19

Page 20: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Classical Consistency/Synchronization

20

15-440 Distributed Systems

Page 21: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Terminology

• Critical Section: piece of code accessing a shared resource, usually variables or data structures

• Race Condition: Multiple threads of execution enter CS at the same time, update shared resource, leading to undesirable outcome

• Indeterminate Program: One or more Race Conditions, output of program depending on ordering, non deterministic

21

Page 22: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Classic synchronization primitives

• Basics of concurrency • Correctness (achieves Mutex, no deadlock, no livelock)• Efficiency, no spinlocks or wasted resources • Fairness

• Synchronization mechanisms • Semaphores (P() and V() operations) • Mutex (binary semaphore) • Condition Variables (allows a thread to sleep)

• Must be accompanied by a mutex • Wait and Signal operations

• Work through examples again22

Page 23: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Time Synchronization

23

15-440 Distributed Systems

Page 24: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Clocks in a Distributed System

• Computer clocks are not generally in perfect agreement• Skew: the difference between the times on two clocks (at any instant)

• Computer clocks are subject to clock drift (they count time at different rates)

• Clock drift rate: the difference per unit of time from some ideal reference clock

• Ordinary quartz clocks drift by about 1 sec in 11-12 days. (10-6 secs/sec).• High precision quartz clocks drift rate is about 10-7 or 10-8 secs/sec

24

Page 25: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Perfect networks

• Messages always arrive, with propagation delay exactly d

• Sender sends time T in a message• Receiver sets clock to T+d

• Synchronization is exact

Page 26: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Cristian’s Time Sync

mr

mt

pTime server,S

• A time server S receives signals from a UTC source• Process p requests time in mr and receives t in mt from S• p sets its clock to t + RTT/2 • Accuracy ± (RTT/2 - min) :

• because the earliest time S puts t in message mt is min after p sent mr. • the latest time was min before mt arrived at p• the time by S’s clock when mt arrives is in the range [t+min, t + RTT - min]

Tround is the round trip time recorded by pmin is an estimated minimum round trip time

26

Page 27: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Berkeley algorithm

• Cristian’s algorithm - • a single time server might fail, so they suggest the use of a group of

synchronized servers• it does not deal with faulty servers

• Berkeley algorithm (also 1989)• An algorithm for internal synchronization of a group of computers• A master polls to collect clock values from the others (slaves)• The master uses round trip times to estimate the slaves’ clock values• It takes an average (eliminating any above average round trip time or with

faulty clocks)• It sends the required adjustment to the slaves (better than sending the

time which depends on the round trip time)• Measurements

• 15 computers, clock synchronization 20-25 millisecs drift rate < 2x10-5

• If master fails, can elect a new master to take over (not in bounded time)

•27

Page 28: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

NTP Protocol

• All modes use UDP• Each message bears timestamps of recent events:

• Local times of Send and Receive of previous message• Local times of Send of current message

• Recipient notes the time of receipt T3 (we have T0, T1, T2, T3)

28

T3

T2T1

T0

Server

Client

Time

m m'

Time

Page 29: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Logical time and logical clocks (Lamport 1978)

• Instead of synchronizing clocks, use event ordering

1. If two events occurred at the same process pi (i = 1, 2, … N) then they occurred in the order observed by pi, that is the definition of: “→ i”

2. when a message, m is sent between two processes, send(m) happens before receive(m)

3. The happened before relation is transitive

• The happened before relation is the relation of causal ordering

29

Page 30: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Total-order Lamport clocks

• Many systems require a total-ordering of events, not a partial-ordering

• Use Lamport’s algorithm, but break ties using the process ID• L(e) = M * Li(e) + i

• M = maximum number of processes• i = process ID

Page 31: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Vector Clocks

• Note that e → e’ implies V(e)<V(e’). The converse is also true

• Can you see a pair of parallel events?• c || e (parallel) because neither V(c) <= V(e) nor V(e) <= V(c)

31

Page 32: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Clock Sync Important Lessons

• Clocks on different systems will always behave differently• Skew and drift between clocks

• Time disagreement between machines can result in undesirable behavior

• Two paths to solution: synchronize clocks or ensure consistent clocks

32

Page 33: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Clock Sync Important Lessons

• Clock synchronization• Rely on a time-stamped network messages• Estimate delay for message transmission• Can synchronize to UTC or to local source• Clocks never exactly synchronized• Often inadequate for distributed systems• might need totally-ordered events• might need millionth-of-a-second precision

• Logical Clocks• Encode causality relationship• Lamport clocks provide only one-way encoding• Vector clocks provide exact causality information

Page 34: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Remote Procedure Calls

15-440 Distributed Systems

34

Page 35: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Passing Value Parameters (1)

• The steps involved in a doing a remote computation through RPC.

35

Page 36: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Stubs: obtaining transparency

• Compiler generates from API stubs for a procedure on the client and server

• Client stub • Marshals arguments into machine-independent format• Sends request to server• Waits for response• Unmarshals result and returns to caller

• Server stub• Unmarshals arguments and builds stack frame• Calls procedure• Server stub marshals results and sends reply

36

Page 37: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Real solution: break transparency

• Possible semantics for RPC:• Exactly-once

• Impossible in practice• At least once:

• Only for idempotent operations• At most once

• Zero, don’t know, or once• Zero or once

• Transactional semantics

37

Page 38: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Asynchronous RPC (3)

• A client and server interacting through two asynchronous RPCs.

38

Page 39: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Important Lessons

• Procedure calls• Simple way to pass control and data• Elegant transparent way to distribute application• Not only way…

• Hard to provide true transparency• Failures• Performance• Memory access• Etc.

39

Page 40: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed File Systems

40

15-440 Distributed Systems

Page 41: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Why DFSs?

• Why Distributed File Systems: • Data sharing among multiple users• User mobility• Location transparency• Backups and centralized management• Examples: NFS, AFS, CODA, LBFS

• Idea: Provide file system interfaces to remote FS’s• Challenge: heterogeneity, scale, security, concurrency,..• Non-Challenges: AFS meant for campus community• Virtual File Systems: pluggable file systems • Use RPC’s

41

Page 42: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

DFS Important bits (1)

• Distributed filesystems almost always involve a tradeoff: consistency, performance, scalability.

• We’ve learned a lot since NFS and AFS (and can implement faster, etc.), but the general lesson holds. Especially in the wide-area.

• We’ll see a related tradeoff, also involving consistency, in a while: the CAP tradeoff. Consistency, Availability, Partition-resilience.

Page 43: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

VFS Interception

43

Page 44: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

NFS’s Failure Handling – Stateless Server

• Files are state, but...• Server exports files without creating extra state

• No list of “who has this file open” (permission check on each operation on open file!)

• No “pending transactions” across crash• Crash recovery is “fast”

• Reboot, let clients figure out what happened• State stashed elsewhere

• Separate MOUNT protocol• Separate NLM locking protocol in NFSv4

• Stateless protocol: requests specify exact state. read() → read( [position]). no seek on server.

Page 45: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

NFS’s Failure Handling

• Operations are idempotent• How can we ensure this? Unique IDs on files/directories.

It’s not delete(“foo”), it’s delete(1337f00f), where that ID won’t be reused.

• Not perfect → e.g., mkdir• Write-through caching: When file is closed, all

modified blocks sent to server. close() does not return until bytes safely stored.

• Close failures? • retry until things get through to the server• return failure to client

• Most client apps can’t handle failure of close() call. • Usual option: hang for a long time trying to contact server

Page 46: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

AFS Cell/Volume Architecture

• Cells correspond to administrative groups• /afs/andrew.cmu.edu is a cell

• Cells are broken into volumes (miniature file systems)

• One user's files, project source tree, ...• Typically stored on one server• Unit of disk quota administration, backup

• Client machine has cell-server database• protection server handles authentication• volume location server maps volumes to servers

46

Page 47: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Client Caching in AFS

• Callbacks! Clients register with server that they have a copy of file;• Server tells them: “Invalidate!” if the file changes• This trades state for improved consistency

• What if server crashes? Lose all callback state!• Reconstruct callback information from client: go ask

everyone “who has which files cached?”• What if client crashes?

• Must revalidate any cached content it uses since it may have missed callback

Page 48: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

AFS Write Policy

• Writeback cache• Opposite of NFS “every write is sacred”• Store chunk back to server

• When cache overflows• On last user close()

• ...or don't (if client machine crashes)• Is writeback crazy?

• Write conflicts “assumed rare”• Who wants to see a half-written file?

48

Page 49: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

DFS: Name-Space Construction and Organization

• NFS: per-client linkage• Server: export /root/fs1/• Client: mount server:/root/fs1 /fs1

• AFS: global name space• Name space is organized into Volumes

• Global directory /afs; • /afs/cs.wisc.edu/vol1/…; /afs/cs.stanford.edu/vol1/…

• Each file is identified as fid = <vol_id, vnode #, unique identifier>

• All AFS servers keep a copy of “volume location database”, which is a table of vol_id→ server_ip mappings

49

Page 50: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Coda Summary

• Distributed File System built for mobility• Disconnected operation key idea

• Puts scalability and availability beforedata consistency• Unlike NFS

• Assumes that inconsistent updates are very infrequent

• Introduced disconnected operation mode and file hoarding and the idea of “reintegration”

50

Page 51: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Coda States

1. Hoarding:Normal operation mode

2. Emulating:Disconnected operation mode

3. Reintegrating:Propagates changes and detects inconsistencies

Hoarding

Emulating Recovering

51

Page 52: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Low Bandwidth File SystemKey Ideas

• A network file systems for slow or wide-area networks

• Exploits similarities between files • Avoids sending data that can be found in the server’s

file system or the client’s cache• Uses RABIN fingerprints on file content (file chunks)

• Can deal with byte offsets when part of file change

• Also uses conventional compression and caching• Requires 90% less bandwidth than traditional

network file systems

52

Page 53: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

LBFS chunking solution

• Considers only non-overlapping chunks• Sets chunk boundaries based on file contents

rather than on position within a file• Examines every overlapping 48-byte region of file

to select the boundary regions called breakpoints using Rabin fingerprints• When low-order 13 bits of region’s fingerprint equals a

chosen value, the region constitutes a breakpoint

53

Page 54: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Effects of edits on file chunks

• Chunks of file before/after edits• Grey shading show edits

• Stripes show regions with magic values that create chunk boundaries

54

Page 55: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Mutual Exclusion

15-440 Distributed Systems

Page 56: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

What is “Scalability”?

56

Challenges when scaling out?

Page 57: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Motivation: Need for Distributed Mutex

• San Fran customer adds $100, NY bank adds 1% interest• San Fran will have $1,111 and NY will have $1,110

• Updating a replicated database and leaving it in an inconsistent state

57

(San Francisco) (New York)

(+$100) (+1%)

Page 58: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Comparison of 5 Mutex Algorithms

58

• Which one would you choose?• What happens with crashes?

Algorithm # Messages per cycle

Delay before entry

Problems

Centralized 3 2 Coordinator crash

Decentralized 2 m k + m, k≥1 2m Starvation

Lamport 3 (N-1) 2 (N-1) Crash of any process, inefficient

Ricart & Agrawala 2 (N-1) 2 (N-1) Crash of any process

Token ring 1 to infinite 0 to (N-1) Lost token, process crash

Page 59: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

A Centralized Algorithm (1)

@ Client → Acquire: Send (Request, i) to coordinator Wait for reply

@ Server:while true: m = Receive() If m == (Request, i):If Available():

Send (Grant) to i

59

Page 60: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Algorithm (Strawman)

• Assume that there are n coordinators• Access requires a majority vote from m > n/2

coordinators. • A coordinator always responds immediately to a

request with GRANT or DENY• Node failures are still a problem• Large numbers of nodes requesting access can

affect availability

60

Page 61: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Totally-Ordered Multicast

61

• A multicast operation by which all messages are delivered in the same order to each receiver.

• Distributed data structure (priority queue)• Queue messages until they’re ACKed• Uses TO-Lamport Clocks:

• Each message is timestamped with the current logical time of its sender.

• Multicast messages are also sent back to the sender.• Assume all messages sent by one sender are

received in the order they were sent and that no messages are lost.

Book pp. 313

Page 62: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Totally-Ordered Multicast

62

• Multicast messages + local timestamp-ordered queue• Multicasts an ACK to all other processes• Process only if *both* at queue head and ACK’ed

Page 63: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Lamport Mutual Exclusion

• Every process maintains a queue of pending requests for entering critical section in order. The queues are ordered by virtual time stamps derived from Lamport timestamps

• For any events e, e' such that e --> e' (causality ordering), T(e) < T(e')

• For any distinct events e, e', T(e) != T(e')

• When node i wants to enter C.S., it sends time-stamped request to all other nodes (including itself)

• Wait for replies from all other nodes.• If own request is at the head of its queue and all replies have been

received, enter C.S.• Upon exiting C.S., remove its request from the queue and send a

release message to every process.

63

Page 64: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Ricart & Agrawala Algorithm

• Also relies on Lamport totally ordered clocks.

• When node i wants to enter C.S., it sends time-stamped request to all other nodes. These other nodes reply (eventually). When i receives n-1 replies, then can enter C.S.

• Trick: Node j having earlier request doesn't reply to i until after it has completed its C.S.

64

Page 65: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

A Token Ring Algorithm

• Organize the processes involved into a logical ring• One token at any time → passed from node to

node along ring

65

Page 66: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

A Token Ring Algorithm

• Correctness:• Clearly safe: Only one process can hold token

• Fairness: • Will pass around ring at most once before getting

access.• Performance:

• Each cycle requires between 1 - ∞ messages• Latency of protocol between 0 & n-1

• Issues• Lost token

66

Page 67: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary

• Lamport algorithm demonstrates how distributed processes can maintain consistent replicas of a data structure (the priority queue).

• Ricart & Agrawala's algorithms demonstrate utility of logical clocks.

• Centralized & ring based algorithms much lower message counts

• None of these algorithms can tolerate failed processes or dropped messages.

67

Page 68: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Concurrency Management

15-440 Distributed Systems

Page 69: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Concurrency Management

• Single Server: Transactions (RD/WR to Global State)• ACID: Atomicity, Consistency, Isolation, Durability

• E.g. banking app => ACID is violated if not careful• Solutions: 2-phase locking (General, strict, strong strict)

• Deadling with deadlocks => build “waits-for” graph• Transactions: 2 phases (prep, commit/abort)

• Preparation: generate Lock Set “L”, Updates “U” • COMMIT (updated global state), ABORT (leave state as is) • Example using banking app

* 69

Page 70: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Transactions – split into 2 phases

• Phase 1: Preparation: • Determine what has to be done, how it will change state,

without actually altering it.• Generate Lock set “L” • Generate List of Updates “U”

• Phase 2: Commit or Abort • Everything OK, then update global state • Transaction cannot be completed, leave global state as is• In either case, RELEASE ALL LOCKS

70

Page 71: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Transactions – 2PC

• Similar idea as before, but: • State spread across servers (maybe even WAN) • Want to enable single transactions to read and update

global state while maintaining ACID properties • Overall Idea:

• Client initiate transaction. Makes use of “co-ordinator”• All other relevant servers operate as “participants” • Co-ordinator assigns unique transaction ID (TID)

71

Page 72: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Implementing 2-Phase Commit

• Implemented as a set of messages

72

Page 73: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Implementing 2-Phase Commit

• Implemented as a set of messages

• Messages in first phase • A: Coordinator sends “CanCommit?”

to participants

73

Page 74: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Implementing 2-Phase Commit

• Implemented as a set of messages

• Messages in first phase • A: Coordinator sends “CanCommit?”

to participants• B: Participants respond:

“VoteCommit” or “VoteAbort”

74

Page 75: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Implementing 2-Phase Commit

• Implemented as a set of messages

• Messages in first phase • A: Coordinator sends “CanCommit?”

to participants• B: Participants respond:

“VoteCommit” or “VoteAbort”

75

• Messages in the second phase • A: All “VoteCommit”: , Coord sends “DoCommit”• If any “VoteAbort”: abort transaction. Coordinator

sends “DoAbort” to everyone => release locks

Page 76: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Implementing 2-Phase Commit

• Implemented as a set of messages

• Messages in first phase • A: Coordinator sends “CanCommit?”

to participants• B: Participants respond:

“VoteCommit” or “VoteAbort”

76

• Messages in the second phase • A: All “VotedCommit”: , Coord sends “DoCommit”• If any “VoteAbort”: abort transaction. Coordinator

sends “DoAbort” to everyone => release locks

Page 77: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Deadlocks and Livelocks

• Distributed deadlock • Cyclic dependency of locks by transactions across

servers • In 2PC this can happen if participants unable to

respond to voting request (e.g. still waiting on a lock on its local resource)

• Handled with a timeout. Participants times out, then votes to abort. Retry transaction again.

• Addresses the deadlock concern • However, danger of LIVELOCK – keep trying!

77

Page 78: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary: Distributed Concurrency

• Distributed consistency management• ACID Properties desirable • Single Server case: use locks, and in cases use

2-phase locking (strict 2PL, strong strict 2PL), transactional support for locks

• Multiple server distributed case: use 2-phase commit for distributed transactions. Need a coordinator to manage messages from partipants.

78

Page 79: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Fault Tolerance, Logging and Recovery

15-440 Distributed Systems

Page 80: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary – Fault Tolerance

• Real Systems (are often unreliable) • Introduced basic concepts for Fault Tolerant Systems

including redundancy, process resilience, RPC

• Fault Tolerance – Backward recovery using checkpointing, both Independent and coordinated

• Fault Tolerance –Recovery using Write-Ahead-Logging

80

Page 81: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Dependability Concepts

• Availability – the system is ready to be used immediately.

• Reliability – the system runs continuously without failure.

• Safety – if a system fails, nothing catastrophic will happen. (e.g. process control systems)

• Maintainability – when a system fails, it can be repaired easily and quickly (sometimes, without its users noticing the failure)

Page 82: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Masking Failures by Redundancy

• Strategy: hide the occurrence of failure from other processes using redundancy.

1. Information Redundancy – add extra bits to allow for error detection/recovery (e.g., Hamming codes and the like).

2. Time Redundancy – perform operation and, if needs be, perform it again. Think about how transactions work (BEGIN/END/COMMIT/ABORT).

3. Physical Redundancy – add extra (duplicate) hardware and/or software to the system.

Page 83: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Recovery Strategies

• When a failure occurs, we need to bring the system into an error free state (recovery). This is fundamental to Fault Tolerance.

1. Backward Recovery: return the system to some previous correct state (using checkpoints), then continue executing.

-- Can be expensive, however still used 2. Forward Recovery: bring the system into a

correct new state, from which it can then continue to execute.

-- Need to know potential errors up front!

Page 84: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Independent Checkpointing

Recovery line: correct distributed snapshotThis becomes challenging if checkpoints are un-coordinated

Page 85: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Coordinated Checkpointing

• Key idea: each process takes a checkpoint after a globally coordinated action. (why is this good?)

• Simple Solution: 2-phase blocking protocol• Co-ordinator multicast checkpoint_REQUEST message • Participants receive message, takes a checkpoint, stops sending

(application) messages, and sends back checkpoint_ACK• Once all participants ACK, coordinator sends checkpoint_DONE to

allow blocked processes to go on

• Optimization: consider only processes that depend on the recovery of the coordinator (those it sent a message since last checkpoint)

85

Page 86: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

• Write-Ahead-Logging

• Provide Atomicity and Durability• Idea: create a log recording every update to database • Updates considered reliable when stored on disk• Updated versions are kept in memory (page cache) • Logs typically store both REDO and UNDO operations• After a crash, recover by replaying log entries to reconstruct

correct state • 3 Passes: (Analysis Pass, recovery pass, Undo Pass) • WAL is common, fewer disk operations, transactions

considered committed once log written.

86

Page 87: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Recovery using WAL – 3 passes

• Analysis Pass • Reconstruct TT and DPT (from start or last checkpoint)• Get copies of all pages at the start

• Recovery Pass (redo pass) • Replay log forward, make updates to all dirty pages• Bring everything to a state at the time of the crash

• Undo Pass • Replay log file backward, revert any changes made by

transactions that had not committed (use PrevLSN)• For each write Compensation Log Record (CLR)• Once you reach BEGIN TXN, write an END TXN entry

87

Page 88: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Optimizing WAL

• As described earlier: • Replay operations back to the beginning of time • Log file would be kept forever, (entire Database)

• In practice, we can do better with CHECKPOINT• Periodically save DPT, TT • Store any dirty pages to disk, indicate in LOG file • Prune initial portion of log file: All transactions upto

checkpoint have been committed or aborted.

88

Page 89: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Replication

15-440 Distributed Systems

Page 90: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Consistency Concepts

• Requires write replication, and some degree of consistency• Strict Consistency

• Read always returns value from latest write• Sequential Consistency

• All nodes see operations in some sequential order• Operations of each process appear in-order in this sequence

• Causal Consistency• All nodes see causally related writes in same order• But concurrent writes may be seen in different order on different

machines• Eventual Consistency

• All nodes will learn eventually about all writes, in the absence of updates

90

Page 91: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Sequential Consistency (1)

• Behavior of two processes operating on the same data item. The horizontal axis is time.

• P1: Writes “W” value a to variable “x”• P2: Reads `NIL’ from “x” first and then `a’

Page 92: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Sequential Consistency (2)

• A data store is sequentially consistent when:• The result of any execution is the same as if the

(read and write) operations by all processes on the data store …• Were executed in some sequential order and …• the operations of each individual process appear

…▪ in this sequence ▪ in the order specified by its program.

Page 93: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Sequential Consistency (3)

(b) A data store that is not sequentially consistent.

(a) A sequentially consistent data store.

Page 94: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Causal Consistency (1)

• For a data store to be considered causally consistent, it is necessary that the store obeys the following condition:

• Writes that are potentially causally related …• must be seen by all processes• in the same order.

• Concurrent writes …• may be seen in a different order • on different machines.

Page 95: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Causal Consistency (2)

• Figure 7-8. This sequence is allowed with a causally-consistent store, but not with a sequentially consistent store.

Page 96: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Replicate: State versus Operations

Possibilities for what is to be propagated:•Propagate only a notification of an update.- Sort of an “invalidation” protocol

•Transfer data from one copy to another.- Read-to-Write ratio high, can propagate logs (save bandwidth)

•Propagate the update operation to other copies- Don’t transfer data modifications, only operations – “Active replication”

Page 97: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Remote-Write PB Protocol

Updates are blocking, although non-blocking possible

Page 98: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Replication: Quorum based consensus

• Quorum consensus

• Designed to have fast response time even under failures

• Replicas are “active” - participate in protocol; there is no master, per se.

• Good: Clients don’t even see the failures. Bad: More complex.

Page 99: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

• Correctness (safety):•All nodes agree on the same value

•The agreed value X has been proposed by some node

• Fault-tolerance:• If less than N/2 nodes fail, the rest should reach agreement eventually w.h.p

•Liveness is not guaranteed

• Termination (not guaranteed)

PAXOS: Requirement

Page 100: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Fischer-Lynch-Paterson [FLP’85] impossibility result

• It is impossible for a set of processors in an asynchronous system to agree on a binary value, even if only a single processor is subject to an unannounced failure.

• Synchrony --> bounded amount of time node can take to process and respond to a requestAsynchrony --> timeout is not perfect

Page 101: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Single Decree Paxos: Protocol

Acceptors

3)Respond to Prepare(n):• If n > minProposal then minProposal = n Prepare-OK(acceptedProposal, acceptedValue) else Prepare-REJECT()

6)Respond to Accept(n, value):• If n ≥ minProposal

acceptedProposal = minProposal = nacceptedValue = value

Accept-OK() else Accept-REJECT()

Acceptors must record minProposal, acceptedProposal, and acceptedValue on stable storage (disk)

Proposers1)Choose new proposal number n, value v2)Broadcast Prepare(n) to all servers

4)When responses received from majority:• If any acceptedValues returned v = acceptedValue of highest acceptedProposal

5)Broadcast Accept(n, value) to all servers

6)When Accept-OK from majority Value is chosen (Commit)Else Restart: goto 1, with larger number n

101

Page 102: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Some Remarks

• Only proposer knows chosen value (majority accepted)• Only a single value is chosen → MultiPaxos• No guarantee that proposer’s original value v is chosen by

itself• Number n is basically a Lamport clock → always unique n• Key invariant:

• If a proposal with value `v' is chosen, all higher proposals must have value `v’

• Dueling proposer• Resolved using number n in prepare

• There are challenging corner cases

102

Page 103: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Fault Tolerance and RAID

15-440 Distributed Systems

Page 104: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Outline

• Errors/error recovery

• Using multiple disks• Why have multiple disks?• problem and approaches

• RAID levels and performance

• Estimating availability

104

Page 105: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Parity Checking

Single Bit Parity:Detect single bit errors

105

Page 106: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Error Recovery – Error Correcting Codes (ECC)

Two Dimensional Bit Parity:Detect and correct single bit errors

0 0

106

Page 107: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Error Detection – CRC

• View data bits, D, as a binary number• Choose r+1 bit pattern (generator), G • Goal: choose r CRC bits, R, such that

• <D,R> exactly divisible by G (modulo 2) • Receiver knows G, divides <D,R> by G. If non-zero remainder:

error detected!• Can detect all burst errors less than r+1 bits

• Widely used in practice: Ethernet, disks

107

Page 108: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Disk Striping

• Interleave data across multiple disks • Large file streaming can enjoy parallel transfers • Small requests benefit from load balancing

• If blocks of hot files equally likely on all disks (really?)

108

Page 109: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Redundancy via replicas

• Two (or more) copies• mirroring, shadowing, duplexing, etc.

• Write both, read either

109

Page 110: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

A Better Approach?: Parity Disk

• Capacity: one extra disk needed per stripe

• Disk failures are self-identifying (a.k.a. erasures)

• Don’t have to find the error

110

Page 111: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Updating and using the parity

111

Page 112: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Better: Striping the Parity

• Removes parity disk bottleneck

112

Page 113: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Performance

113

Page 114: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Measuring Availability

• Mean time to failure (MTTF) - “uptime”• Mean time to repair (MTTR)• Mean time between failures (MTBF)• MTBF = MTTF + MTTR

• Availability = MTTF / (MTTF + MTTR)• Suppose OS crashes once per month,

takes 10min to reboot. • MTTF = 720 hours = 43,200 minutes

MTTR = 10 minutes• Availability = 43200 / 43210 = 0.997 (~“3 nines”)

114

Page 115: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Disk failure conditional probability distribution - Bathtub curve

Expected operating lifetime

1 / (reported MTTF)

Infantmortality

Burn out

115

Page 116: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Reliability without rebuild

• 200 data drives with MTTFdrive• MTTDLarray = MTTFdrive / 200

• Add 200 drives and do mirroring • MTTFpair = (MTTFdrive / 2) + MTTFdrive = 1.5 * MTTFdrive • MTTDLarray = MTTFpair / 200 = MTTFdrive / 133

• Add 50 drives, each with parity across 4 data disks • MTTFset = (MTTFdrive / 5) + (MTTFdrive / 4) = 0.45 * MTTFdrive • MTTDLarray = MTTFset / 50 = MTTFdrive / 111

• These are approximations

116

Page 117: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Distributed Databases Case Study

15-440 Distributed Systems

Page 118: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Consistency Definitions

118

Sequential Consistency• All nodes see operations in some sequential

order• Operations of each process appear in-order in

this sequenceEventual Consistency• All nodes will learn eventually about all writes, in

the absence of updates

External Consistency• If T1 commits before T2, then the commit order

must be T1 before T2

Page 119: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Consistent Distributed Database

119

Two nodes:… Sequential

Consistency?Hash-based data partitioning (sharding)

Shard x

Shard y

Page 120: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Summary So Far: When to Use What?

120

Use Case Problems

Distributed Mutex Distributed KV without transactions

Failures + Slow

2PC Distributed DB with transactions(e.g., Spanner)

Failures

Primary-Backup Cost-efficient fault tolerance (e.g., FaRM, GFS, VMWare-FT)

Correlated failures

Paxos Staying up no matter the cost (e.g., Spanner, FaunaDB)

Delay and huge cost overhead

RAID, Checksums Every system Node failures

Page 121: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

Practical Constraints: Alternative I

121

2005-2012: NoSQL systems

Design choices: AP: availability over consistency“infinitely” scalable

write S = 1 S=1

ok, doneS=1

Network partition

read S

S?write X = 9

ok, doneChallenge: version reconcilation (parallel writes..)

Practical approach (Dynamo): Vector Clocks

X=9

Only eventually consistent!

Page 122: 15-440 Distributed Systems - Synergy Labs · 2019. 10. 15. · 15-440/15-640 Topics: Midterm 1 • 1) Distribution Systems Intro (Yuvraj) • 2) Communication -Internet in a Day ...

2012-2018: resurgence of consistent distributed DBs

Three key reasons [→ Daniel Abadi, UMD]1. application code gets too complex and buggy without

consistency support in DB2. better network availability, CP (from CAP) choice is less

relevant, availability sacrifice hardly noticeable3. CAP asymmetry: CP can guarantee consistency, AP can’t

guarantee availability (only question of degree)

Practical Constraints: Alternative II

Most workloads are read heavy. New systems support lock-free consistent reads.

Even stronger consistency requirements.

These guarantee at least sequential consistency, unlike NoSQL.