EECE 411: Design of Distributed Software Applications Lecture 6 [Last time] Distributed object...
EECE 411: Design of Distributed Software Applications Lecture 6 [Last time] Distributed object systems Java RMI Assignment 2 Garbage collection Data distribution
  • date post

    20-Dec-2015

Page 1: EECE 411: Design of Distributed Software Applications Lecture 6 [Last time] Distributed object systems Java RMI Assignment 2 Garbage collection Data distribution.

EECE 411: Design of Distributed Software Applications

Lecture 6 [Last time] Distributed object systems

Java RMI Assignment 2 Garbage collection

Data distribution

Summary for last time

Push vs. pull design

Distributed garbage collection
  • Solutions are much more complex than in the non-distributed case.
  • No perfect solution: depending on the assumptions you make about your platform, one approach or another might offer the best tradeoffs.
  • Lease-based (or soft-state) approaches: often practical and scalable in distributed environments.

Assignment 2 discussion

Push vs. pull design

Server initiates communication (pushes data)
  • Advantage: possibly lower load on the server
  • Drawback: the server needs to maintain state (the list of clients)

Client initiates communication (pulls data)
  • Advantage: no client registration needed, the server does not maintain client data, more flexibility for clients
  • Drawback: load on the server, DoS attacks

Assignment 2 discussion

Server initiates communication (pushes data). Two subsequent problems:
  • When to initiate communication (when to push the data)?
  • Where/how to push it (how to find the clients)?

Assignment 2 discussion: Chat system using RMI & callbacks

A possible implementation:
  • the server has a Multicaster object with a method send(String)
  • each client has a Display object with a method show(String)
  • both methods are remote

Clients invoke send and the server invokes show.

Sending a string means showing it on all displays.
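
The Multicaster/Display design above can be sketched with two remote interfaces. This is a minimal illustration, not the assignment solution: the register method, MulticasterImpl, RecordingDisplay, and ChatSketch are hypothetical names introduced here, and the demo calls the objects in-process. In an actual RMI deployment the objects would additionally be exported (e.g., with UnicastRemoteObject.exportObject) and looked up through a registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Client-side callback object; the server invokes show remotely. */
interface Display extends Remote {
    void show(String s) throws RemoteException;
}

/** Server-side object; clients invoke send remotely. */
interface Multicaster extends Remote {
    void send(String s) throws RemoteException;
    // Assumption: clients hand their callback object to the server this way.
    void register(Display d) throws RemoteException;
}

class MulticasterImpl implements Multicaster {
    private final List<Display> displays = new CopyOnWriteArrayList<>();

    public void register(Display d) { displays.add(d); }

    /** Sending a string means showing it on all displays. */
    public void send(String s) {
        for (Display d : displays) {
            try {
                d.show(s);
            } catch (RemoteException e) {
                displays.remove(d); // drop a client that can no longer be reached
            }
        }
    }
}

/** Stand-in for a GUI: records what it was asked to show. */
class RecordingDisplay implements Display {
    final List<String> shown = new ArrayList<>();
    public void show(String s) { shown.add(s); }
}

class ChatSketch {
    /** Two clients register, one message is sent, both displays receive it. */
    static List<String> demo() {
        MulticasterImpl server = new MulticasterImpl();
        RecordingDisplay alice = new RecordingDisplay();
        RecordingDisplay bob = new RecordingDisplay();
        server.register(alice);
        server.register(bob);
        server.send("hello");
        List<String> all = new ArrayList<>(alice.shown);
        all.addAll(bob.shown);
        return all;
    }
}
```

Note that the server keeps a list of registered clients, which is exactly the per-client state named as the drawback of the push design.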

import java.util.LinkedList;

/* a synchronized queue */
public class MessageQueue {

    /* the actual queue */
    private final LinkedList<String> _queue;

    /* the constructor - it simply creates the LinkedList to store queue elements */
    public MessageQueue() {
        _queue = new LinkedList<>();
    }

    /* gets the first element of the queue or blocks if the queue is empty */
    public synchronized String dequeue() throws InterruptedException {
        while (_queue.isEmpty()) {
            wait();
        }
        return _queue.removeFirst();
    }

    /* add a new element to the queue */
    public synchronized void enqueue(String m) {
        _queue.addLast(m);
        notify();
    }
}

public class Main {
    // the GUI class is defined elsewhere (not shown on these slides);
    // volatile so that the busy-wait below sees the assignment made by the GUI thread
    static volatile GUI gui;
    static MessageQueue _queue;

    public static void main(String[] args) {
        // create a shared buffer where the GUI adds the messages that need to
        // be sent out by the main thread. The main thread stays in a loop and
        // when a new message shows up in the buffer it sends it out to the server
        _queue = new MessageQueue();

        // instantiate the GUI - on the Swing event-dispatch thread
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                gui = GUI.createAndShowGUI(_queue);
            }
        });

        // hack: make sure the GUI instantiation is completed by the GUI thread
        // before the next call
        while (gui == null)
            Thread.yield();

        // calling the GUI method that updates the text area of the GUI
        // you might want to call the same method when a new chat message arrives
        gui.addToTextArea("RemoteUser:> Sample of displaying remote message");

        // The code below serves as an example to show how to share messages
        // between the GUI and the main thread.
        // You will probably want to replace the code below with code that sits in a loop,
        // waits for new messages to be entered by the user, and sends them to the
        // chat server (using an RMI call)
        //
        // In addition you may want to add code that
        //  * connects to the chat server and provides an object for callbacks (so
        //    that the server has a way to send messages generated by other users)
        //  * implements the callback object which is called by the server remotely
        //    and, in turn, updates the local GUI
        while (true) {
            String s;
            try {
                // wait until the user enters a new chat message
                s = _queue.dequeue();
            } catch (InterruptedException ie) {
                break;
            }
            // update the GUI with the message entered by the user
            gui.addToTextArea("Me:> " + s);
            // print it to System.out (or send it to the RMI server)
            System.out.println("User entered: " + s + " -- now sending it to chat server");
        } // end while loop
    }
}


Design exercise

Imagine a two-level p2p network (e.g., Skype):
  • Each normal peer registers with one super-peer.
  • Super-peers provide additional functionality: directory search, call routing, etc.
  • There are some central servers (e.g., that support the www.skype.com domain, register new users, etc.).

Skype would like to present on its webpage an estimate of the number of participating nodes.

Design a protocol.

Soft-state

Producer sends state to receiver(s) over a (lossy) channel. Receivers keep the state and associated timeouts.

Advantages:
  • Decouples state producer and consumer: no explicit failure detection and state removal messages
  • ‘Eventual’ state

Works well in practice: RSVP, RIP, and many other systems.

[Figure: state producer and state consumer]
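
One way to apply soft state to the node-counting design exercise is sketched below; it is one possible answer under stated assumptions, not the intended solution. Assume each super-peer periodically reports its current peer count to a central server, and the server forgets any report that is not refreshed within a timeout. SoftStateCounter, refresh, estimate, and the TTL value are hypothetical names chosen for illustration; time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Soft-state table kept by the central server: super-peer id -> last reported peer count. */
class SoftStateCounter {
    private static final class Entry {
        final int peers;
        final long expiresAt;
        Entry(int peers, long expiresAt) { this.peers = peers; this.expiresAt = expiresAt; }
    }

    private final Map<String, Entry> table = new HashMap<>();
    private final long ttlMillis;

    SoftStateCounter(long ttlMillis) { this.ttlMillis = ttlMillis; }

    /** A refresh message arrived: (re)install the state and restart its timeout. */
    void refresh(String superPeerId, int peers, long nowMillis) {
        table.put(superPeerId, new Entry(peers, nowMillis + ttlMillis));
    }

    /** Drop timed-out entries, then sum the counts that are still fresh. */
    int estimate(long nowMillis) {
        int total = 0;
        Iterator<Entry> it = table.values().iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.expiresAt <= nowMillis) it.remove();
            else total += e.peers;
        }
        return total;
    }
}
```

A lost refresh or a crashed super-peer needs no explicit removal message: the stale entry simply times out, and the estimate becomes correct again after the next refresh.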

Garbage collection in single box systems

Solutions:
  • Reference counting
  • Tracing-based solutions (mark and sweep)

Garbage collection in distributed systems

Why is it different?
  • References are distributed across multiple address spaces

Why a solution may be hard to design:
  • Unreliable communication
  • Unannounced failures
  • Overheads

Reference Counting

The problem: maintaining a proper reference count in the presence of unreliable communication.

Key: ability to detect duplicate messages

[A note on terminology: for the next few slides I’ll use proxy for client stub and skeleton for server stub.]
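
To illustrate why duplicate detection is the key, here is a minimal sketch of a skeleton-side counter that applies each increment or decrement at most once. It assumes every message carries a unique identifier chosen by the sender; CountingSkeleton and onMessage are hypothetical names, and remembering all identifiers forever is a simplification.

```java
import java.util.HashSet;
import java.util.Set;

class CountingSkeleton {
    private int refCount = 0;
    private final Set<String> seen = new HashSet<>(); // ids of messages already applied

    /** Apply +1 / -1 only the first time a given message id is seen. */
    void onMessage(String messageId, int delta) {
        if (!seen.add(messageId)) return; // duplicate (e.g., a retransmission): ignore
        refCount += delta;
    }

    int refCount() { return refCount; }

    boolean collectable() { return refCount == 0; }
}
```

Without the seen set, a retransmitted increment would leave the count too high (a leak), and a retransmitted decrement could reclaim an object that is still referenced.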

Reference Counting (cont.): Passing remote object references

a) Copy the reference and let the destination increment the counter
• Problems?
• What if P1 deletes its reference before P2 increments the counter?

b) Signal the copy first to the server
• Problems?
• Overheads, coupling (what if P2 fails?)

Advanced Solutions: Weighted Reference Counting

a) Initial assignment of weights (lives)
b) New weight (life) assignment when creating a new reference.

Advanced Solutions: Weighted Reference Counting (II)

Weight (life) assignment when copying a reference.

Pros/cons?
  + Create new references without contacting the server!
  - Client machine failures
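
The weight assignments can be sketched as follows. This follows one common formulation of weighted reference counting and is an assumption, not necessarily the exact scheme drawn on the slides: the skeleton keeps a total weight and the partial weight it still holds, a new or copied reference receives half of the giver's weight, and the object is collectable once the total equals the skeleton's own partial weight. All class and method names are hypothetical, and the delete message is modelled as a direct call.

```java
class WeightedSkeleton {
    int totalWeight;   // weight held by the skeleton plus all live proxies
    int partialWeight; // weight still held by the skeleton itself

    WeightedSkeleton(int initial) {
        totalWeight = initial;
        partialWeight = initial;
    }

    /** Creating a reference: give half of the skeleton's partial weight to the new proxy. */
    WeightedProxy createReference() {
        int half = partialWeight / 2;
        if (half == 0) throw new IllegalStateException("skeleton weight exhausted");
        partialWeight -= half;
        return new WeightedProxy(this, half);
    }

    /** A proxy was deleted: subtract the weight it carried from the total. */
    void onDelete(int weight) { totalWeight -= weight; }

    boolean collectable() { return totalWeight == partialWeight; }
}

class WeightedProxy {
    private final WeightedSkeleton target;
    int weight;

    WeightedProxy(WeightedSkeleton target, int weight) {
        this.target = target;
        this.weight = weight;
    }

    /** Copying a reference (P1 to P2): split the weight locally, no message to the server. */
    WeightedProxy copy() {
        int half = weight / 2;
        if (half == 0) throw new IllegalStateException("proxy weight exhausted");
        weight -= half;
        return new WeightedProxy(target, half);
    }

    /** Deleting a reference: a single decrement message carrying the proxy's weight. */
    void delete() {
        target.onDelete(weight);
        weight = 0;
    }
}
```

The sketch also makes the listed drawback visible: if a client machine fails without calling delete, its weight is never returned and the object is never reclaimed.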

Reference Listing (Java RMI’s solution)

Skeleton maintains a list of client proxies.

Creating a remote reference (assume P attempts to create a remote reference to O):
  • P sends its identification to O's skeleton
  • O acknowledges and stores P's identity
  • P creates the proxy

Copying a remote reference (P1 attempts to pass to P2 a remote reference to O)

Advantages:
  • add/delete are idempotent, i.e., duplicate operations have no effect
  • no reliable communication required

Drawbacks:
  • overheads/scalability: the list of proxies can grow large
  • handling unannounced client failures (may lead to resource leaks)
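
The idempotence argument can be seen in a few lines. This is an illustrative sketch only; ListingSkeleton and its methods are hypothetical names and do not reflect the actual Java RMI internals.

```java
import java.util.HashSet;
import java.util.Set;

class ListingSkeleton {
    private final Set<String> proxies = new HashSet<>(); // identities of clients holding a proxy

    /** P announces itself; the return value stands for the acknowledgement. Safe to retransmit. */
    boolean add(String clientId) {
        proxies.add(clientId);
        return true;
    }

    /** P drops its proxy. Safe to retransmit. */
    void remove(String clientId) { proxies.remove(clientId); }

    int size() { return proxies.size(); }

    boolean collectable() { return proxies.isEmpty(); }
}
```

Because the skeleton stores identities rather than a count, a retransmitted add or remove leaves the set unchanged, which is why no duplicate detection or reliable channel is needed.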

Reference Listing (Java RMI’s solution): Handling failures

Lease-based approach:
  • The skeleton promises to keep information about a client only for a limited time.
  • If the information is not renewed, the skeleton discards it.

Pros/Cons?

Roadmap

Distributed system: collection of independent components that appears to its users as a single coherent system

Components need to communicate
  • Shared memory
  • Message exchange

So far we talked about point-to-point (generally synchronous, non-persistent) communication
  • Socket programming: message based, generally synchronous, non-persistent
  • Client-server infrastructures: RPC, RMI

Data distribution:
  • Multicast
  • Epidemic algorithms

Multicast Communication

[Figure: two panels connecting Calgary, Chicago, UBC, MIT1, and MIT2. "IP Multicast": end systems, routers, IP multicast flow. "Overlay": end systems, overlay tunnels.]

Two categories of solutions:
  • Based on support from the network: IP multicast
  • Without network support: application-layer multicast

Discussion

Deployment of IP multicast is limited. Why?

Application Layer Multicast

[Figure: two panels connecting Calgary, Chicago, UBC, MIT1, and MIT2. "IP Multicast": end systems, routers, IP multicast flow. "Overlay": end systems, overlay tunnels.]

What should be the success metrics?

Application-level multicast success metrics: Relative Delay Penalty and Link Stress

Overheads compared to IP multicast:
  • Relative Delay Penalty (RDP): overlay delay vs. IP delay
  • Stress: number of duplicate packets on each physical link

[Figure: IP Multicast and Overlay topologies connecting MIT1, MIT2, Chicago, UBC, Calg1, and Calg2]

[Figure: Relative delay penalty distribution (CDF, 0% to 100%, over RDP on a log axis from 0.1 to 100), with the 90th-percentile RDP marked]

[Figure: Link stress distribution (CDF, 0% to 100%, over link stress from 0 to 20), with the maximum link stress marked]
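
The two metrics can be computed as in the sketch below. It assumes that each overlay hop is described by the list of physical links it traverses and that links are identified by plain strings; OverlayMetrics, rdp, and linkStress are hypothetical names, and the link names in the usage note are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OverlayMetrics {
    /** RDP for one receiver: delay along the overlay path divided by the direct IP delay. */
    static double rdp(double overlayDelayMs, double ipDelayMs) {
        return overlayDelayMs / ipDelayMs;
    }

    /** Stress per physical link: how many overlay hops carry the same packet across it. */
    static Map<String, Integer> linkStress(List<List<String>> overlayHops) {
        Map<String, Integer> stress = new HashMap<>();
        for (List<String> physicalLinks : overlayHops)
            for (String link : physicalLinks)
                stress.merge(link, 1, Integer::sum);
        return stress;
    }
}
```

For example, if UBC forwards the same packet over two overlay tunnels that both leave through a physical link "UBC-R1", that link has stress 2, whereas IP multicast would carry the packet over it once.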

Roadmap …

Data distribution:
  • Multicast
  • Epidemic algorithms

Epidemic algorithms: Principle

Basic idea:
  • Assume there are no write–write conflicts (e.g., update operations are initially performed at one node).
  • A node passes its updated state to a limited number of ‘neighbors’; neighbors, in turn, pass the update to their neighbors.
  • Update propagation is lazy, i.e., not immediate.
  • Eventually, each update should reach every node.

Anti-entropy: Each node regularly chooses another node at random and exchanges state differences, leading to identical states at both afterwards.

[Variation] Gossiping: A replica that has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).

What are the advantages?

Amazon S3 incident on Sunday, July 20th, 2008

Amazon S3 service:
  • Provides a simple web services interface to store and retrieve any amount of data.
  • Intends to be a highly scalable, reliable, fast, and inexpensive data storage infrastructure…
  • S3 serves a large number of customers. Amazon itself uses S3 to run its own global network of web sites.
  • Lots of objects stored: 4 billion (Q4’06), 40 billion (Q4’08), 100 billion (Q2’10)

Amazon S3 incident on Sunday, July 20th, 2008

8:40am PDT: error rates began to quickly climb

10 min: error rates significantly elevated and very few requests complete successfully

15 min: Multiple engineers investigating the issue. Alarms pointed at problems within the systems and across multiple data centers.

• Trying to restore system health by reducing system load in several stages. No impact.

Amazon S3 incident on Sunday, July 20th, 2008

1h01min: engineers detect that servers within Amazon S3 have problems communicating with each other

• Amazon S3 uses a gossip protocol to spread servers’ state info in order to quickly route around failed or unreachable servers

• Later, engineers determine that a large number of servers are spending almost all of their time gossiping

1h52min: unable to determine and solve the problem, they decide to shut down all components, clear the system's state, and then reactivate the request processing components.

Restart the system!

Amazon S3 incident on Sunday, July 20th, 2008

2h29min: the system's state cleared

5h49min: internal communication restored and began reactivating request processing components in the US and EU.

7h37min: EU was ok and US location began to process requests successfully.

8h33min: Request rates and error rates had returned to normal in US.

Post-event investigation

Message corruption was the cause of the server-to-server communication problems

Many messages on Sunday morning had a single bit corrupted

MD5 checksums are used in the system, but Amazon did not apply them to detect errors in this particular internal state

The corruption spread wrong states throughout the system and increased the system load

Preventing the problem

Change the gossip algorithm in order to control/reduce the number of messages. Add rate limiters.

Add monitoring and alarms for gossip rates and failures

Add checksums to detect corruption of system state messages
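
As a minimal sketch of the checksum fix, the class below attaches an MD5 digest to a state message and re-checks it on receipt. The message format and the names ChecksummedMessage and intact are assumptions for illustration; nothing here reflects Amazon's actual implementation. MD5 is used only because the slides mention it, and it guards against accidental corruption, not against a malicious sender.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class ChecksummedMessage {
    final byte[] payload; // the system-state bytes being gossiped
    final byte[] digest;  // MD5 of the payload, computed by the sender

    ChecksummedMessage(byte[] payload) {
        this.payload = payload.clone();
        this.digest = md5(this.payload);
    }

    /** Receiver-side check: recompute the digest and compare with the one sent. */
    boolean intact() {
        return Arrays.equals(digest, md5(payload));
    }

    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide
            throw new IllegalStateException(e);
        }
    }
}
```

A receiver that finds intact() false would discard the message instead of merging it into its own state, so a single flipped bit cannot propagate through further gossip rounds.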

Lessons learned

You get a big hammer … use it wisely!

Verify message and state correctness: all kinds of corruption errors may occur.

An emergency procedure to restore a clean state in your system may be the solution of last resort. Make it work quickly!

Amazon’s report on the incident: http://status.aws.amazon.com/s3-20080720.html

Current status for Amazon services http://status.aws.amazon.com/

Back to epidemic communication

Anti-Entropy Protocols

A node P selects another node Q from the system at random.

  • Push: P only sends its updates to Q
  • Pull: P only retrieves updates from Q
  • Push-Pull: P and Q exchange mutual updates (after which they hold the same information).

Observation: for push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes
  • one round = every node has taken the initiative to start one exchange.

Main properties:
  • Reliability: node failures do not impact the protocol
  • Dissemination time and effort scale well with the number of nodes
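
The O(log(N)) observation can be explored with a small simulation. This is a sketch under simplifying assumptions: a single update, no failures or message loss, and exchanges within a round applied one after another. PushPullSimulation and roundsToSpread are hypothetical names, and n must be at least 2.

```java
import java.util.Random;

class PushPullSimulation {
    /** Returns the number of rounds until all n nodes hold the update. */
    static int roundsToSpread(int n, long seed) {
        Random rnd = new Random(seed);
        boolean[] has = new boolean[n];
        has[0] = true; // the update starts at node 0
        int informed = 1;
        int rounds = 0;
        while (informed < n) {
            rounds++;
            for (int p = 0; p < n; p++) {   // one round: every node P initiates one exchange
                int q = rnd.nextInt(n - 1);
                if (q >= p) q++;            // pick Q != P uniformly at random
                if (has[p] != has[q]) {     // push-pull: afterwards both hold the update
                    has[p] = true;
                    has[q] = true;
                    informed++;
                }
            }
        }
        return rounds;
    }
}
```

Running roundsToSpread for increasing n (for example 1,000, then 1,000,000) and comparing the results is a quick way to see that the round count grows far more slowly than the number of nodes.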

Gossiping

Basic model: A node S that has an update to report contacts other randomly chosen servers.

Termination decision: If the contacted node already has the update, S stops contacting other nodes with probability 1/k.

Let P be the share of nodes that have not been reached:

$P = e^{-(k+1)(1-P)}$

k    P
1    20.0%
2    6.0%
4    0.7%

[Figure: plot with axis ln(P)]
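
The values in the table can be reproduced by solving the equation for P numerically. The sketch below uses plain fixed-point iteration starting from P = 0.5, which converges to the non-trivial solution for the k values shown (P = 1 is also a solution of the equation, but not the one of interest). GossipResidue and unreachedShare are hypothetical names.

```java
class GossipResidue {
    /** Iterates P <- exp(-(k+1)(1-P)) from P = 0.5 towards the non-trivial fixed point. */
    static double unreachedShare(int k) {
        double p = 0.5;
        for (int i = 0; i < 1000; i++) {
            p = Math.exp(-(k + 1) * (1.0 - p));
        }
        return p;
    }
}
```

For k = 1, 2, and 4 the iteration settles close to the 20%, 6%, and 0.7% listed above, which shows how quickly a larger k shrinks the share of nodes that never hear the update.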

Example applications (I)

Data dissemination: in p2p, wireless sensor networks, clusters

Spreading updates:
  • E.g., disconnected replicated list maintenance: Demers et al., Epidemic algorithms for replicated database maintenance. PODC’87

Membership protocols:
  • E.g., Amazon Dynamo service: DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store. SOSP’07
  • Various p2p networks (e.g., Tribler)

Example applications (II)

Data aggregation
  • The problem: compute the average value for a large set of sensors
  • Let every node i maintain a variable $x_i$.
  • When two nodes i and k gossip, they each reset their variable to $x_i, x_k \leftarrow (x_i + x_k)/2$
  • Result: in the end each node will have computed the average $\mathrm{avg} = \left(\sum_i x_i\right)/N$.
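
The averaging rule can be tried out with the short simulation below. It is a sketch that assumes gossip partners are picked uniformly at random and that each exchange is atomic; GossipAverage and run are hypothetical names, and the array must hold at least two values.

```java
import java.util.Random;

class GossipAverage {
    /** Performs the given number of pairwise exchanges in place. */
    static void run(double[] x, int exchanges, long seed) {
        Random rnd = new Random(seed);
        int n = x.length;
        for (int t = 0; t < exchanges; t++) {
            int i = rnd.nextInt(n);
            int k = rnd.nextInt(n - 1);
            if (k >= i) k++;                 // choose a partner k != i
            double mean = (x[i] + x[k]) / 2; // x_i, x_k <- (x_i + x_k)/2
            x[i] = mean;
            x[k] = mean;
        }
    }
}
```

Each exchange leaves the sum of all values unchanged while pulling two values together, which is why every node converges to the global average without any node ever seeing all the data.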

Advantages of epidemic techniques

Probabilistic model. Rigorous mathematical underpinnings.

Good framework for reasoning about the spread of information through a system over time.

Asynchronous communication pattern. Operates in a 'fire-and-forget' mode where, even if the initial sender fails, surviving nodes will receive the update.

Autonomous actions. Enable nodes to take actions based on the data received without the need for additional communication to reach agreement with partners; nodes can take decisions autonomously.

Robust with respect to message loss & node failures. Once a message has been received by at least one of your peers it is almost impossible to prevent the spread of the information through the system.