EECE 411: Design of Distributed Software Applications Lecture 6 [Last time] Distributed object...

EECE 411: Design of Distributed Software Applications

Lecture 6 [Last time] Distributed object systems

Java RMI

Assignment 2

Garbage collection

Data distribution


Summary for last time

Push vs. pull design

Distributed garbage collection

Solutions much more complex than for the non-distributed case

No perfect solution: which approach offers the best tradeoffs depends on the assumptions you make about your platform

Lease based approaches (or soft-state): often practical and scalable in distributed environments


Assignment 2 discussion

Push vs. pull design

Server initiates communication (pushes data)

Advantage: possibly lower load on server

Drawback: server needs to maintain state (list of clients)

Client initiates communication (pulls data)

Advantage: no client registration needed, server does not maintain data, more flexibility for clients

Drawback: load on server, DoS attacks


Assignment 2 discussion

Server initiates communication (pushes data)

Two subsequent problems:

When to initiate communication (When to push the data)?

Where/How to push it (How to find the clients?)


Assignment 2 discussion: Chat system using RMI & callbacks

A possible implementation:

the server has a Multicaster object with a method send(String)

each client has a Display object with a method show(String)

both methods are remote.

Clients invoke send and the server invokes show.

Sending a string means showing it on all displays.
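The design above could be sketched roughly as follows. The interface names Multicaster and Display and the methods send and show come from the slide; the register method, MulticasterImpl, and RecordingDisplay are hypothetical names added for illustration. This is a minimal in-process sketch: exporting the objects (for example with UnicastRemoteObject) and looking them up in an RMI registry are assumed and left out.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Client-side callback interface: the server invokes show. */
interface Display extends Remote {
    void show(String message) throws RemoteException;
}

/** Server-side interface: clients invoke send. */
interface Multicaster extends Remote {
    // Hypothetical: the slide does not say how the server learns about displays.
    void register(Display display) throws RemoteException;

    void send(String message) throws RemoteException;
}

/** Hypothetical server implementation: sending means showing on all displays. */
class MulticasterImpl implements Multicaster {
    private final List<Display> displays = new CopyOnWriteArrayList<>();

    public void register(Display display) {
        displays.add(display);
    }

    public void send(String message) {
        for (Display d : displays) {
            try {
                d.show(message);
            } catch (RemoteException e) {
                // Assumption: a failed callback means the client is gone.
                displays.remove(d);
            }
        }
    }
}

/** Hypothetical client display that records messages instead of drawing them. */
class RecordingDisplay implements Display {
    final List<String> shown = new ArrayList<>();

    public void show(String message) {
        shown.add(message);
    }
}
```

Note that the server keeps a list of clients here, which is exactly the state that the push design requires.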


import java.util.LinkedList;

/* a synchronized queue */
public class MessageQueue {

    /* the actual queue */
    private final LinkedList<String> _queue;

    /* the constructor - it simply creates the LinkedList to store queue elements */
    public MessageQueue() {
        _queue = new LinkedList<>();
    }

    /* gets the first element of the queue or blocks if the queue is empty */
    public synchronized String dequeue() throws InterruptedException {
        // loop, not if: re-check the condition after every wake-up
        while (_queue.isEmpty()) {
            wait();
        }
        return _queue.removeFirst();
    }

    /* add a new element to the queue and wake up one waiting consumer */
    public synchronized void enqueue(String m) {
        _queue.addLast(m);
        notify();
    }
}


public class Main {
    // volatile: written by the GUI thread, polled by the main thread
    static volatile GUI gui;
    static MessageQueue _queue;

    public static void main(String[] args) {
        // create a shared buffer where the GUI adds the messages that need to
        // be sent out by the main thread. The main thread stays in a loop and
        // when a new message shows up in the buffer it sends it out to the server
        _queue = new MessageQueue();

        // instantiate the GUI - in a new thread
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                gui = GUI.createAndShowGUI(_queue);
            }
        });

        // hack: make sure the GUI instantiation is completed by the GUI thread
        // before the next call
        while (gui == null) Thread.yield();

        // calling the GUI method that updates the text area of the GUI
        // you might want to call the same method when a new chat message arrives
        gui.addToTextArea("RemoteUser:> Sample of displaying remote message");

        // The code below serves as an example to show how to share messages
        // between the GUI and the main thread.
        // You will probably want to replace the code below with code that sits in a loop,
        // waits for new messages to be entered by the user, and sends them to the
        // chat server (using an RMI call)
        //
        // In addition you may want to add code that
        //  * connects to the chat server and provides an object for callbacks (so
        //    that the server has a way to send messages generated by other users)
        //  * implements the callback object which is called by the server remotely
        //    and, in turn, updates the local GUI
        while (true) {
            String s;
            try {
                // wait until the user enters a new chat message
                s = _queue.dequeue();
            } catch (InterruptedException ie) {
                break;
            }
            // update the GUI with the message entered by the user
            gui.addToTextArea("Me:> " + s);
            // print it to System.out (or send it to the RMI server)
            System.out.println("User entered: " + s + " -- now sending it to chat server");
        } // end while loop
    }
}




Design exercise

Imagine a two-level p2p network (e.g., Skype)

Each normal peer registers with one super-peer

Super-peers provide additional functionality: directory search, call routing, etc.

There are some central servers (e.g., that support the www.skype.com domain, register new users, etc.).

Skype would like to present on its webpage an estimate of the number of participating nodes.

Design a protocol.


Soft-state

Producer sends state to receiver(s) over a (lossy) channel.

Receivers keep state and associated timeouts.

Advantages:

Decouples state producer and consumer: no explicit failure detection and state removal messages

‘Eventual’ state

Works well in practice: RSVP, RIP, tons of other systems.

[Figure: state producer and state consumer]
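A receiver-side soft-state table might look like the sketch below. The class name SoftStateTable, its methods, and the choice to pass the current time explicitly (so the behaviour is easy to test) are assumptions made for illustration; the slides do not prescribe an implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Receiver-side soft state: an entry disappears unless its producer refreshes it. */
class SoftStateTable {
    private final long timeoutMillis;
    // producer id -> time at which its state expires
    private final Map<String, Long> expiry = new HashMap<>();

    SoftStateTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called when a refresh arrives; a lost refresh is simply never seen. */
    synchronized void refresh(String producerId, long nowMillis) {
        expiry.put(producerId, nowMillis + timeoutMillis);
    }

    /** Drops expired entries and returns how many producers are still considered alive. */
    synchronized int liveCount(long nowMillis) {
        expiry.values().removeIf(deadline -> deadline <= nowMillis);
        return expiry.size();
    }
}
```

No removal message is needed: a producer that fails or leaves just stops refreshing, and its entry times out. One possible use is counting live peers by having each peer refresh its entry periodically.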


Garbage collection in single box systems

Solutions:

Reference counting

Tracing based solutions (mark and sweep)


Garbage collection in distributed systems

Why is it different?

References distributed across multiple address spaces

Why a solution may be hard to design:

Unreliable communication

Unannounced failures

Overheads


Reference Counting

The problem: maintaining a proper reference count in the presence of unreliable communication.

Key: ability to detect duplicate messages

[A note on terminology: for the next few slides I’ll use proxy for client stub and skeleton for server stub.]


Reference Counting (cont)

Passing remote object references

a) Copy the reference and let the destination increment the counter

• Problems?

• What if P1 deletes its reference before P2 increments the counter?

b) Signal the copy first to the server

• Problems?

• Overheads, coupling (what if P2 fails?)


Advanced Solutions: Weighted Reference Counting

a) Initial assignment of weights (lives)

b) New weight (life) assignment when creating a new reference.


Advanced Solutions: Weighted Reference Counting (II)

Weight (life) assignment when copying a reference.

Pros/cons?

+ Create new references without contacting the server!

- Client machine failures
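The weight bookkeeping can be illustrated with a small sketch. It assumes the common formulation in which the skeleton tracks a total weight and its own partial weight, and each proxy holds part of the weight. The class and method names (WeightedObject, WeightedRef, createReference, copy, delete) are hypothetical, and everything runs in one process rather than over a network.

```java
/** Skeleton-side bookkeeping for one remote object. */
class WeightedObject {
    private int totalWeight;   // skeleton's partial weight plus all proxy weights
    private int partialWeight; // weight still held by the skeleton

    WeightedObject(int initialWeight) {
        totalWeight = initialWeight;
        partialWeight = initialWeight;
    }

    /** Creating a reference at the server: hand out half of the skeleton's weight. */
    WeightedRef createReference() {
        if (partialWeight < 2) throw new IllegalStateException("skeleton weight exhausted");
        int given = partialWeight / 2;
        partialWeight -= given;
        return new WeightedRef(this, given);
    }

    /** A proxy was deleted and returned its weight in a decrement message. */
    void release(int weight) {
        totalWeight -= weight;
    }

    /** No proxy holds weight once the total equals the skeleton's own share. */
    boolean hasRemoteReferences() {
        return totalWeight != partialWeight;
    }
}

/** Proxy-side reference carrying part of the weight. */
class WeightedRef {
    private final WeightedObject target;
    private int weight;

    WeightedRef(WeightedObject target, int weight) {
        this.target = target;
        this.weight = weight;
    }

    /** Copying splits this proxy's weight; the server is not contacted. */
    WeightedRef copy() {
        if (weight < 2) throw new IllegalStateException("proxy weight exhausted");
        int given = weight / 2;
        weight -= given;
        return new WeightedRef(target, given);
    }

    void delete() {
        target.release(weight);
        weight = 0;
    }

    int weight() {
        return weight;
    }
}
```

The sketch also makes the drawback visible: if a client machine fails without calling delete, its weight is never returned and the object is never reclaimed.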


Reference Listing (Java RMI’s solution)

Skeleton maintains a list of client proxies

Creating a remote reference

Assume P attempts to create remote reference to O

P sends its identification to O skeleton

O acknowledges and stores P identity

P creates the proxy

Copying a remote reference (P1 attempts to pass to P2 a remote reference to O)

Advantages: add/delete are idempotent

i.e. duplicate operations have no effect

no reliable communication required

Drawbacks:

overheads/scalability – the list of proxies can grow large

handling unannounced client failures (may lead to resource leak)
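Why idempotence matters can be seen by comparing a reference list with a plain counter when a message is delivered twice. The sketch below is illustrative only; ReferenceList and ReferenceCounter are hypothetical names and do not reflect Java RMI's internal classes.

```java
import java.util.HashSet;
import java.util.Set;

/** Reference listing: the skeleton remembers which clients hold a proxy. */
class ReferenceList {
    private final Set<String> proxies = new HashSet<>();

    /** Idempotent: adding the same client twice leaves one entry. */
    synchronized void add(String clientId) {
        proxies.add(clientId);
    }

    /** Idempotent: removing an absent client has no effect. */
    synchronized void remove(String clientId) {
        proxies.remove(clientId);
    }

    synchronized boolean isGarbage() {
        return proxies.isEmpty();
    }
}

/** Plain reference counting, for contrast: duplicates corrupt the count. */
class ReferenceCounter {
    private int count;

    synchronized void increment() {
        count++;
    }

    synchronized void decrement() {
        count--;
    }

    synchronized int count() {
        return count;
    }
}
```

If the "add P1" message is retransmitted and arrives twice, the list still holds a single entry for P1, whereas the counter ends up one too high and the object leaks.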


Reference Listing (Java RMI’s solution): Handling failures

Lease based approach:

Skeleton promises to keep info on client only for limited time.

If info not renewed then the skeleton discards it.

Pros/Cons?


Roadmap

Distributed system: collection of independent components that appears to its users as a single coherent system

Components need to communicate

Shared memory

Message exchange

So far we talked about point-to-point (generally synchronous, non-persistent) communication

Socket programming: message based, generally synchronous, non-persistent

Client-server infrastructures: RPC, RMI

Data distribution:

Multicast

Epidemic algorithms


Multicast Communication

[Figure: two panels, IP Multicast and Overlay, each connecting the end systems Calgary, Chicago, UBC, MIT1 and MIT2. Legend for IP Multicast: end systems, routers, IP multicast flow. Legend for Overlay: end systems, overlay tunnels.]

Two categories of solutions:

Based on support from the network: IP-multicast

Without network support: application-layer multicast


Discussion

Deployment of IP-multicast is limited. Why?


Application Layer Multicast

[Figure: IP Multicast and Overlay panels over the end systems Calgary, Chicago, UBC, MIT1 and MIT2, as on the Multicast Communication slide.]

What should be the success metrics?


Overheads compared to IP multicast

Relative Delay Penalty (RDP): Overlay-delay vs. IP-delay

Stress: number of duplicate packets on each physical link
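As a rough illustration of the two metrics, the sketch below computes RDP as the ratio of overlay delay to IP delay, and link stress as the number of overlay tunnels that cross each physical link. The class name MulticastMetrics, the representation of a tunnel as a list of physical link names, and the link names used in the usage note are all assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MulticastMetrics {
    /** RDP for one source-destination pair: overlay delay divided by direct IP delay. */
    static double relativeDelayPenalty(double overlayDelayMs, double ipDelayMs) {
        return overlayDelayMs / ipDelayMs;
    }

    /**
     * Stress per physical link: how many overlay tunnels (and therefore copies
     * of the same packet) are routed over that link.
     */
    static Map<String, Integer> linkStress(List<List<String>> tunnelPaths) {
        Map<String, Integer> stress = new HashMap<>();
        for (List<String> path : tunnelPaths) {
            for (String link : path) {
                stress.merge(link, 1, Integer::sum);
            }
        }
        return stress;
    }
}
```

For example, if UBC forwards to both Calgary and Chicago and both tunnels leave over the same access link (say "UBC-R1"), that link has stress 2. An overlay path of 60 ms against a direct IP path of 40 ms gives an RDP of 1.5.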

[Figure: two panels, IP Multicast and Overlay, over the end systems MIT1, MIT2, Chicago, UBC, Calg1 and Calg2.]

Application-level multicast success metrics: Relative Delay Penalty and Link Stress

[Figure: two CDF plots. Relative delay penalty distribution: CDF (0% to 100%) against relative delay penalty (RDP) on a logarithmic axis from 0.1 to 100, with the 90%-tile RDP marked. Link stress distribution: CDF (0% to 100%) against link stress from 0 to 20, with the maximum link stress marked.]


Roadmap …

Data distribution: Multicast Epidemic algorithms


Epidemic algorithms: Principle

Basic idea:

Assume there are no write–write conflicts (e.g., update operations are initially performed at one node)

A node passes its updated state to a limited number of ‘neighbors’; neighbors, in turn, pass the update to their neighbors

Update propagation is lazy, i.e., not immediate

Eventually, each update should reach every node

Anti-entropy: Each node regularly chooses another node at random, and exchanges state differences, leading to identical states at both afterwards

[Variation] Gossiping: A replica which has just been updated (i.e., has been contaminated), tells a number of other replicas about its update (contaminating them as well).

What are the advantages?


Amazon S3 incident on Sunday, July 20th, 2008

Amazon S3 service:

Provides a simple web services interface to store and retrieve any amount of data.

Intends to be highly scalable, reliable, fast, and inexpensive data storage infrastructure…

S3 serves a large number of customers. Amazon itself uses S3 to run its own global network of web sites.

Lots of objects stored: 4 billion (Q4’06), 40 billion (Q4’08), 100 billion (Q2’10)



8:40am PDT: error rates began to quickly climb

10 min: error rates significantly elevated and very few requests complete successfully

15 min: Multiple engineers investigating the issue. Alarms pointed at problems within the systems and across multiple data centers.

• Trying to restore system health by reducing system load in several stages. No impact.



1h01min: engineers detect that servers within Amazon S3 have problems communicating with each other

• Amazon S3 uses a gossip protocol to spread servers’ state info in order to quickly route around failed or unreachable servers

• Afterwards, engineers determine that a large number of servers were spending almost all of their time gossiping

1h52min: unable to determine and solve the problem, they decide to shut down all components, clear the system's state, and then reactivate the request processing components.

Restart the system!



2h29min: the system's state cleared

5h49min: internal communication restored and began reactivating request processing components in the US and EU.

7h37min: EU was ok and US location began to process requests successfully.

8h33min: Request rates and error rates had returned to normal in US.


Post-event investigation

Message corruption was the cause of the server-to-server communication problems

Many messages on Sunday morning had a single bit corrupted

MD5 checksums are used in the system, but Amazon did not apply them to detect errors in this particular internal state

The corruption spread wrong states throughout the system and increased the system load


Preventing the problem

Change the gossip algorithm in order to control/reduce the number of messages. Add rate limiters.

Put additional monitoring and alarming for gossip rates and failures

Add checksums to detect corruption of system state messages
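The last point can be sketched as follows: attach a digest to each state message and verify it on receipt. This is only an illustration of the idea, not Amazon's actual fix; the class name ChecksummedMessage and its fields are hypothetical. It uses the standard java.security.MessageDigest API with MD5, which is adequate for catching accidental corruption such as a flipped bit, though not deliberate tampering.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** A state message that carries a digest of its own payload. */
class ChecksummedMessage {
    final byte[] payload;
    final byte[] digest;

    ChecksummedMessage(byte[] payload) {
        this.payload = payload.clone();
        this.digest = md5(this.payload);
    }

    /** Receiver-side check: recompute the digest and compare with the one sent. */
    boolean isIntact() {
        return MessageDigest.isEqual(digest, md5(payload));
    }

    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

A receiver that finds isIntact() false would drop the message instead of merging the corrupted state and gossiping it onward.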


Lessons learned

You get a big hammer … use it wisely!

Verify message and state correctness – all kinds of corruption errors may occur

An emergency procedure to restore clear state in your system may be the solution of last resort. Make it work quickly!



Amazon’s report on the incident: http://status.aws.amazon.com/s3-20080720.html

Current status for Amazon services http://status.aws.amazon.com/


Back to epidemic communication


Anti-Entropy Protocols

A node P selects another node Q from the system at random.

Push: P only sends its updates to Q

Pull: P only retrieves updates from Q

Push-Pull: P and Q exchange mutual updates (after which they hold the same information).

Observation: for push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes

one round = every node has taken the initiative to start one exchange.

Main properties:

Reliability: node failures do not impact the protocol

Dissemination time & effort scale well with the number of nodes
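The O(log(N)) observation can be explored with a small simulation of push-pull anti-entropy for a single update. This is a simplified sketch: the class name AntiEntropySimulation, the seeded random generator, and the choice to process the nodes of a round one after another (so that exchanges see state changed earlier in the same round) are assumptions.

```java
import java.util.Random;

class AntiEntropySimulation {
    /**
     * Simulates push-pull anti-entropy for one update that starts at node 0.
     * Requires n >= 2. Returns the number of rounds until all n nodes hold the
     * update, or -1 if that does not happen within maxRounds.
     */
    static int roundsToFullDissemination(int n, long seed, int maxRounds) {
        Random random = new Random(seed);
        boolean[] hasUpdate = new boolean[n];
        hasUpdate[0] = true;
        int informed = 1;
        for (int round = 1; round <= maxRounds; round++) {
            // One round: every node P takes the initiative for one exchange.
            for (int p = 0; p < n; p++) {
                int q = random.nextInt(n - 1);
                if (q >= p) q++; // choose a random peer Q different from P
                // Push-pull: afterwards P and Q hold the same information.
                if (hasUpdate[p] != hasUpdate[q]) {
                    hasUpdate[p] = true;
                    hasUpdate[q] = true;
                    informed++;
                }
            }
            if (informed == n) return round;
        }
        return -1;
    }

    public static void main(String[] args) {
        for (int n = 16; n <= 16384; n *= 4) {
            System.out.println(n + " nodes: " + roundsToFullDissemination(n, 42L, 1000) + " rounds");
        }
    }
}
```

Running main for growing N gives a feel for how slowly the round count grows compared with the number of nodes.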


Gossiping

Basic model: A node S having an update to report contacts other randomly chosen servers.

Termination decision: If the contacted node already has the update, S stops contacting other nodes with probability 1/k.

P: the share of nodes that have not been reached

$$P = e^{-(k+1)(1-P)}$$

k    P
1    20.0%
2    6.0%
4    0.7%

[Figure: plot of ln(P)]
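The equation for P has no closed-form solution, but its smaller root can be approximated numerically. The sketch below uses fixed-point iteration starting from 0; the class and method names are hypothetical, and the iteration count is simply a generous choice.

```java
class GossipResidue {
    /**
     * Approximates the share P of nodes never reached, i.e. the root below 1 of
     * P = exp(-(k + 1) * (1 - P)), by fixed-point iteration from P = 0.
     */
    static double unreachedShare(int k) {
        double p = 0.0;
        for (int i = 0; i < 1000; i++) {
            p = Math.exp(-(k + 1) * (1.0 - p));
        }
        return p;
    }

    public static void main(String[] args) {
        for (int k : new int[] {1, 2, 4}) {
            System.out.printf("k=%d  P=%.4f%n", k, unreachedShare(k));
        }
    }
}
```

The results agree with the table: roughly 0.20 for k = 1, 0.06 for k = 2 and 0.007 for k = 4. P = 1 also satisfies the equation, which is why the iteration starts from 0 rather than from 1.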


Example applications (I)

Data dissemination: in p2p, wireless sensor networks, clusters

Spreading updates: e.g., disconnected replicated list maintenance – Demers et al., Epidemic algorithms for replicated database maintenance, PODC’87

Membership protocols: e.g., Amazon Dynamo service: DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store, SOSP’07

Various p2p networks (e.g., Tribler)


Example applications (II)

Data aggregation

The problem: compute the average value for a large set of sensors

Let every node $i$ maintain a variable $x_i$.

When two nodes $i$ and $k$ gossip, they each reset their variable to

$$x_i, x_k \leftarrow (x_i + x_k)/2$$

Result: in the end each node will have computed the average $\mathrm{avg} = \frac{1}{N}\sum_i x_i$.
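The averaging rule is easy to try out in a simulation. The sketch below repeatedly picks a random pair of nodes and applies the update rule; the class name GossipAveraging, the seeded random generator, and the fixed number of exchanges are assumptions made for illustration.

```java
import java.util.Random;

class GossipAveraging {
    /**
     * Applies the pairwise averaging rule to randomly chosen pairs of nodes.
     * Requires at least two nodes. Returns the node values after the exchanges.
     */
    static double[] run(double[] initial, int exchanges, long seed) {
        double[] x = initial.clone();
        Random random = new Random(seed);
        for (int e = 0; e < exchanges; e++) {
            int i = random.nextInt(x.length);
            int k = random.nextInt(x.length - 1);
            if (k >= i) k++; // make sure k differs from i
            double mean = (x[i] + x[k]) / 2.0;
            x[i] = mean; // both nodes adopt the pair's average,
            x[k] = mean; // so the global sum stays unchanged
        }
        return x;
    }

    public static void main(String[] args) {
        double[] result = run(new double[] {10, 20, 30, 40}, 200, 7L);
        for (double v : result) {
            System.out.println(v);
        }
    }
}
```

Each exchange leaves the sum of all values unchanged while pulling the two participants together, which is why every value drifts toward the global average.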


Advantages of epidemic techniques

Probabilistic model. Rigorous mathematical underpinnings.

Good framework for reasoning about the spread of information through a system over time.

Asynchronous communication pattern. Operate in a 'fire-and-forget' mode, where, even if the initial sender fails, surviving nodes will receive the update.

Autonomous actions. Enable nodes to take actions based on the data received without the need for additional communication to reach agreement with partners; nodes can take decisions autonomously.

Robust with respect to message loss & node failures. Once a message has been received by at least one of your peers it is almost impossible to prevent the spread of the information through the system.
