Date posted: 20-Dec-2015
EECE 411: Design of Distributed Software Applications
Lecture 6
[Last time] Distributed object systems
• Java RMI
• Assignment 2
• Garbage collection
Data distribution

Summary for last time
• Push vs. pull design
• Distributed garbage collection
  • Solutions are much more complex than in the non-distributed case.
  • No perfect solution: depending on the assumptions you make about your platform, one approach or another may offer the best tradeoffs.
  • Lease-based (soft-state) approaches are often practical and scalable in distributed environments.

Assignment 2 discussion: push vs. pull design
• Server initiates communication (pushes data)
  • Advantage: possibly lower load on the server
  • Drawback: the server needs to maintain state (the list of clients)
• Client initiates communication (pulls data)
  • Advantage: no client registration needed, the server does not maintain client state, more flexibility for clients
  • Drawback: load on the server, exposure to DoS attacks

Assignment 2 discussion
Server initiates communication (pushes data). Two problems follow:
• When to initiate communication (when to push the data)?
• Where and how to push it (how to find the clients)?

Assignment 2 discussion: chat system using RMI & callbacks
A possible implementation:
• The server has a Multicaster object with a method send(String).
• Each client has a Display object with a method show(String).
• Both methods are remote.
• Clients invoke send and the server invokes show.
• Sending a string means showing it on all displays.
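The design above can be sketched as a pair of RMI remote interfaces. This is only a sketch under assumptions: the slide names send and show, while the register method and the MulticasterImpl class are hypothetical additions so that the server has a way to learn about client callbacks. Dropping a client whose callback fails is also an assumption, not part of the assignment.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Client-side callback object; the server invokes show remotely. */
interface Display extends Remote {
    void show(String message) throws RemoteException;
}

/** Server-side object; clients invoke send remotely. */
interface Multicaster extends Remote {
    // Hypothetical registration call (an assumption, not on the slide).
    void register(Display display) throws RemoteException;

    void send(String message) throws RemoteException;
}

/** Push-style server: keeps the client list and shows each message everywhere. */
class MulticasterImpl implements Multicaster {
    // Copy-on-write list: safe to remove entries while iterating.
    private final List<Display> displays = new CopyOnWriteArrayList<>();

    @Override
    public void register(Display display) {
        displays.add(display);
    }

    @Override
    public void send(String message) {
        for (Display d : displays) {
            try {
                d.show(message);
            } catch (RemoteException e) {
                // Assumption: a client whose callback fails is simply dropped.
                displays.remove(d);
            }
        }
    }
}
```

In a real deployment both objects would be exported (for example with UnicastRemoteObject.exportObject) and the Multicaster bound in an RMI registry; the sketch leaves that out so it can be exercised in a single JVM.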

import java.util.LinkedList;

/* a synchronized queue */
public class MessageQueue {
    /* the actual queue */
    private final LinkedList<String> _queue;

    /* the constructor - it simply creates the LinkedList to store queue elements */
    public MessageQueue() {
        _queue = new LinkedList<>();
    }

    /* gets the first element of the queue or blocks if the queue is empty */
    public synchronized String dequeue() throws InterruptedException {
        // loop rather than a single check: wait() may return spuriously
        while (_queue.isEmpty()) {
            wait();
        }
        return _queue.removeFirst();
    }

    /* adds a new element to the queue and wakes up a waiting consumer */
    public synchronized void enqueue(String m) {
        _queue.addLast(m);
        notify();
    }
}

public class Main {
    // volatile: written by the Swing event thread, read by the main thread
    static volatile GUI gui;
    static MessageQueue _queue;

    public static void main(String[] args) {
        // create a shared buffer where the GUI adds the messages that need to
        // be sent out by the main thread. The main thread stays in a loop and
        // when a new message shows up in the buffer it sends it out to the server
        _queue = new MessageQueue();

        // instantiate the GUI - in a new thread
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                gui = GUI.createAndShowGUI(_queue);
            }
        });

        // hack: make sure the GUI instantiation is completed by the GUI thread
        // before the next call
        while (gui == null) {
            Thread.yield();
        }

        // calling the GUI method that updates the text area of the GUI
        // you might want to call the same method when a new chat message arrives
        gui.addToTextArea("RemoteUser:> Sample of displaying remote message");

        // The code below serves as an example to show how to share messages
        // between the GUI and the main thread.
        // You will probably want to replace the code below with code that sits in a loop,
        // waits for new messages to be entered by the user, and sends them to the
        // chat server (using an RMI call)
        //
        // In addition you may want to add code that
        //  * connects to the chat server and provides an object for callbacks (so
        //    that the server has a way to send messages generated by other users)
        //  * implements the callback object which is called by the server remotely
        //    and, in turn, updates the local GUI
        while (true) {
            String s;
            try {
                // wait until the user enters a new chat message
                s = _queue.dequeue();
            } catch (InterruptedException ie) {
                break;
            }
            // update the GUI with the message entered by the user
            gui.addToTextArea("Me:> " + s);
            // print it to System.out (or send it to the RMI server)
            System.out.println("User entered: " + s + " -- now sending it to chat server");
        } // end while loop
    }
}

Design exercise
Imagine a two-level p2p network (e.g., Skype).
• Each normal peer registers with one super-peer.
• Super-peers provide additional functionality: directory search, call routing, etc.
• There are some central servers (e.g., servers that support the www.skype.com domain, register new users, etc.).
Skype would like to present on its webpage an estimate of the number of participating nodes.
Design a protocol.

Soft-state
The state producer sends state to the receiver(s) (the state consumers) over a (lossy) channel. Receivers keep the state together with associated timeouts.
Advantages:
• Decouples state producer and consumer: no explicit failure detection and no state removal messages
• ‘Eventual’ state
Works well in practice: RSVP, RIP, and many other systems.
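A minimal sketch of the receiver side of soft state, assuming string keys and values and a caller-supplied clock (both are choices made for illustration; the class and method names are hypothetical). Every arriving state message refreshes the timeout, and entries that are not refreshed silently disappear, so a crashed or unreachable producer needs no explicit removal message.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** Receiver-side soft state: entries vanish unless the producer refreshes them. */
class SoftStateTable {
    private final long timeoutMillis;
    private final Map<String, String> values = new HashMap<>();
    private final Map<String, Long> expiry = new HashMap<>();

    SoftStateTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Called for every state message that arrives; a repeat acts as a refresh. */
    void refresh(String key, String value, long nowMillis) {
        values.put(key, value);
        expiry.put(key, nowMillis + timeoutMillis);
    }

    /** Drops entries whose timeout has passed; no removal message is needed. */
    void expire(long nowMillis) {
        Iterator<Map.Entry<String, Long>> it = expiry.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (e.getValue() <= nowMillis) {
                values.remove(e.getKey());
                it.remove();
            }
        }
    }

    String get(String key) {
        return values.get(key);
    }

    int size() {
        return values.size();
    }
}
```

Lost messages are tolerated as long as at least one refresh arrives within each timeout window, which is why the timeout is usually chosen to be several times the refresh period.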

Garbage collection in single-box systems
Solutions:
• Reference counting
• Tracing-based solutions (mark and sweep)

Garbage collection in distributed systems
Why is it different?
• References are distributed across multiple address spaces.
Why a solution may be hard to design:
• Unreliable communication
• Unannounced failures
• Overheads

Reference Counting
The problem: maintaining a proper reference count in the presence of unreliable communication.
Key: the ability to detect duplicate messages.
[A note on terminology: for the next few slides I’ll use proxy for the client stub and skeleton for the server stub.]

Reference Counting (cont.): passing remote object references
a) Copy the reference and let the destination increment the counter.
• Problems?
• What if P1 deletes its reference before P2 increments the counter?
b) Signal the copy to the server first.
• Problems?
• Overheads, coupling (what if P2 fails?)

Advanced Solutions: Weighted Reference Counting
a) Initial assignment of weights (lives)
b) New weight (life) assignment when creating a new reference

Advanced Solutions: Weighted Reference Counting (II)
Weight (life) assignment when copying a reference.
Pros/cons?
+ Create new references without contacting the server!
- Client machine failures
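The weight bookkeeping can be sketched as follows, assuming the common halving scheme: the skeleton starts with a total weight, every new or copied reference takes half of the weight held by its source, and a deleted reference returns its weight to the skeleton. The class names and the initial weight are hypothetical, and everything runs in one process for illustration; in a real system release would be a message to the server.

```java
/** Server side: total weight handed out plus the share the skeleton still holds. */
class WeightedSkeleton {
    private int totalWeight;
    private int partialWeight;

    WeightedSkeleton(int initialWeight) {
        totalWeight = initialWeight;
        partialWeight = initialWeight;
    }

    /** Creating a reference: the skeleton gives away half of its own weight. */
    WeightedProxy createReference() {
        if (partialWeight < 2) {
            throw new IllegalStateException("weight exhausted");
        }
        int given = partialWeight / 2;
        partialWeight -= given;
        return new WeightedProxy(this, given);
    }

    /** Deleting a reference: the proxy's weight is subtracted from the total. */
    void release(int weight) {
        totalWeight -= weight;
    }

    /** No remote references remain once only the skeleton's own share is left. */
    boolean isGarbage() {
        return totalWeight == partialWeight;
    }
}

/** Client side: a reference that carries part of the object's weight. */
class WeightedProxy {
    private final WeightedSkeleton skeleton;
    private int weight;

    WeightedProxy(WeightedSkeleton skeleton, int weight) {
        this.skeleton = skeleton;
        this.weight = weight;
    }

    /** Copying a reference: split the weight locally; the server is not contacted. */
    WeightedProxy copy() {
        if (weight < 2) {
            throw new IllegalStateException("weight exhausted");
        }
        int given = weight / 2;
        weight -= given;
        return new WeightedProxy(skeleton, given);
    }

    /** Deleting the reference returns its weight; a repeated call returns nothing. */
    void delete() {
        skeleton.release(weight);
        weight = 0;
    }

    int weight() {
        return weight;
    }
}
```

The sketch also makes the drawback visible: if a client machine fails before calling delete, its weight is never returned and the object is never recognized as garbage.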

Reference Listing (Java RMI’s solution)
The skeleton maintains a list of client proxies.
Creating a remote reference (assume P attempts to create a remote reference to O):
• P sends its identification to the skeleton of O.
• O acknowledges and stores the identity of P.
• P creates the proxy.
Copying a remote reference (P1 attempts to pass to P2 a remote reference to O)
Advantages:
• add/delete are idempotent, i.e., duplicate operations have no effect
• no reliable communication required
Drawbacks:
• overheads/scalability: the list of proxies can grow large
• handling unannounced client failures (may lead to resource leaks)
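A small sketch of why idempotence matters here, with hypothetical class names and string client identifiers as an assumption. A set of client identities absorbs retransmitted add and delete messages, whereas a plain counter miscounts as soon as one message is duplicated.

```java
import java.util.HashSet;
import java.util.Set;

/** Skeleton-side reference list keyed by client identity. */
class ReferenceList {
    private final Set<String> clients = new HashSet<>();

    /** Idempotent: a retransmitted add leaves the set unchanged. */
    void add(String clientId) {
        clients.add(clientId);
    }

    /** Idempotent: removing an absent client has no effect. */
    void remove(String clientId) {
        clients.remove(clientId);
    }

    boolean isGarbage() {
        return clients.isEmpty();
    }

    int size() {
        return clients.size();
    }
}

/** For contrast: a plain counter miscounts when a message is duplicated. */
class ReferenceCounter {
    private int count;

    void increment() {
        count++;
    }

    void decrement() {
        count--;
    }

    int count() {
        return count;
    }
}
```

With the list, a client can simply resend an add or delete until it is acknowledged; with the counter, the same retransmission would require duplicate detection.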

Reference Listing (Java RMI’s solution): handling failures
Lease-based approach:
• The skeleton promises to keep information on a client only for a limited time.
• If the information is not renewed, the skeleton discards it.
Pros/cons?

Roadmap
Distributed system: a collection of independent components that appears to its users as a single coherent system.
Components need to communicate:
• Shared memory
• Message exchange
So far we have talked about point-to-point (generally synchronous, non-persistent) communication:
• Socket programming: message-based, generally synchronous, non-persistent
• Client-server infrastructures: RPC, RMI
Data distribution:
• Multicast
• Epidemic algorithms

Multicast Communication
[Figure: two panels showing the same end systems (Calgary, Chicago, MIT1, MIT2, UBC). IP Multicast: end systems, routers, IP multicast flow. Overlay: end systems, overlay tunnels.]
Two categories of solutions:
• Based on support from the network: IP multicast
• Without network support: application-layer multicast

Discussion
Deployment of IP multicast is limited. Why?

Application Layer Multicast
[Figure: IP Multicast (end systems, routers, IP multicast flow) vs. Overlay (end systems, overlay tunnels) for Calgary, Chicago, MIT1, MIT2, UBC; repeated from the Multicast Communication slide.]
What should be the success metrics?

Application-level multicast success metrics: Relative Delay Penalty and Link Stress
Overheads compared to IP multicast:
• Relative Delay Penalty (RDP): overlay delay vs. IP delay
• Stress: number of duplicate packets on each physical link
[Figure: IP Multicast vs. Overlay delivery among UBC, Calg1, Calg2, Chicago, MIT1, MIT2.]
[Figure: Relative delay penalty distribution. CDF (0% to 100%) vs. relative delay penalty (RDP), axis from 0.1 to 100; annotation: 90%-tile RDP.]
[Figure: Link stress distribution. CDF (0% to 100%) vs. link stress, axis from 0 to 20; annotation: maximum link stress.]
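Both metrics are simple to compute once the overlay is mapped onto the physical network. The sketch below assumes that each overlay tunnel is described by the list of physical links it traverses; the class name and that representation are hypothetical choices for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OverlayMetrics {
    /** RDP: delay between two hosts along the overlay divided by their direct IP delay. */
    static double relativeDelayPenalty(double overlayDelayMs, double ipDelayMs) {
        return overlayDelayMs / ipDelayMs;
    }

    /**
     * Stress: how many copies of one multicast packet cross each physical link.
     * Each overlay tunnel is given as the list of physical links it traverses.
     */
    static Map<String, Integer> linkStress(List<List<String>> tunnels) {
        Map<String, Integer> stress = new HashMap<>();
        for (List<String> path : tunnels) {
            for (String link : path) {
                stress.merge(link, 1, Integer::sum);
            }
        }
        return stress;
    }
}
```

For example, if UBC forwards a packet over two tunnels that both leave through the same access link, that link carries the packet twice and has stress 2, whereas IP multicast would keep the stress of every link at 1. Similarly, an overlay path of 30 ms between two hosts whose direct IP delay is 20 ms has an RDP of 1.5.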

Roadmap …
Data distribution:
• Multicast
• Epidemic algorithms

Epidemic algorithms: Principle
Basic idea:
• Assume there are no write–write conflicts (e.g., update operations are initially performed at one node).
• A node passes its updated state to a limited number of ‘neighbors’; neighbors, in turn, pass the update to their neighbors.
• Update propagation is lazy, i.e., not immediate.
• Eventually, each update should reach every node.
Anti-entropy: each node regularly chooses another node at random and exchanges state differences, leading to identical states at both afterwards.
[Variation] Gossiping: a replica that has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).
What are the advantages?

Amazon S3 incident on Sunday, July 20th, 2008
Amazon S3 service:
• Provides a simple web services interface to store and retrieve any amount of data.
• Intends to be a highly scalable, reliable, fast, and inexpensive data storage infrastructure…
• S3 serves a large number of customers. Amazon itself uses S3 to run its own global network of web sites.
Lots of objects stored:
• 4 billion (Q4’06)
• 40 billion (Q4’08)
• 100 billion (Q2’10)

Amazon S3 incident on Sunday, July 20th, 2008
8:40am PDT: error rates began to climb quickly.
10 min: error rates significantly elevated; very few requests complete successfully.
15 min: multiple engineers investigating the issue. Alarms pointed at problems within the systems and across multiple data centers.
• Engineers try to restore system health by reducing system load in several stages. No impact.

Amazon S3 incident on Sunday, July 20th, 2008
1h01min: engineers detect that servers within Amazon S3 have problems communicating with each other.
• Amazon S3 uses a gossip protocol to spread servers’ state information in order to quickly route around failed or unreachable servers.
• Afterwards, engineers determine that a large number of servers were spending almost all of their time gossiping.
1h52min: unable to determine and solve the problem, they decide to shut down all components, clear the system's state, and then reactivate the request processing components.
Restart the system!

Amazon S3 incident on Sunday, July 20th, 2008
2h29min: the system's state is cleared.
5h49min: internal communication is restored; engineers begin reactivating request processing components in the US and EU.
7h37min: the EU location is OK and the US location begins to process requests successfully.
8h33min: request rates and error rates have returned to normal in the US.
Post-event investigation
Message corruption was the cause of the server-to-server communication problems
Many messages on Sunday morning had a single bit corrupted
MD5 checksums are used in the system, but Amazon did not apply them to detect errors in this particular internal state
The corruption spread wrong states throughout the system and increased the system load

Preventing the problem
• Change the gossip algorithm to control/reduce the number of messages. Add rate limiters.
• Add monitoring and alarming for gossip rates and failures.
• Add checksums to detect corruption of system state messages.

Lessons learned
• You get a big hammer … use it wisely!
• Verify message and state correctness: all kinds of corruption errors may occur.
• An emergency procedure to restore a clean state in your system may be the solution of last resort. Make it work quickly!

Amazon’s report on the incident: http://status.aws.amazon.com/s3-20080720.html
Current status of Amazon services: http://status.aws.amazon.com/
Back to epidemic communication

Anti-Entropy Protocols
A node P selects another node Q from the system at random.
• Push: P only sends its updates to Q.
• Pull: P only retrieves updates from Q.
• Push-Pull: P and Q exchange mutual updates (after which they hold the same information).
Observation: for push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes.
• One round = every node has taken the initiative to start one exchange.
Main properties:
• Reliability: node failures do not impact the protocol.
• Dissemination time and effort scale well with the number of nodes.
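The O(log(N)) observation can be explored with a small simulation. This is a sketch under simplifying assumptions: a single update that starts at node 0, no failures or message loss, and exchanges within a round processed one after another, so a node informed early in a round can already pass the update on later in the same round. The class and method names are hypothetical.

```java
import java.util.Random;

class AntiEntropySim {
    /**
     * Simulates push-pull anti-entropy for a single update among n nodes.
     * Returns the number of rounds until every node holds the update.
     */
    static int roundsToInformAll(int n, long seed) {
        if (n < 1) {
            throw new IllegalArgumentException("need at least one node");
        }
        Random rnd = new Random(seed);
        boolean[] informed = new boolean[n];
        informed[0] = true;              // the update starts at node 0
        int informedCount = 1;
        int rounds = 0;
        while (informedCount < n) {
            rounds++;
            // One round: every node P initiates one exchange with a random peer Q.
            for (int p = 0; p < n; p++) {
                int q = rnd.nextInt(n - 1);
                if (q >= p) {
                    q++;                 // choose Q uniformly among the other nodes
                }
                // Push-pull: afterwards both hold the update if either one did.
                if (informed[p] != informed[q]) {
                    informed[p] = true;
                    informed[q] = true;
                    informedCount++;
                }
            }
        }
        return rounds;
    }
}
```

Calling roundsToInformAll for growing values of n (say 100, 1000, 10000) and comparing the results is one way to see that the round count grows much more slowly than the number of nodes.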

Gossiping
Basic model: a node S that has an update to report contacts other randomly chosen servers.
Termination decision: if the contacted node already has the update, S stops contacting other nodes with probability 1/k.
P: the share of nodes that have not been reached:
P = e^{-(k+1)(1-P)}
k    P
1    20.0%
2    6.0%
4    0.7%
[Figure: plot of ln(P).]
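The equation defines P only implicitly, so the table values have to be found numerically. A minimal sketch, assuming plain fixed-point iteration is good enough (it converges for the k values in the table; P = 1 is also a root, but a trivial one that the iteration moves away from). The class and method names are hypothetical.

```java
class GossipResidue {
    /** Solves P = exp(-(k + 1)(1 - P)) for the share P of nodes never reached. */
    static double unreachedShare(int k) {
        double p = 0.5;                  // start inside (0, 1), away from the trivial root P = 1
        for (int i = 0; i < 1000; i++) {
            p = Math.exp(-(k + 1) * (1.0 - p));
        }
        return p;
    }
}
```

Evaluating unreachedShare for k = 1, 2 and 4 gives values close to the 20%, 6% and 0.7% listed above, which shows how quickly the residue shrinks as nodes become more persistent.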

Example applications (I)
• Data dissemination: in p2p, wireless sensor networks, clusters
• Spreading updates: e.g., disconnected replicated list maintenance (Demers et al., Epidemic algorithms for replicated database maintenance, PODC’87)
• Membership protocols: e.g., Amazon Dynamo service (DeCandia et al., Dynamo: Amazon’s Highly Available Key-value Store, SOSP’07); various p2p networks (e.g., Tribler)

Example applications (II)
Data aggregation
The problem: compute the average value for a large set of sensors.
• Let every node i maintain a variable x_i.
• When two nodes i and k gossip, they each reset their variable to the pair average: x_i, x_k ← (x_i + x_k)/2
• Result: in the end each node will have computed the average avg = sum(x_i)/N.
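The averaging rule can be tried out in a few lines. This is a sketch that assumes gossip partners are picked uniformly at random and that exchanges happen one at a time; the class and method names are hypothetical. Each exchange leaves the sum of all values unchanged, which is why the values converge to the global average.

```java
import java.util.Random;

class GossipAverage {
    /** Runs the given number of pairwise exchanges; each one averages two nodes' values. */
    static void gossip(double[] x, int exchanges, long seed) {
        if (x.length < 2) {
            return;                          // nobody to gossip with
        }
        Random rnd = new Random(seed);
        int n = x.length;
        for (int t = 0; t < exchanges; t++) {
            int i = rnd.nextInt(n);
            int k = rnd.nextInt(n - 1);
            if (k >= i) {
                k++;                         // pick a partner k different from i
            }
            double mean = (x[i] + x[k]) / 2; // x_i, x_k <- (x_i + x_k) / 2
            x[i] = mean;
            x[k] = mean;
        }
    }
}
```

Starting from the sensor readings 0, 1, ..., 9 (average 4.5), a couple of thousand exchanges are more than enough for every node to hold a value very close to 4.5, without any node ever seeing all the readings.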

Advantages of epidemic techniques
• Probabilistic model. Rigorous mathematical underpinnings. Good framework for reasoning about the spread of information through a system over time.
• Asynchronous communication pattern. Operate in a 'fire-and-forget' mode where, even if the initial sender fails, surviving nodes will receive the update.
• Autonomous actions. Nodes can act on the data they receive without additional communication to reach agreement with partners; they decide autonomously.
• Robust with respect to message loss and node failures. Once a message has been received by at least one of your peers, it is almost impossible to prevent the spread of the information through the system.