© 2002-2003 Hein Meling and Alberto Montresor
The Jgroup/ARM Dependable Computing Toolkit
Hein Meling
Stavanger University College – Norway
Department of Electrical and Computer Engineering
Alberto Montresor
University of Bologna - Italy
Department of Computer Science
Context
(Distributed) systems that require
• Reliable and high-availability operation
• Fault tolerance
• (Load balancing)
Based on “cheap” hardware and software
• Commercial off-the-shelf (COTS), not custom hardware
• Heterogeneous software (OS) architectures
Middleware architectures for distributed computing
• Middleware: between the application and OS
Types of Failures
Processor failures
• Crash failures
• Value failures (very expensive)
Network failures
Operating System hangs
Memory leaks
Software design errors (beyond state-of-the-art)
Overview
Jgroup
• A toolkit that supports the development of reliable and highly available applications.
Autonomous Replication Management (ARM)
• A framework for server replica deployment and recovery without user intervention.
History
• Formal specification (1996-97)
• Algorithm description and Jgroup implementation
• Integration with existing technologies (Java RMI / Jini)
• The ARM framework (2000-03)
• Development of Jgroup-based applications
Summary
1. Introduction
2. Object Group Communication
3. The ARM framework
4. Integration with Java RMI / Jini
5. Conclusions
The Problem
Some environments supporting distributed computing:
• CORBA (OMG)
• DCOM / .NET (Microsoft)
• Java RMI / Jini / EJB (Sun)
Characteristics:
• Object-oriented
• Based on client-server remote method invocations
• Promote modularity, reusability, interoperability, portability
Java Remote Method Invocations
Java RMI protocol:
• enables objects residing in different JVMs to communicate through remote method invocations
[Figure: a Client with a Stub in JVM1 and a Server behind the server-side RMI runtime in JVM2, connected by the network; the client calls method() and the server returns x]
The Problem
Distributed computing environments did not provide adequate support for developing reliable and highly available applications
Lack of reliable “one-to-many” interaction primitives
• From the client’s point of view: non-transparent access to replicated servers
• From the server’s point of view: no support for maintaining consistency
The Solution: The Object Group Paradigm
Object group:
• A dynamic collection of server objects that cooperate in order to deliver some service and maintain shared state
Group method invocations:
• The act of invoking a method on an object group
• The method is executed by a certain number of servers in the object group, depending on the invocation semantics
[Figure: a Client invoking an Object Group made up of three Servers]
The Solution: The Object Group Paradigm
From the client’s point of view:
• Groups must be transparent - like standard remote objects
• Clients need not be aware that they are interacting with an object group instead of a single server
From the server’s point of view:
• Server implementation - as transparent as possible
• Servers forming a group
  • must cooperate to maintain shared state and
  • to appear as a single object
Group Communication
Group communication has been shown to be a powerful paradigm for supporting the development of dependable applications in distributed systems
• Management of dynamic groups (join/leave operations)
• Failure monitoring (crashes / partitionings)
• “One-to-many” communication
• Ordering of events (FIFO, Causal, Atomic)
• State synchronization tools
[Figure: the three services: Group Membership Service, Reliable Multicast Service, State Transfer Service]
Other Object Group Systems
CORBA
• Electra [Cornell, Zurich]
• Object Group Service (OGS) [EPFL, Lausanne]
• Eternal [UC Santa Barbara, Eternal Systems]
• Newtop [Newcastle, UK]
Java RMI
• Filterfresh [Bell Labs]
• JavaGroups [Cornell]
• Aroma [UC Santa Barbara]
DCOM
• Quintet [Cornell]
Jgroup: “Yet Another Object Group Service”?
Support for partition-awareness:
• Modern wide-area communication networks are often characterized as highly partitionable
• Jgroup supports the development of reliable and highly available applications in partitionable systems
Moreover:
• It extends modern technologies like Java RMI and Jini
• It is written entirely in Java (portability)
• It supports a complex merging service
• It is extensible: deployment, recovery and upgrade facilities
Autonomous Replication Management
Support for transparent replica deployment
• Placing server replicas on machines in the network
• Selecting machines so that each application can tolerate both network and machine failures
Support for replica recovery
• Jgroup detects and reports failures
• ARM replaces any crashed server replica with a new instance
Group Membership
Group membership service tracks both voluntary and involuntary changes in the group’s membership
Variations are reported to group members through the installation of views
Installed views
• Consist of a collection of members
• Correspond to the group’s current membership as perceived by the members included in the view
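The idea of installing a new view on every membership change can be sketched as follows. This is a minimal, single-process illustration and not Jgroup's API: the names `View`, `MembershipSketch`, `join`, `leave` and `crash` are assumptions made for the example, and the sketch ignores the fact that in a partitionable system different members may perceive different views.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Immutable snapshot of the group's membership (hypothetical type, not Jgroup's API). */
final class View {
    final int id;
    final SortedSet<String> members;

    View(int id, SortedSet<String> members) {
        this.id = id;
        // Copy the set so later membership changes do not alter an installed view.
        this.members = Collections.unmodifiableSortedSet(new TreeSet<>(members));
    }
}

/** Toy stand-in for a group membership service (hypothetical). */
final class MembershipSketch {
    private final SortedSet<String> current = new TreeSet<>();
    private final List<View> installed = new ArrayList<>();

    /** Voluntary change: a server joins the group. */
    View join(String server) {
        current.add(server);
        return install();
    }

    /** Voluntary change: a server leaves the group. */
    View leave(String server) {
        current.remove(server);
        return install();
    }

    /** Involuntary change: a crash has been detected. */
    View crash(String server) {
        current.remove(server);
        return install();
    }

    /** Every variation is reported by installing a new view. */
    private View install() {
        View view = new View(installed.size() + 1, current);
        installed.add(view);
        return view;
    }

    List<View> history() {
        return Collections.unmodifiableList(installed);
    }
}
```

With three joins followed by a crash of S3, the fourth installed view contains S1 and S2 only.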
Group Membership: A Simple Scenario
[Figure: timeline of servers S1, S2 and S3 joining the group, followed by the crash of S3, with the views installed along the way]
Partition-awareness
What kind of behavior can we expect from fault-tolerant applications in the presence of network partitioning?
The primary-partition approach:
[Figure: three partitions; one answers “How can I help you?”, the other two answer “No service available!”]
Support for partition-awareness
Jgroup supports dependability in partitionable systems
• Development of applications aware of the existence of partitions (on the server-side)
• Partition-aware applications take advantage of their semantics in order to be more available
• Computations continue in all partitions of the system
[Figure: three partitions, each answering “How can I help you?”]
Group Membership: A Partitioning Scenario
[Figure: timeline of servers S1, S2 and S3 joining the group; S1 and S2 are then partitioned from S3, and later communication with S3 is restored]
Example: Task Execution Service
[Figure: two panels, “Primary Partition” and “Partition-aware”, each showing clients submitting tasks to partitioned servers; the figure carries a “Warning!” label]
Comparison
Primary-partition approach
+ Easy to maintain a single, coherent shared state (strong consistency)
- Servers in non-primary partitions unable to serve requests (low availability)
Partition-aware approach
+ Servers in multiple partitions may be able to serve requests (high availability)
- Partitions evolve independently, possibly leading to inconsistent states (loose consistency)
Comparison (Cont.)
Primary-partition approach
+ Development of fault-tolerant applications is simpler (active replication of existing non-fault-tolerant servers)
- Developers cannot exploit application semantics in order to provide a more available service
Partition-aware approach
+ Applications adapt their behavior and remain available in many partitions (perhaps by reducing their quality of service)
- Development of fault-tolerant applications is more complex (case-by-case design is needed)
The State Merging Problem
During partitioning, the state of servers belonging to distinct partitions may become inconsistent
When the partition disappears, an application-specific state merging protocol may be needed
Servers participating in the protocol try to define a new shared state that reconciles (when possible) the divergences
[Figure: two groups of servers, each holding its own set of tasks, before and after the partition disappears]
The State Merging Problem
State merging protocols are based on the exchange of information among servers that have been partitioned
Jgroup provides a state merging service (SMS) that simplifies the development of state merging protocols
NOTE
Determining
• what information needs to be exchanged
• how to use it to construct a new consistent shared state
is an application-dependent problem
General Schema for State Merging Protocols
• In each of the merging partitions, a coordinator is selected
• SMS interrogates each coordinator to obtain information about its current state
• State information from a coordinator is passed to servers that used to be partitioned from it
• Each server merges the information received from coordinators with its own state
[Figure: servers S1, S2, S3 and S4 exchanging state through getState() and putState()]
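The four steps of the schema can be sketched as below. The method names `getState()` and `putState()` come from the slide, but their signatures, the `Mergeable` interface, the `TaskServer` class and the choice of "first server in each partition" as coordinator are all assumptions made for illustration. The merge rule (set union of task names) is one possible application-specific policy, not something the toolkit prescribes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical callback interface; method names from the slide, signatures assumed. */
interface Mergeable<S> {
    S getState();             // asked of the coordinator of a partition
    void putState(S remote);  // delivered to servers that were partitioned from that coordinator
}

/** Example server whose shared state is a set of task names; merging is a set union. */
final class TaskServer implements Mergeable<Set<String>> {
    final String name;
    final Set<String> tasks = new TreeSet<>();

    TaskServer(String name, String... initialTasks) {
        this.name = name;
        tasks.addAll(Arrays.asList(initialTasks));
    }

    @Override
    public Set<String> getState() {
        return new TreeSet<>(tasks); // defensive copy
    }

    @Override
    public void putState(Set<String> remote) {
        tasks.addAll(remote); // application-specific reconciliation: union
    }
}

final class StateMergeSketch {
    /** Each inner list is one merging partition; its first element acts as coordinator (assumption). */
    static void merge(List<List<TaskServer>> partitions) {
        // Steps 1 and 2: select a coordinator per partition and obtain its state.
        List<Set<String>> coordinatorStates = new ArrayList<>();
        for (List<TaskServer> partition : partitions) {
            coordinatorStates.add(partition.get(0).getState());
        }
        // Steps 3 and 4: pass each coordinator's state to the servers of the other
        // partitions, which merge it with their own state.
        for (int from = 0; from < partitions.size(); from++) {
            for (int to = 0; to < partitions.size(); to++) {
                if (from == to) {
                    continue;
                }
                for (TaskServer server : partitions.get(to)) {
                    server.putState(coordinatorStates.get(from));
                }
            }
        }
    }
}
```

Only one state per partition is transferred, so the cost of the exchange grows with the number of partitions rather than with the number of servers.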
Full Object-Orientation
[Figure: a Client with a Stub and three Servers, with the labels “Remote method invocations” and “Message multicasting”]
Existing object group systems fail to provide a completely object-oriented environment for software developers
View Synchrony
View synchrony (1)
If a correct server S executes an invocation during a view, then
• all servers within the view will also execute the invocation,
• or S will install a new view
View synchrony does not admit executions like this:
[Figure: example executions among servers S1, S2, S3 and S4]
View Synchrony
View Synchrony (2)
All servers that survive from one view to the same next view execute the same set of invocations in the original view
View synchrony does not admit executions like this:
[Figure: example executions among servers S1, S2, S3 and S4]
Internal Group Method Invocations
Synchronous invocations
• The method invocation terminates by returning a vector of return values, one from each server at which the method was executed
Asynchronous invocations:
• The method invocation terminates immediately; replies (if any) are returned to a callback object
• Can be used to simulate message multicasting through void methods (one-way)
Internal Invocations: example
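Synchronous invocation
[Figure: servers S1, S2 and S3]
// Caller: one return value from each server at which the method was executed
int[] values = group.getValue();
// Executed at each server
int getValue() {
    return value;
}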
Internal Invocations: example
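Asynchronous invocation
[Figure: servers S1, S2 and S3]
// Caller: the invocation returns immediately; replies arrive at the callback
ValuesCallback cb;
group.getValue(cb);
…
int[] values = cb.getResults();
// Callback class (method bodies omitted on the slide)
public class ValuesCallback implements Callback {
    void result(Object value);
    int[] getResults();
}
// Executed at each server
int getValue() {
    return value;
}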
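One way the callback's omitted method bodies could be filled in is sketched below. The `Callback` interface is declared here only so the example compiles; its real shape in Jgroup may differ, and the assumption that each reply arrives as an `Integer` is ours.

```java
import java.util.ArrayList;
import java.util.List;

/** Assumed shape of the callback interface; the real Jgroup interface may differ. */
interface Callback {
    void result(Object value);
}

/** Collects the int replies of an asynchronous group invocation. */
final class ValuesCallback implements Callback {
    private final List<Integer> results = new ArrayList<>();

    /** Called once per reply; synchronized because replies may arrive on different threads. */
    @Override
    public synchronized void result(Object value) {
        results.add((Integer) value);
    }

    /** Returns the replies received so far, in arrival order. */
    public synchronized int[] getResults() {
        int[] out = new int[results.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = results.get(i);
        }
        return out;
    }
}
```

Because the invocation terminates immediately, `getResults()` can only report the replies that have arrived so far; a caller that needs all of them has to wait or poll.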
External Group Method Invocations
Anycast invocations:
• Are executed by at least one server in the object group (unless the client is partitioned from the group)
• Efficiency (same cost as standard RMI interactions)
• Useful for “read” methods on replicated databases
Multicast invocations:
• Are executed by all servers in a view, following the view synchrony semantics
• More costly (involve several servers)
• Useful for “write” methods on replicated databases
External invocations: example
[Figure: clients C1 and C2 invoking servers S1, S2 and S3]
Multicast invocation:
registry.bind("name", obj);
Anycast invocation:
registry.lookup("name");
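The difference between the two semantics can be illustrated with a toy replicated registry. This is a local, in-memory sketch: `RegistryReplica` and `RegistryGroupProxy` are hypothetical names, the round-robin choice for anycast is arbitrary, and nothing here models view synchrony or partitions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One replica of a toy name registry (hypothetical). */
final class RegistryReplica {
    private final Map<String, Object> bindings = new HashMap<>();
    int lookupsServed = 0;

    void bind(String name, Object obj) {
        bindings.put(name, obj);
    }

    Object lookup(String name) {
        lookupsServed++;
        return bindings.get(name);
    }
}

/** Client-side stand-in for a group proxy (hypothetical). */
final class RegistryGroupProxy {
    private final List<RegistryReplica> view;
    private int next = 0;

    RegistryGroupProxy(List<RegistryReplica> view) {
        this.view = new ArrayList<>(view);
    }

    /** Multicast: a "write" is executed by every replica in the view. */
    void bind(String name, Object obj) {
        for (RegistryReplica replica : view) {
            replica.bind(name, obj);
        }
    }

    /** Anycast: a "read" is executed by a single replica, chosen round-robin here. */
    Object lookup(String name) {
        RegistryReplica replica = view.get(next);
        next = (next + 1) % view.size();
        return replica.lookup(name);
    }
}
```

The write touches all three replicas while each read touches one, which is why the anycast costs about as much as an ordinary remote call.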
Replication Management – The Problem
Object Group Systems support replication transparency:
• Membership management
• Reliable multicast
But they do not support full failure transparency:
• Application support or manual intervention required to distribute replicas
• Application support or manual intervention required to recover from replica failures
Complicated tasks
• Application implementations prone to contain errors
• These tasks should not be left to the application developer
Solution: Autonomous Replication Management
Support for creating object groups
• By placing individual members on distinct machines
• Each application may specify a replication policy
  • For example, redundancy level = 3
Support for failure recovery
• Jgroup detects and reports failures to ARM
• ARM reacts by creating a replacement member for each failed member, perhaps on a different machine
• Each application may specify a recovery policy
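A replication policy with a redundancy level, and a recovery step that restores it, might look like the sketch below. It is an in-memory illustration under stated assumptions: the class `ReplicationManagerSketch` and the method `notifyFailure` are hypothetical, machines are plain strings, and a machine that hosted a failed replica is simply avoided afterwards. `createGroup` and `createReplica` reuse names that appear in the deck's diagrams, but their signatures here are invented.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy replication manager (hypothetical, not ARM's API). */
final class ReplicationManagerSketch {
    private final List<String> machines;          // candidate hosts
    private final int redundancyLevel;            // replication policy
    private final Map<String, String> placement = new LinkedHashMap<>(); // replica id -> host
    private final Set<String> suspected = new HashSet<>();               // hosts to avoid
    private int nextId = 1;

    ReplicationManagerSketch(List<String> machines, int redundancyLevel) {
        this.machines = new ArrayList<>(machines);
        this.redundancyLevel = redundancyLevel;
    }

    /** Creates replicas on distinct machines until the redundancy level is met. */
    void createGroup() {
        while (placement.size() < redundancyLevel) {
            createReplica();
        }
    }

    /** Recovery: forget the failed replica, avoid its host, restore the redundancy level. */
    void notifyFailure(String replicaId) {
        String host = placement.remove(replicaId);
        if (host != null) {
            suspected.add(host);
        }
        createGroup();
    }

    private void createReplica() {
        for (String machine : machines) {
            if (!suspected.contains(machine) && !placement.containsValue(machine)) {
                placement.put("replica-" + nextId++, machine);
                return;
            }
        }
        throw new IllegalStateException("no machine available for a new replica");
    }

    Map<String, String> placement() {
        return new LinkedHashMap<>(placement);
    }
}
```

With four machines and redundancy level 3, the failure of the replica on the second machine leads to a replacement on the fourth.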
ARM: Replica Distribution
[Figure: ExecDaemons running on the machines of two sites (ux.his.no and item.ntnu.no) joined by a router, a replicated ReplicationManager, and a ManagementClient; labelled calls: createGroup(), createReplica()]
ARM: Replica Distribution
[Figure: the same topology, now also showing three NettBankServer replicas; labelled calls: createGroup(), createReplica()]
ARM: Recovery from Crash Failure
[Figure: the same topology with three NettBankServer replicas, one marked Group Leader; labels: view agreement protocol, notifyViewChange()]
ARM: Recovery from Crash Failure
[Figure: the same topology after a crash, with one ExecDaemon fewer; labels: Group Leader, notifyViewChange(), createReplica(), and a replacement NettBankServer]
Introduction to Jini
Jini is an API built on top of the Java 2 platform:
• enables spontaneous networks of devices/software services to assemble into federations of objects
• addresses the distribution problems in these federations through a set of simple interfaces and protocols
Jini Architecture
The components of the Jini architecture may be divided into three categories:
• Infrastructure, i.e. the components that enable building a federated Jini system
• Programming model that “supports and encourages the production of reliable distributed services”
• Services that can be made part of a federated Jini system and which offer functionality to any other member of the federation
  • JavaSpaces
Jini Infrastructure
The infrastructure is composed of:
• Java RMI protocol: enables objects residing in different JVMs to communicate through remote method invocations
[Figure: a Client with a Stub in JVM1 and a Server behind the server-side RMI runtime in JVM2, connected by the network; the client calls method() and the server returns x]
Jini Infrastructure
The infrastructure is composed of:
• Lookup Service: defines how services may become part of a Jini system and clients retrieve services by their types and attributes.
[Figure: Client, Lookup Service and Server, with the steps Discovery, Join (the server registers its stub), Lookup (the client retrieves the stub) and Invocation]
The Jini Programming Model
The programming model is based on three distinct paradigms for distributed computing:
• Leases extend the Java programming model by adding time to the notion of holding a reference to a resource
• Transactions allow a set of operations on one or more remote participants to be grouped in such a way that either all succeed or all fail
• Events enable objects to register interest in changes of the abstract state of remote objects
Jini and Fault Tolerance
Jini fault tolerance is based on leases and transactions
• leases enable the detection of service failures
• transactions provide consistency by guaranteeing “all-or-nothing” semantics
Unfortunately, no support for high-availability is present in Jini
• No support for replication
• If the transaction manager fails, clients and participants must wait for its recovery before serving further requests
Enhancing Jini with Fault-Tolerance
Extending Jini with the Object Group Paradigm:
• Infrastructure
  • Extending Java RMI for Group Method Invocations
  • Extending the Lookup Service for dealing with Group Proxies
• Programming Model
  1. Object Group Paradigm as alternative programming model
  2. Integration between transactional and object group model
• Services
  • Replicated JavaSpaces
Extending Java RMI
The RMI group at JavaSoft designed Java RMI to be extensible
• The RemoteRef interface enables programmers to write their own references to remote objects on the client-side
Unfortunately, RemoteRefs are not sufficient
• There is no way to modify the behavior of RMI on the server side
[Figure: Client, Stub containing a RemoteRef, Server-side RMI Runtime, Server]
The Jgroup Approach (Current Version)
[Figure: a Client calls a ClientProxy (statically or dynamically generated, implements the remote interface), which holds a fixed RMI stub for the ServerProxy; each ServerProxy sits behind a server-side RMI runtime and reaches its Server through method dispatchers; the ServerProxies are linked by Multicast and RMI]
Designing a New Java RMI API
We have cooperated with Sun Microsystems to design a new RMI API:
• Fully customizable, on both the client-side and the server-side
• Based on Dynamic Proxy Classes (JDK 1.3) (no need for static stub generators)
• Two different versions:
  • One-to-one (remote method invocations)
    • Voted down in JSR-078
    • Being included in the "Davis" release of Jini
  • One-to-many (group method invocations)
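Dynamic proxy classes make it possible to build a proxy for any interface at run time, which is what removes the need for a static stub generator. The sketch below uses the standard `java.lang.reflect.Proxy` API to forward each call to every target object. It is a local illustration of the one-to-many idea, not the RMI API described on the slide: `Counter`, `LocalCounter`, `GroupInvocationHandler` and `GroupProxies` are hypothetical names, and returning the first target's reply is an arbitrary choice.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

/** Example service interface (hypothetical). */
interface Counter {
    int increment();
}

/** Local implementation standing in for a remote server. */
final class LocalCounter implements Counter {
    private int value;

    @Override
    public int increment() {
        return ++value;
    }
}

/** Forwards each call to all targets and returns the first target's reply. */
final class GroupInvocationHandler implements InvocationHandler {
    private final List<Object> targets;

    GroupInvocationHandler(List<?> targets) {
        this.targets = new ArrayList<Object>(targets);
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object first = null;
        boolean haveFirst = false;
        for (Object target : targets) {
            Object reply;
            try {
                reply = method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
            if (!haveFirst) {
                first = reply;
                haveFirst = true;
            }
        }
        return first;
    }
}

final class GroupProxies {
    /** Builds a proxy implementing iface at run time; no generated stub class is needed. */
    @SuppressWarnings("unchecked")
    static <T> T create(Class<T> iface, List<? extends T> targets) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                new GroupInvocationHandler(targets));
    }
}
```

A caller holds an ordinary `Counter` reference and cannot tell that each `increment()` reaches every target.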
Jgroup with 1-to-1 Customizable RMI
[Figure: a Client calls a ClientProxy (statically or dynamically generated, implements the remote interface); ServerProxies reach their Servers through method dispatchers; labels: RMI Stub, Server-side RMI Runtime, RMI, Multicast]
Jgroup with 1-to-Many Customizable RMI
[Figure: a Client and ClientProxy connected to three ServerProxy/Server pairs through Multicast RMI; the proxies are marked as customizable objects]
Extending the Lookup Service
Jini enables the registration of customized proxies for services
• this feature can be used to register group proxies using any implementation of the lookup service
Group proxies, however, differ from standard proxies as their contents may be dynamic
• server registration → server reference added to group proxy
• server removal, lease expired → server reference removed from group proxy
We have developed an alternative implementation of the lookup specification capable of dealing with group proxies
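How the contents of a group proxy can follow registrations, removals and lease expirations is sketched below. It is a toy, in-memory model, not the Jgroup lookup service or the Jini lease API: `GroupProxyEntry` and its methods are hypothetical, server references are plain strings, and time is a millisecond counter passed in by the caller so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy lookup-service entry for one group (hypothetical). */
final class GroupProxyEntry {
    // server reference -> time at which its lease expires
    private final Map<String, Long> leaseExpiry = new LinkedHashMap<>();

    /** Server registration or lease renewal: the reference is added to the group proxy. */
    void register(String serverRef, long now, long leaseMillis) {
        leaseExpiry.put(serverRef, now + leaseMillis);
    }

    /** Explicit server removal: the reference is removed from the group proxy. */
    void remove(String serverRef) {
        leaseExpiry.remove(serverRef);
    }

    /** Returns the references still valid at time now, dropping those whose lease expired. */
    List<String> members(long now) {
        leaseExpiry.values().removeIf(expiry -> expiry <= now);
        return new ArrayList<>(leaseExpiry.keySet());
    }
}
```

A client that fetches the proxy at different times may therefore receive different sets of server references, which is the dynamic behavior a standard lookup service does not provide.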
The Jgroup Lookup Service
[Figure: three Servers each join the Lookup Service with their stub; the Client looks up the group and invokes the servers through the returned stub]
Extending the Jini Programming Model
Jgroup + Jini programming model for fault-tolerance
• Leases + transactions
• Object group communication
Problem:
• transactions and group communication considered as separate aspects of fault-tolerance
• their composition does not result in any meaningful combination of their respective strengths
We need the possibility of using replication in transactions:
• Transaction managers
• Participants
• Clients
Applications (Research)
Jgroup/ARM is being used for
• A distributed auction system
  • Partitionable auctions
  • [Panzieri, Amoroso et al., University of Bologna, 2002]
• An online-upgrade service for active replication
  • [Solarski, GMD Fokus]
• A replication management framework
  • Application-specific replication and recovery strategies
  • [Meling, HiS]
• Dependable naming service
  • Support for extensible group proxies (JERI)
  • [Meling et al., HiS]
Applications (Education)
Jgroup is being used at the
• Stavanger University College in the “Advanced Programming” course
• University of Bologna in the “Distributed System” course
• Norwegian University of Science and Technology in the “Dependable Systems” course
Source for several projects and theses:
• Low-level communication protocols (Bologna)
• Replication services (Bologna)
• Wide-area distributed services (Padova)
• Management and deployment issues (HiS)
Thank You!
http://jgroup.sourceforge.net/