DRASTIC and GRUMPS:

The Design and Implementation of

Two Run-Time Evolution Frameworks

Huw Evans

Department of Computing Science

The University of Glasgow

Glasgow, Scotland, G12 8RZ

Abstract

This paper describes two different approaches to supporting and managing the run-time evolution of distributed

applications. The evolution models and implementations of the DRASTIC and GRUMPS projects are presented and

contrasted. Within the context of related work, the paper argues that there is too little support for software engineers in

constructing distributed applications that may be evolved at run-time. DRASTIC and GRUMPS address this problem

by providing programming language layers that place the support for run-time change at the centre of an application’s

design. The paper goes on to present four core ideas that are generally applicable for the construction of run-time

support layers. In addition, the lessons learned from conducting this research are discussed. The core ideas and

the lessons learned are used to derive ten principles that may be exploited when designing other run-time evolution

support systems. The paper ends with a discussion of the open issues that face those working in the field of distributed

run-time evolution support.

1 Introduction

The distributed software systems in use today are highly complex and have become increasingly important to

our everyday lives. These kinds of system, such as those used to manage telephone networks and hospitals, require

ongoing maintenance and upgrades to help ensure their continued availability. However, it is not viable to bring the

entire system down to perform a change; for this class of system, it must therefore be possible to change the system while

it is executing. In this paper, run-time evolution is defined to be the ability of a distributed application system

to support change to its functionality while it is executing. Unanticipated software evolution is defined to be the

ability to support changes to an executing application that were not anticipated when the application was

either originally designed or first started. The work reported in this paper supports unanticipated software evolution

as post-design and post-execution application updates may be applied to a running system.

This paper describes the lessons learned from the design and implementation of two different evolution-support

software layers that were produced as part of the DRASTIC and GRUMPS projects. These different systems support

the unanticipated run-time evolution of distributed applications. The DRASTIC and GRUMPS projects both take the

view that to successfully design and implement an evolvable distributed system requires that evolution be considered

from the very start of the application’s design. This is achieved by expressing the support for run-time evolution in

terms of an evolution model. The evolution model provides a framework within which the design of an evolvable

application is performed. The model makes available an evolution architecture that defines how a distributed application

should be implemented in order to evolve it at run-time. The evolution architecture makes available the basic

programming units that may be changed at run-time as well as controlling how and when a software engineer may

apply an update to an executing system. The evolution models are made available as Java-based programming layers

which embody those parts of the models that are required at run-time.

The DRASTIC and GRUMPS systems take different approaches to the kinds of run-time evolution that are to be

supported. DRASTIC takes the view that evolution is performed relatively infrequently, but that when it does occur

it is a major event in the lifetime of the system. To support this, DRASTIC focuses on managing the application of

evolution and controlling its effects on the post-evolution system. By contrast, the GRUMPS system takes an almost

opposite view as evolution is seen as a relatively common operation that should be performed promptly and may occur

many times in quick succession. To support this, the focus in the GRUMPS approach is on separating what is being

evolved from how it is to be changed in order to retain as much run-time flexibility as possible.

The work on DRASTIC and GRUMPS represents the first attempts at evolution models and evolution architectures for

large, long-lived, distributed systems. As such, both systems are proof of concept and provide an initial investigation

and exploration of the ideas. Performing a large empirical study of the use of the two approaches was not the focus of

the work. The work has centred on the design and implementation of the two evolution models which have both been

evaluated by the author. The DRASTIC platform has been used by University graduate students in their project work

(see [11]) and GRUMPS by other members of the GRUMPS team [32].

The rest of this paper is organised as follows. Section 2 describes the related work to place this work into context.

Section 3 discusses the core ideas that have been learned during the DRASTIC and GRUMPS projects and section

4 describes the main aspects of the two run-time evolution models. Section 5 then details their respective run-time

architectures which is followed by a discussion of their implementations in section 6. Section 7 describes the conclusions

of the DRASTIC and GRUMPS research work. Section 8 discusses the open issues for the field of run-time

evolution support. Lastly, section 9 summarises the paper.

2 Related Work

This section discusses alternative approaches to supporting run-time evolution to place the DRASTIC and GRUMPS

work into context. Support for the run-time evolution of systems can be divided into four broad categories: programming

language-level support, both research (section 2.1) and commercial (section 2.2); component-level, e.g., J2EE

and .NET (section 2.3); and architectural, such as Darwin and ArchStudio (section 2.4). Section 2.5 concludes this

section by arguing that there is a serious gap in programmer support which exists above the language and component

frameworks, but below the architectural approaches to run-time evolution.

2.1 Research-based Language-level Support

Research into programming language-level support for the run-time evolution of systems has been ongoing for

decades. Dynamically typed programming languages, such as Self [5] and Smalltalk [14], define powerful mechanisms

that allow their programs to be evolved at run-time. In this class of languages a program can be written that

manipulates itself, and so these types of language provide rich evolution mechanisms [17].

Other language-level research, represented here by the work on Erlang [2], Argus [4] and the recent work of Hicks

on the use of dynamic updates using verifiable native code [16], demonstrates that there are generally applicable

approaches to supporting the run-time evolution of a program. Language-level support typically: defines the unit of

replacement (e.g., modules in Erlang); provides the necessary mechanisms to allow the units to be safely replaced at

run-time; and also provides support so that state may be transferred from its old format into its new format. Some

approaches also provide tools (e.g., Argus) so that the run-time evolution of a number of modules may be coordinated

by a human user.

The work discussed above provides useful mechanisms to allow the run-time evolution of a program to be performed.

If software systems are to be effectively evolved at run-time, the programming languages and environments

that they are constructed from must make available a basic set of primitives that may be used to express and manage

change. It is advantageous if these are expressed to the programmer as part of the programming model, as is the case

with CLOS (as well as Smalltalk and Self) which defines a powerful behavioural reflection mechanism [19]. Building

the basic change primitives into a language ensures that a programmer can exploit them to construct an evolution

model. The provision of such primitives is a first step towards the design and implementation of standard evolution

models and architectures that may be expressed within a particular language. Multiple standards are required

to address the differences that exist within the various programming domains, for example, real time programming,

embedded programming and general-purpose programming. No one approach will fit all these programming models.

However, without a standard approach, programmers are forced to continually reinvent evolution mechanisms,

models and strategies. Such an approach leads to a number of separately developed and incompatible models. Due

to its ubiquity, a standard model is much more powerful: its universal presence may be usefully exploited by all

programmers.

2.2 Commercial Programming Languages

Modern statically-typed commercial programming languages such as Java and C# [24] encourage programmers,

through various concepts such as typing, encapsulation and polymorphism, to write code that should be easier to

maintain and evolve. However, focus is placed on the non run-time issues of reusing program source code and trying

to make it easier to manipulate the codebase of a particular application. Neither of these languages directly addresses

the issues of run-time evolution by defining an evolution model or architecture.

It is possible to dynamically update both Java and C# programs. In Java, this can be done either by using an

application specific classloader, or by using an API that supports the run-time manipulation of Java bytecode to gain

certain Smalltalk-like meta-programming effects. However, these are ad-hoc approaches that lie outside a standardised

evolution model and run-time architecture.

Statically typed languages tend to have a boundary in the design of the language that firmly separates the compile-time

decisions from the run-time decisions, and it is typical that if a decision has been made at compile-time it cannot

be changed later. The designers of such languages have not anticipated the need of programmers to be able to build

systems that can be flexibly manipulated at run-time. However, there is a need for this, and the Java virtual machine

is moving towards support for this, e.g., through the introduction of the Java Platform Debugger Architecture and the

HotSwap functionality [26] available in the Java HotSpot VM, starting from JDK 1.4.

2.3 Component Frameworks

Component frameworks such as J2EE [18], .NET [25] and CORBA [29] support the software engineer in the production

of enterprise-scale architectures. J2EE and .NET are primarily focussed on the complex problem of providing

large-scale systems that can interoperate with legacy systems and with others that are available across the Internet,

via the Web Services suite of standards [33]. Such architectures are useful because they do support some run-time

evolutionary operations, e.g., components may be replaced during application execution. The use of such architectures

is also advantageous as they can make the production of complex software easier by encouraging a certain style of

programming that is likely to lead to smoother component integration and replacement. However, these component

models do not define a general evolution model and run-time evolution is not considered to be a design priority, so the

problems that apply to programming languages also apply here.

However, component frameworks offer a natural place in the software hierarchy within which to add support for

run-time evolution, preferably building on standard language-level support. The kinds of large, long-lived systems

constructed using J2EE and .NET are likely to benefit from run-time evolution. Such an approach would provide

the industry with a standard for enterprise-level run-time change. By being constructed on top of a set of evolution

standards, separately developed corporate-critical systems could be more successfully integrated with one another as

an evolution in one would be less likely to adversely affect another. Such a standard would also benefit the Web

Services initiative as this community is defining standards for how such enterprise-level systems can inter-operate

across the Web.

2.4 Architectural-Level Support

At a higher level than component frameworks, some research systems allow software designers to design and manipulate

their systems in terms of their architecture. Systems such as Darwin [23] and ArchStudio [28] abstract away

from language-level and component-level details, preferring to encourage software engineers to think in terms of the

major architectural abstractions that their systems are composed from. The architecture of these kinds of system is

typically constructed from a collection of components that are connected together with a number of (possibly typed)

connectors, and data is moved from one component to another, via the connectors. Each component processes the

data in a particular way, before forwarding it to another connector or, for example, writing it to stable storage. To

assist the designer of such a system an architectural description language, together with tool support, is provided that

may be used to describe and perform an update to the topology and functionality of a system at run-time. In these

kinds of system, run-time evolution is usually expressed at the level of the component and connector, e.g., start/stop a

component and connect/disconnect a connector.
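As an illustration only, the component and connector operations described above can be sketched in Java. The sketch is not the API of Darwin or ArchStudio: the class and method names (SketchComponent, SketchConnector, start, stop, connect, disconnect) are assumptions introduced here, and an in-memory list stands in for stable storage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical connector: moves data to whichever component it is currently connected to.
final class SketchConnector {
    private SketchComponent target;

    void connect(SketchComponent newTarget) { target = newTarget; }
    void disconnect() { target = null; }

    void send(String data) {
        if (target == null) {
            throw new IllegalStateException("connector is disconnected");
        }
        target.accept(data);
    }
}

// Hypothetical component: processes a datum, then forwards it or stores it.
final class SketchComponent {
    private final String name;
    private final UnaryOperator<String> behaviour;
    private SketchConnector output;                 // null means "write to storage"
    private boolean running;
    final List<String> stored = new ArrayList<>();  // stands in for stable storage

    SketchComponent(String name, UnaryOperator<String> behaviour) {
        this.name = name;
        this.behaviour = behaviour;
    }

    void start() { running = true; }
    void stop() { running = false; }
    void attachOutput(SketchConnector connector) { output = connector; }

    void accept(String data) {
        if (!running) {
            throw new IllegalStateException(name + " is stopped");
        }
        String result = behaviour.apply(data);
        if (output != null) {
            output.send(result);
        } else {
            stored.add(result);
        }
    }
}

final class ReconfigurationDemo {
    public static void main(String[] args) {
        SketchComponent source = new SketchComponent("source", String::toUpperCase);
        SketchComponent sinkV1 = new SketchComponent("sinkV1", s -> s + "!");
        SketchComponent sinkV2 = new SketchComponent("sinkV2", s -> s + "?");
        SketchConnector link = new SketchConnector();

        source.attachOutput(link);
        link.connect(sinkV1);
        source.start();
        sinkV1.start();
        source.accept("a");   // handled by sinkV1

        // Run-time evolution expressed purely as component and connector operations.
        sinkV1.stop();
        link.disconnect();
        sinkV2.start();
        link.connect(sinkV2);
        source.accept("b");   // handled by sinkV2

        System.out.println(sinkV1.stored + " " + sinkV2.stored);
    }
}
```

The source component is never modified or restarted in this sketch; only the topology beneath it changes, which is the style of update that an architectural description language is intended to express.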

ArchStudio also defines a set of rules that govern, among other things, how components may communicate with

one another. Each component has a top and bottom domain: the top domain specifies the set of notifications to which

a component responds, and the set of requests it emits up an architecture; and the bottom domain specifies the set

of notifications that a component emits down an architecture and the set of requests to which it responds. This style

depends on a principle that ensures a component within the architecture is only aware of components above it, i.e., a

component is unaware of the components that are beneath it.
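The communication rule described above can be sketched as follows. This is a minimal illustration and not ArchStudio's API: LayeredComponent and its methods are hypothetical names. A component holds a typed reference only to the component above it, and learns of notifications through an opaque callback, so the upper component never names what lies beneath it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical layered component with a top and a bottom domain.
final class LayeredComponent {
    private final String name;
    private final LayeredComponent above;   // null for the topmost component
    // Opaque notification sinks: the component knows nothing about who registered them.
    private final List<Consumer<String>> below = new ArrayList<>();
    final List<String> received = new ArrayList<>();

    LayeredComponent(String name, LayeredComponent above) {
        this.name = name;
        this.above = above;
        if (above != null) {
            above.below.add(this::onNotification);
        }
    }

    // Top domain: a request emitted up the architecture.
    void requestUp(String request) {
        if (above == null) {
            throw new IllegalStateException(name + " has no component above it");
        }
        above.onRequest(request);
    }

    // Top domain: a notification to which this component responds.
    void onNotification(String notification) {
        received.add("notification:" + notification);
    }

    // Bottom domain: a request to which this component responds.
    void onRequest(String request) {
        received.add("request:" + request);
        notifyDown(request + " handled by " + name);
    }

    // Bottom domain: a notification emitted down the architecture.
    private void notifyDown(String notification) {
        for (Consumer<String> sink : below) {
            sink.accept(notification);
        }
    }
}

final class LayeringDemo {
    public static void main(String[] args) {
        LayeredComponent top = new LayeredComponent("top", null);
        LayeredComponent bottom = new LayeredComponent("bottom", top);
        bottom.requestUp("save");
        System.out.println(top.received);
        System.out.println(bottom.received);
    }
}
```

Because the upper component depends on nothing beneath it, a lower component can in principle be replaced without the upper one being aware of the change.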

These types of system are a move towards a domain-specific evolution model and architecture in that systems for

which the component and connector architecture is appropriate may be evolved. The work reported in this paper is

conceptually below that of an architecture-oriented approach, but above that of a component framework. An architectural

approach, such as ArchStudio, could make use of the kinds of run-time evolution support provided by the

DRASTIC and GRUMPS systems. In turn, either of these two systems could make use of the underlying evolutionary

facilities that were provided by a component framework that is itself making use of the more primitive language-level

support.

2.5 Discussion

Research results and commercially available solutions to the problem of run-time evolution tend to be focussed at

either the relatively low level of the language or component framework, or at the high architectural level. Dividing

the focus into two groups in this way has led to improvements in support for run-time evolution as the issues may be

separated from one another within the layers. The language and component frameworks provide the mechanisms to

be able to evolve the contents of (typically) a single address space. The architectural approaches, such as Darwin and

ArchStudio, focus more on the design and run-time management of a component and connector-oriented distributed

system. At this higher level, evolution is performed by replacing individual components and by altering the topology

of the system.

As these two approaches have focussed on the issues that are largely within their own layers, a gap between the

two has emerged. This paper argues that in bridging this gap the advantages of both approaches to run-time evolution

may be exploited. One way to provide such a bridge is to define an evolution model and evolution architecture. At

the lower-level, the model and its architecture exploit and abstract over the evolution mechanisms that are provided

by the underlying programming language and component framework. To provide the bridge to the higher-level, the

evolution architecture makes available an API that embodies the semantics of the evolution model. This API may then

be used to construct a distributed system that can be evolved at run-time, as well as to provide tools to support this.

The evolution model supports the software engineer in designing a distributed system that may be evolved. The

evolution architecture provides support for the software engineer in performing a run-time change. In turn, the evolution

model and architecture are brought together by the evolution API which supports the software engineer in

constructing an evolvable system.

Systems such as Darwin and ArchStudio have, inevitably, addressed some of these issues. However, the approach

advocated here suggests that making these facilities available as part of an explicit evolution model that is available at

run-time will lead to distributed systems that have been designed and implemented with their evolution in mind from

the beginning. In the same way that software engineers have to consider the failure model of a distributed system, with

the approach described here, they are also required to consider the evolution model. The software engineer

is made aware of the need to program for evolution and by using the evolution architecture they build in support for

the run-time evolution of their system, making it an explicit part of its design and implementation. The evolution

architecture also takes away from the software engineer some of the complexity of performing and managing the

evolution at run-time. Such an evolution model and implementation is made available in the DRASTIC and GRUMPS

systems.

3 Core Ideas

This section describes the four core ideas that are the main results of the work summarised in this paper. The main

idea is the need for an evolution model (section 3.1). The requirement to separate what is being evolved from how it

is evolved is discussed in section 3.2. Section 3.3 then motivates why programmers should program for evolution, and

section 3.4 explains the reasons behind wanting to make as many decisions as possible at run-time.

The ideas introduced in this section are summarised in place, turning them into ten principles for supporting

run-time evolution. These principles are returned to throughout the paper to reinforce the contribution that has been

made.

3.1 The Need for Evolution Models and Evolution Architectures

An evolution model defines for a programmer that aspect of the programming model which specifies how a software

system may be designed and implemented for run-time change, as well as how the change may be performed and

supported during system execution. An evolution model can be divided into two separate, but equally important, parts:

the first is the evolution model that focuses on the design of the distributed system and the implications of changing it at

run-time; the second is the evolution architecture that, together with the evolution API, handles the details of how a

run-time evolution is performed¹.

A typed programming model ensures the programmer does not try to perform an operation that is deemed unwise

or unsafe by those that have defined the semantics of the type system. Within the context of run-time evolution, a type

system can be seen as a means of ensuring call-level compatibility between separately defined and developed parts of

a system. Typing provides a foundation on top of which an evolution framework can be built.
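A minimal sketch of this idea, assuming a hypothetical TypedSlot class that holds the current implementation of an interface: a replacement is accepted only if it satisfies the same interface as the unit it replaces. The names Greeter, EnglishGreeter, FrenchGreeter and TypedSlot are illustrative and are not taken from DRASTIC or GRUMPS. The check uses the standard Class.isInstance and Class.cast methods.

```java
import java.util.Objects;

// Hypothetical service interface: the call-level contract that clients depend on.
interface Greeter {
    String greet(String name);
}

final class EnglishGreeter implements Greeter {
    @Override
    public String greet(String name) { return "Hello, " + name; }
}

final class FrenchGreeter implements Greeter {
    @Override
    public String greet(String name) { return "Bonjour, " + name; }
}

// Hypothetical replaceable slot holding the current implementation of a contract.
final class TypedSlot<T> {
    private final Class<T> contract;
    private volatile T current;

    TypedSlot(Class<T> contract, T initial) {
        this.contract = Objects.requireNonNull(contract);
        this.current = Objects.requireNonNull(initial);
    }

    T get() { return current; }

    // The candidate's static type is unknown here (it might have been loaded at
    // run-time), so the contract is checked dynamically before the swap.
    void replaceWith(Object candidate) {
        Objects.requireNonNull(candidate);
        if (!contract.isInstance(candidate)) {
            throw new IllegalArgumentException(
                candidate.getClass().getName() + " does not implement " + contract.getName());
        }
        current = contract.cast(candidate);
    }
}

final class TypedSlotDemo {
    public static void main(String[] args) {
        TypedSlot<Greeter> slot = new TypedSlot<>(Greeter.class, new EnglishGreeter());
        System.out.println(slot.get().greet("Ada"));
        slot.replaceWith(new FrenchGreeter());
        System.out.println(slot.get().greet("Ada"));
    }
}
```

The check guarantees only that existing calls remain well-typed after the replacement. Whether the new behaviour is appropriate is a semantic question that the type system does not answer.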

The design- and implementation-time evolution model should support the software engineer in understanding

those aspects of their problem that require run-time change. Among other things, an evolution model should define

what can and cannot be evolved, what the unit of evolution is, and what the main programming abstractions are that

a programmer may use to build an evolvable system. By following this approach, the software engineer builds an

application system that contains support for its run-time change. The application not only contains the application

system, but also the evolution system, which is capable of evolving the application system at run-time. This brings us

to the first principle of support for run-time evolution.

An evolution model is crucial if run-time evolution is to be used and performed effectively.

Principle 1: Evolution Models are Crucial.

Without such a model, it is possible that a design or implementation decision that may have an impact on how the

system could be updated at run-time may not be considered by the software engineer. An evolution model helps the

software engineer to consider other system issues within the context and constraints of run-time evolution.

¹ The phrase “evolution model” is used in this paper to refer to the model, the architecture and the API.

In addition to the above, no one model is suitable in all circumstances. There are many different types of environment

that programmed systems are deployed into, such as embedded systems and general-purpose business

applications. No one standard evolution model and evolution architecture is suitable across such a range of systems

that have radically different requirements. Rather, a handful of standards will be required, where each standard addresses

the issues pertinent to run-time evolution within the particular environment. As examples of this focus on

differing environments, the DRASTIC and GRUMPS systems address the issues of run-time evolution within their

respective types of distributed system.

An evolution model needs to be tailored to the environment in which it is being used.

Principle 2: A Number of Evolution Models are Required.

3.1.1 The Evolution Architecture

The run-time realisation of the evolution model is its evolution architecture and it must support the programmer in

effectively updating an executing application. This means providing support to manage and safely update the running

system, helping it to correctly move between evolution states without generating errors at the application-level. The

evolution architecture may also provide some support to automate certain operations to remove the need for human

input.

As the evolution architecture is responsible for effecting the changes to the executing program, the choice of

implementation language is important. When designing and implementing an evolution architecture a free choice of

implementation language may not be possible; one may be imposed as other parts of the system may already use

that language. The implementation language will have a big impact on how an evolution may be performed. This is

because certain languages are more flexible in allowing changes to be performed to the components the program is

constructed from, e.g., more readily allowing for the replacement of code or the run-time reinterpretation of it.

The evolution model should reflect and be compatible with the other models used within the system, both the

computational models, such as that for distributed computation, and the non-functional models, such as the business

model that the computer system helps to support. Evolving an executing system will affect some aspects of it, such

as its availability, which in turn may affect the business. Therefore, given this level of importance, and the desire to

be able to change the application system at run-time, the evolution model should become a central part of the overall

design and implementation of that application.

A software engineer should consider run-time evolution to be an

integral part of an application’s design.

Principle 3: Support for Run-time Evolution is a Central Issue.

Changing the behaviour of a program at run-time involves altering its semantics. In the general case, only a

human being has the necessary information and knowledge to know what the implications of a particular change may

be. Therefore, it is not possible to fully automate the run-time change of a system. Even though full automation is

not possible, it is possible to support the human in their design and implementation of the system and its subsequent

run-time change. Such support should make the task of changing a system at run-time easier and thus less error-prone.

The run-time evolution of a program is a semantic issue

which, in the general case, cannot be fully automated.

Principle 4: Run-time Evolution is a Semantic Issue.

In designing the evolution model and its architecture it can be advantageous to reuse the concepts that are defined

by the underlying programming language and component frameworks, and make them available at the evolution model

and architectural level. This increases the software engineer’s familiarity with the model and its implementation and

decreases the number of concepts they have to simultaneously manage and trade off against each other. This reduces

the complexity of the evolution model which is a good general design principle.

The design of the evolution model should be sympathetic to the

underlying component and programming models.

Principle 5: Exploit Design Familiarity.

3.2 Separating What is Evolved from How it is to be Evolved

When designing and implementing a system to support run-time evolution it is important to separate what is being

evolved from how that evolution will be performed. What is being evolved is typically an executing system. It is

possible to evolve such a system in one of a number of ways. By separating these two issues, the designers and users

of the framework are given more flexibility in choosing the most appropriate approach to the evolution of their system.

The use of separation leads to an evolution model where change is applied to a system, and to an approach where a

single evolution model may be applied to the run-time update of a number of different application systems. Mixing

them leads to a single, monolithic system that is harder to manipulate as the two parts become intertwined.

An evolution model can help the software engineer separate what is being evolved

from how it should be evolved.

Principle 6: Separation of Concerns is Important.
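The separation can be illustrated with a small Java sketch. All names (Evolvable, EvolutionStrategy, ImmediateSwap, PauseSwapResume, PricingSystem) are assumptions made for this example and are not the DRASTIC or GRUMPS APIs. The application system is what is evolved, and each strategy is one way of how an update is applied. Neither refers to the internals of the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Hypothetical contract offered by any application system that can be evolved.
interface Evolvable<U> {
    void pause();
    void resume();
    void install(U update);
}

// "How": a hypothetical strategy that knows nothing about a specific application.
interface EvolutionStrategy {
    <U> void apply(Evolvable<U> target, U update);
}

// Swap immediately, without interrupting the application.
final class ImmediateSwap implements EvolutionStrategy {
    @Override
    public <U> void apply(Evolvable<U> target, U update) {
        target.install(update);
    }
}

// Quiesce the application, swap, then resume it.
final class PauseSwapResume implements EvolutionStrategy {
    @Override
    public <U> void apply(Evolvable<U> target, U update) {
        target.pause();
        try {
            target.install(update);
        } finally {
            target.resume();
        }
    }
}

// "What": a hypothetical application system whose pricing rule can be replaced.
final class PricingSystem implements Evolvable<IntUnaryOperator> {
    private IntUnaryOperator rule = price -> price;
    private boolean paused;
    final List<String> log = new ArrayList<>();

    int quote(int price) {
        if (paused) {
            throw new IllegalStateException("paused for evolution");
        }
        return rule.applyAsInt(price);
    }

    @Override public void pause() { paused = true; log.add("pause"); }
    @Override public void resume() { paused = false; log.add("resume"); }
    @Override public void install(IntUnaryOperator update) { rule = update; log.add("install"); }
}

final class SeparationDemo {
    public static void main(String[] args) {
        PricingSystem pricing = new PricingSystem();
        IntUnaryOperator surcharge = p -> p + 10;
        IntUnaryOperator doubled = p -> p * 2;

        new ImmediateSwap().apply(pricing, surcharge);   // same system, one strategy
        new PauseSwapResume().apply(pricing, doubled);   // same system, another strategy
        System.out.println(pricing.quote(100) + " " + pricing.log);
    }
}
```

Because a strategy is written against the Evolvable contract rather than against PricingSystem, the same strategy could be applied to any other system that implements the contract, and a new strategy can be added without touching the application.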

By separating the system being evolved from the system that will help perform the change, and by encapsulating

both of these into different models, those updating the system are given support in handling the complexity of the

change. The separation of concerns increases the ease with which the individual pieces may be inspected and understood

by those responsible for its change. This is because one element of the system is not unnecessarily confused

with another. In addition, not only is it possible to change the running application system by applying the evolution

system to it, but it is also possible to change the evolution system itself. As the evolution system is just another system,

it is possible to define a meta-evolution system that is capable of changing the evolution system. In the same way that

support is required to be able to change running application systems, support to be able to change the evolution system

is also necessary. If the running application system is changed, the evolution system that was used to perform that

change may also need to be adapted to ensure the newly changed application system can be changed in the most appropriate

manner in the future. Therefore, there is a feedback cycle between the application system and the evolution

system. A change in one leads to the need to change the other, which repeats itself.

The designer of a system cannot foresee all future uses of it and so an update should be applied in a way that

is most appropriate to the current as well as future use, state and architecture of the system. This applies equally

to the application system and the evolution system. A change performed to the application system may also

have an effect on a subsequent use of the run-time evolution system, and vice versa. This is because the application

system and the evolution system are both present at run-time and, even though the two systems are separate, they are

inevitably inter-related as they must interact in order to manage an update. A change to one system may change the

other. Therefore, it is advantageous if the evolution model is flexible enough to support evolved uses of itself over

time, in order to help support change to the on-going evolution of the application system.

It is advantageous if the evolution model and architecture can support evolution of itself.

Principle 7: The Evolution Model should be Evolvable.

3.3 Programming for Evolution

Application programmers must program for evolution. The approach to evolution advocated here involves an evolutionary

process being applied to an executing system. This implies that normal application computation could be

interrupted during the evolution process. The executing application needs to be able to tolerate changes being performed

to it. To support this the run-time evolution architecture should present an API to the programmer so that a

meaningful dialogue between the application program and the evolution support system is possible.

When implementing systems, programmers make assumptions about the environment they are dealing with in

order to simplify their solution. Evolving a system at run-time can cause such assumptions to break which can result

in the failure of a previously working application. As the programmer is working within the context of an evolution

model and evolution architecture, they must be aware that their program’s behaviour may change due to an evolution.

In the same way that distributed programs can fail in ways that are semantically different to purely centralized solutions,

programs executing within the context of an evolution model can fail in ways that are semantically different to

programs that are executing outside such a context. As a result of this, programmers must be aware that a system can

behave differently due to the evolutionary context it is executing within and should, therefore, dedicate part of their

program to handling the effects of its possible evolution.

Those that use evolution models and their APIs need to program for evolution.

Principle 8: Program for Evolution.
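One possible shape for such a dialogue is sketched below. The interface and class names (EvolutionListener, EvolutionManager, TransferService) are hypothetical and chosen only for illustration. The evolution support system asks the application whether it can tolerate a change now, and tells it once a change has been applied so that it can revisit any assumptions it has cached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface through which the evolution support system
// talks to application code.
interface EvolutionListener {
    boolean readyToEvolve();                  // the application may decline for now
    void evolutionCompleted(int newVersion);  // the application may re-check assumptions
}

// Hypothetical evolution support system.
final class EvolutionManager {
    private final List<EvolutionListener> listeners = new ArrayList<>();
    private int version = 1;

    void register(EvolutionListener listener) { listeners.add(listener); }
    int version() { return version; }

    // Applies the update only if every registered listener agrees.
    // Returns true if the update was applied.
    boolean requestEvolution(Runnable update) {
        for (EvolutionListener listener : listeners) {
            if (!listener.readyToEvolve()) {
                return false;
            }
        }
        update.run();
        version++;
        for (EvolutionListener listener : listeners) {
            listener.evolutionCompleted(version);
        }
        return true;
    }
}

// Application code that dedicates part of itself to handling evolution.
final class TransferService implements EvolutionListener {
    private boolean inTransaction;
    private int seenVersion = 1;

    void begin() { inTransaction = true; }
    void commit() { inTransaction = false; }
    int seenVersion() { return seenVersion; }

    @Override
    public boolean readyToEvolve() { return !inTransaction; }

    @Override
    public void evolutionCompleted(int newVersion) {
        // A real application would revalidate cached assumptions here.
        seenVersion = newVersion;
    }
}

final class ProgramForEvolutionDemo {
    public static void main(String[] args) {
        EvolutionManager manager = new EvolutionManager();
        TransferService service = new TransferService();
        manager.register(service);

        service.begin();
        System.out.println(manager.requestEvolution(() -> { }));  // declined mid-transaction
        service.commit();
        System.out.println(manager.requestEvolution(() -> { }));  // applied
    }
}
```

The sketch is single-threaded for brevity. A concurrent implementation would also need to prevent a transaction from starting between the readiness check and the update.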

In the general case, it is not possible, nor is it always advisable, to hide from the programmer or the executing

system that an evolution is taking place. Hiding this fact is generally not possible because the system being changed

is the same as the executing system and so it may be able to detect a change to itself. In addition, trying to hide such

a change may be bad for the long term maintenance of the program. Trying to hide the change may require a solution

that is artificial with respect to the rest of the system’s design. In the future, a new change may be required; however,

it may not be easy to apply the new change due to the artificiality of the previous change. The software engineer

could be forced to perform the new change within a less than ideal context, one that has been brought about due to a

misguided (in the long term) desire to encapsulate a change from the rest of the system.

As such kinds of change cannot be hidden from the programmer, the rest of the change should be expressed to

them through the run-time evolution architecture. In this way, the software engineer is assisted in performing the

change by giving them access to the current run-time state of the system.

As much design-time information as possible should be retained

and made available to the run-time system.

Principle 9: Information Loss should be Minimised.

The programmer should be able to ascertain the current state of the run-time architecture of both the application

system and the evolution architecture. If no information is lost when moving from the design stage through to the

execution of the system, then the executing application is its design and the executing evolution architecture is the

evolution model. Having access to this kind of information at run-time can be invaluable when performing a run-time

evolution. A proposed application evolution may break a design invariant, such as no cycles being allowed between

run-time components. Having this detailed information available, at several levels of abstraction, at run-time can

ensure the system may be tested and the evolution architecture can alert those performing the evolution that there may

be problems with certain kinds of change.
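As a minimal sketch of how retained design information might be used to vet a proposed change, the following Java fragment keeps a model of the references between run-time components and refuses an evolution that would break the "no cycles" invariant mentioned above. The class and method names are hypothetical and are not taken from DRASTIC or GRUMPS.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical run-time model of component references; not part of DRASTIC or GRUMPS. */
public final class DesignInvariantCheck {
    // Adjacency list: component name -> names of the components it refers to.
    private final Map<String, Set<String>> references = new HashMap<>();

    /** True if 'to' can already reach 'from', so adding from -> to would close a cycle. */
    public boolean wouldCreateCycle(String from, String to) {
        if (from.equals(to)) {
            return true;
        }
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.push(to);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(from)) {
                return true;
            }
            if (visited.add(current)) {
                for (String next : references.getOrDefault(current, Collections.<String>emptySet())) {
                    pending.push(next);
                }
            }
        }
        return false;
    }

    /** Applies the proposed reference only if the invariant still holds; reports whether it was applied. */
    public boolean addReference(String from, String to) {
        if (wouldCreateCycle(from, to)) {
            return false;
        }
        references.computeIfAbsent(from, key -> new HashSet<>()).add(to);
        return true;
    }

    public static void main(String[] args) {
        DesignInvariantCheck model = new DesignInvariantCheck();
        System.out.println(model.addReference("collector", "cleaner"));
        System.out.println(model.addReference("cleaner", "store"));
        // This proposed evolution would close a cycle, so it is rejected.
        System.out.println(model.addReference("store", "collector"));
    }
}
```

An evolution architecture could run a check of this kind before applying a change and alert the software engineer rather than silently refusing.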

3.4 Delaying Decisions until Run-time

The run-time evolution of a system requires as many decisions about that evolution as are appropriate to be performed

at run-time. This is necessary for two reasons. Firstly, certain information relevant to the evolution may only be

available at run-time. For example, it may not be possible to evolve a system when certain important pieces of code

are executing. Being able to either detect when such code is executing or will be executing in the future will allow an

evolution to be scheduled appropriately. Secondly, the most accurate description of a running system is the running

system itself. It would be possible to retain a static view of the current system and update it after an evolution has been

performed. However, it is possible that the static description will become out of date with respect to the executing

system. Rather than do this, the evolution architecture can track the evolutions that have been applied to the running

system by inspecting the system state that is pertinent to evolution before and after the change has been applied. To be

able to do this, both the application and evolution architecture need to be supported in making decisions at run-time.

In addition, run-time evolution requires the evolution architecture to be very flexible. If a human user needs to

update the functionality of an executing system, the flexibility required to do this demands that as many decisions as

appropriate can be made and performed at run-time. Information from the run-time system needs to be accessed at

run-time so that the most appropriate decision can be made. However, this level of access can have consequences for

the enforcement of decisions that have been made before system run-time (section 7).

Run-time evolution demands that as many decisions as are

appropriate can be made at run-time.

Principle 10: Run-time Evolution requires Run-time Decisions to be Supported.

4 Experiments

The results presented in this paper are based on the outcome of two major experiments conducted as part of the

author’s PhD work in the area of support for the run-time evolution of distributed systems [7]. There are many

different kinds of distributed system possible and many different approaches to the support of run-time evolution

within such an environment. The DRASTIC and GRUMPS experiments are two data points within the space of

all possible approaches to the run-time evolution of distributed systems. This work focuses on supporting run-time

evolution when the application system has been designed and built with this in mind. The equally important problem

of adding support for run-time change to a pre-existing application falls outside the scope of the DRASTIC and

GRUMPS work and so is not discussed in this paper.

Sections 4.1 and 4.2 introduce the DRASTIC and GRUMPS evolution models, before discussing in section 4.3 the

kinds of evolution each approach supports, how they make use of the four core ideas introduced above and the main

differences between the two approaches.

4.1 DRASTIC

The DRASTIC project [9, 10, 11] investigated support for the run-time evolution of a large, long-lived, persistent

system. The kind of distributed application system being considered would be, for example, one used to manage the

day-to-day business of a medium sized company that may have a purchasing department and an accounts department

that would need to interact. The software managing this company is considered too important to be brought down in its

entirety; however, it has been decided that identified parts of the software can be made unavailable for periods of time.

To support this, the DRASTIC evolution model and application architecture divide the distributed application across a

number of disjoint, semi-autonomous regions called zones. The organisation of the zones follows the organisation of

the business, thus the DRASTIC project assumes there would be a zone for each of the above departments. By doing

this, the DRASTIC evolution model is tailored to the environment in which it is to be used (see principle 2).

A zone contains the software necessary to support that part of the business’ functionality. A zone is a collection of

executing processes and typed object-oriented language-level objects that may communicate with other objects that

are contained either within the same zone or within other zones. The DRASTIC communication model, therefore,

supports inter-zone method invocations.

To evolve the contents of a zone, a language type would be updated, perhaps changing its public interface, and

possibly adding or removing functionality from it. During a zone evolution, as a simplifying assumption, all instances

of a particular type are updated.

Clemens Szyperski in [31] defines a component as a set of normally simultaneously deployed atomic components

where an atomic component is a code module and a set of non-compiler-generated resources, e.g., configuration files.

A DRASTIC zone would contain many such Szyperski components as well as the inter-zone contracts, executing

change absorbers and type transformers, and the tools to manipulate the zone at run-time. Therefore, a DRASTIC

zone is a much larger, more complex and more dynamic entity than a Szyperski component. In this sense, a DRASTIC

zone has more in common with the Megaprogramming modules of Wiederhold, Wegner and Ceri, as defined in [34].

Zones are only semi-autonomous as their code will interact with code in other zones to perform the functionality

of the system. Interaction between objects in different zones is controlled on a pair-wise basis by a zone contract.

A single zone may hold several contracts with a number of other zones. A contract defines the types of objects that

may be exchanged between the two zones and how inter-zone method invocations should be handled. Code in one

zone may be expressed in terms of a pre-evolution type, however, the call may actually be handled by an instance of

a post-evolution type. Programmer supplied code is then installed at the zone boundary inside objects called change

absorbers. A change absorber translates the inter-zone call between the two differently evolved objects. An evolution

of a single zone involves temporarily suspending activity between the zone being evolved and the other zones that it

holds contracts with. Therefore, a DRASTIC-based distributed application has to be designed and implemented by the

software engineer to tolerate part of that application not being present when that part is evolved. The processes within

that zone are terminated and any persistent state is updated to make use of the evolved programming types. Processes

within the zone are then restarted and inter-zone method invocations are redirected by the DRASTIC run-time system

via the newly installed change absorbers.

Before the evolution has been performed, code in an external zone may be successfully calling code within the

zone to be evolved. After the evolution has taken place, the call may no longer be successful. This is because a change

to the pair-wise contract may have made that call invalid. The software engineer that wrote the calling code needs to

be aware that this kind of failure is possible so that they can program their system to handle this possibility. This is an

example application of principle 8, the need to program for evolution. As run-time evolution is a semantic issue which

cannot be fully automated in the general case (principle 4), the software engineer is forced to program for evolution.

Principle 8 is, therefore, a consequence of the existence of principle 4.

Zones, contracts and change absorbers are concepts that exist at system design-time that become application-level

run-time objects. When a software engineer interacts with a DRASTIC-based application they are given access to

objects that represent these design-time abstractions. In this way, DRASTIC supports principles 5 and 9. The concepts

at both design- and run-time are the same, reducing the complexity of the system, and the design-time information

about these concepts is made available during system execution via their run-time objects. This allows the run-time

system to inspect design-time information which may be used to check whether certain evolutionary changes should

be performed on the system. Thus, the DRASTIC system embodies principle 4, by supporting a human being in

changing a run-time system.

By making available run-time objects that model their design-time counterparts, the DRASTIC software engineer

is given support in considering run-time evolution to be an integral part of an application’s design (principle 3). The

design of an application is partly composed from the concepts that are used to model its change and to also perform

that change during application execution.

Figure 1 shows two zones (zone Purchasing and zone Accounts) with one process in each zone. Process 2

in zone Accounts contains an object called m (footnote 2) that is of type M. This object is referred to from process 1 in zone

Purchasing. However, process 1 believes it holds a reference to an object called n of type N. At some point in the

past, process 1 did hold such a reference to an object of type N in zone Accounts. However, zone Accounts has

subsequently evolved its N type into type M. In order to encapsulate the effects of this change within zone Accounts,

a change absorber has been made available on the inter-zone reference. This change absorber exports the N type so

that process 1 may invoke along the chain. The invocation is translated into one on the evolved type M and the method

call is directed through the contract-containing process in zone Accounts.
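The role of the change absorber in Figure 1 can be illustrated with a small Java sketch. The type names N and M follow the figure, but the methods, the accounting example and the translation rule are assumptions made purely for illustration; they are not the DRASTIC API.

```java
import java.math.BigDecimal;

/** Hypothetical change absorber; N and M follow Figure 1, the methods are invented. */
public final class ChangeAbsorberSketch {
    /** Pre-evolution type that code in zone Purchasing was written against. */
    public interface N {
        int balanceInPence(String accountId);
    }

    /** Post-evolution type now implemented inside zone Accounts. */
    public static final class M {
        /** The evolved method returns pounds as a decimal string rather than integer pence. */
        public String balance(String accountId) {
            return "12.50"; // stand-in for a real lookup
        }
    }

    /** Exports the old N interface and translates each call into one on an M instance. */
    public static final class NToMAbsorber implements N {
        private final M target;

        public NToMAbsorber(M target) {
            this.target = target;
        }

        @Override
        public int balanceInPence(String accountId) {
            BigDecimal pounds = new BigDecimal(target.balance(accountId));
            // Shift the decimal point two places to convert pounds to pence.
            return pounds.movePointRight(2).intValueExact();
        }
    }

    public static void main(String[] args) {
        // Process 1 still believes it holds a reference n of type N.
        N n = new NToMAbsorber(new M());
        System.out.println(n.balanceInPence("acc-1"));
    }
}
```

In DRASTIC the absorber sits at the zone boundary between two processes; this sketch collapses that distribution into a single process to show only the translation step.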

The design- and run-time manipulation of the kind of system described above is a complex endeavour. The

DRASTIC evolution model handles some of this complexity for the software engineer by embodying a particular

approach to run-time change within its model and run-time architecture. Certain decisions on how to perform a run-

time change have been made by the provider of the evolution model and so the software engineer does not have to

be concerned with these. In this way, the job of the software engineer is made easier, so that they may focus on their

application, not on the details of how to change its semantics at run-time. DRASTIC, therefore, supports principles

1 and 6: an evolution model has been provided to support run-time application evolution; and a clear separation has

been made between what is being evolved and how it is to be evolved.

Footnote 2: It is called m in the sense that this is the name of the reference in the Java source code.

Figure 1: An Inter-zone method Invocation via the Zone Contract

[Diagram: User Process 1 in Zone Purchasing holds the reference n:N, which passes through the Contract process of each zone and a change absorber exporting N to reach the object m:M in User Process 2 in Zone Accounts.]

The DRASTIC evolution model and architecture supports a weak form of principle 7 in that an evolution may be

modified by a subsequent one by deploying new change absorbers. The formulation of this principle and its advantages

only became apparent when designing the GRUMPS approach to run-time evolution.

4.2 GRUMPS

The GRUMPS project [3, 6, 8, 12, 15] is developing techniques and software to collect and manage large collections

of user actions from distributed investigations. An investigator typically has a hypothesis they wish to investigate

and, together with a software engineer, a distributed system would be deployed to test that hypothesis. During the

collection of the user actions, new questions may come to light that the investigator wants to explore by changing the

implementation of the currently executing investigation. To support this, the GRUMPS program architecture allows

the functionality of a system to be changed at run-time.

The GRUMPS approach to constructing a distributed investigation divides the application code across a number

of distributed, inter-connected GRUMPS Unit (GU) objects. GU objects communicate with one another using uni-

directional event channels. GRUMPS events are sent through event channels and the GRUMPS event encapsulates

application data and executable code. A GU object contains an event processing object (EPO) which receives incoming

events, processing them in some way before (optionally) sending them to another GU, or possibly long-term storage.

Evolution is performed in GRUMPS by replacing the EPO objects inside GU objects. As an EPO object may be

replaced, the way an event at a particular GU object is processed may be changed. GU objects are placed inside

GUContainer objects.

A GUContainer object acts as a place-holder for a number of related GU objects, giving them a well-known

location and allowing the programmer to manipulate them as a single entity. GUContainers make available a special

event channel called a control event channel. A control event is a special kind of GRUMPS event that carries code

and a payload with it. A control event is passed to a (typically remote) GUContainer and the GRUMPS run-time

system passes a reference to the container to the control event. Code in the control event is then executed which will

call the public methods on the GUContainer. This mechanism is used to install GU objects inside a GUContainer and

to update EPO objects within a GU. To replace an EPO within a GU, a control event is sent to the GU’s containing

GUContainer. The control event code then retrieves from the container a reference to the GU object of interest. This

GU then has its current EPO removed and the new EPO installed (footnote 3).
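The control event mechanism described above can be sketched in Java as follows. The names Gu, GuContainer, Epo and ControlEvent echo the concepts in the text, but their signatures are assumptions, and the sketch runs in a single process rather than shipping code to a remote container as GRUMPS does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical in-process model of EPO replacement via a control event. */
public final class ControlEventSketch {
    /** Event processing object: here simply a function over an event payload. */
    public interface Epo extends UnaryOperator<String> {
    }

    /** A GRUMPS Unit that delegates event processing to its current EPO. */
    public static final class Gu {
        private volatile Epo epo;

        public Gu(Epo epo) {
            this.epo = epo;
        }

        public void replaceEpo(Epo replacement) {
            this.epo = replacement;
        }

        public String process(String event) {
            return epo.apply(event);
        }
    }

    /** A control event carries code that is run against the receiving container. */
    public interface ControlEvent {
        void execute(GuContainer container);
    }

    /** Gives GU objects a well-known location and accepts control events. */
    public static final class GuContainer {
        private final Map<String, Gu> units = new HashMap<>();

        public void install(String name, Gu gu) {
            units.put(name, gu);
        }

        public Gu lookup(String name) {
            return units.get(name);
        }

        /** The run-time system hands the container to the control event's code. */
        public void deliver(ControlEvent event) {
            event.execute(this);
        }
    }

    public static void main(String[] args) {
        GuContainer container = new GuContainer();
        container.install("cleaner", new Gu(s -> s.trim()));
        // The control event looks up the GU of interest and swaps its EPO.
        container.deliver(c -> c.lookup("cleaner").replaceEpo(s -> s.trim().toLowerCase()));
        System.out.println(container.lookup("cleaner").process("  Key PRESS "));
    }
}
```

Because the replacement logic lives in the event rather than in the container, new kinds of update can be introduced without changing the container's public methods.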

GRUMPS processes form a spanning-tree and the interaction of a process with the tree is managed for the programmer

by the Teaq (footnote 4) subsystem. Objects (typically GUContainer and GU objects) are located within the distributed

system by sending OQL-like queries across the tree [6, 8]. Within a single process, objects that are to be found by

incoming queries are registered with the GRUMPS run-time system. A newly received query will be run against the

current collection of registered objects. If a match is found, either a copy of the matching object is returned to the

query initiator, or a proxy object is created and that is returned to the query originator from where it may interact with

the (remote) matching object. If a query matches a particular GUContainer, the result object that is passed back is a

reference to the control channel which may be used to send control events to the container. The approach to programming

with queries is referred to as query-oriented programming (QOP, footnote 5) [12] which is one component of Teaq. The

distributed group of objects is collectively known as a GRUMPSNet.
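The local half of this lookup, running a query against the objects registered in one process, might look like the Java sketch below. A plain predicate stands in for the OQL-like query language, and the Registered type and its fields are hypothetical; propagation of the query across the spanning-tree and the creation of proxies are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical per-process registry against which incoming queries are run. */
public final class QuerySketch {
    /** Minimal description of a registered object; the fields are invented for illustration. */
    public static final class Registered {
        public final String kind;
        public final String name;

        public Registered(String kind, String name) {
            this.kind = kind;
            this.name = name;
        }
    }

    private final List<Registered> registry = new ArrayList<>();

    /** Makes an object discoverable by later queries. */
    public void register(Registered object) {
        registry.add(object);
    }

    /** Runs a query against the current collection of registered objects. */
    public List<Registered> run(Predicate<Registered> query) {
        return registry.stream().filter(query).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        QuerySketch process = new QuerySketch();
        process.register(new Registered("GUContainer", "collection"));
        process.register(new Registered("GU", "cleaner"));
        // Find every GUContainer registered in this process.
        for (Registered match : process.run(r -> r.kind.equals("GUContainer"))) {
            System.out.println(match.name);
        }
    }
}
```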

Figure 2 shows a simple GRUMPSNet that consists of two Teaq processes with a single GUContainer each, the

left container holds two GU objects and the right container holds a single GU. Data is collected from two instrumented

computers (see section 5.2) which is sent to the left-most GU objects. These objects translate the data into GRUMPS

events and these events are sent to the third GU which cleans the data before writing it to stable storage. The two

GUContainers each have a control channel which may be used to update the contents of the container. A GUContainer

is discovered at run-time by attaching a process to the GRUMPSNet and propagating a QOP query across the process

graph. The result of such a query is typically a reference to the discovered GUContainer’s control channel.

Footnote 3: GUContainers and GU objects may also be manipulated, e.g., removing a GUContainer from a process or a GU object from its container.

Footnote 4: The Teaq subsystem manages the processes running within the GRUMPS distributed system, placing them into the spanning-tree and reconnecting them should their parent process crash or be taken away. Teaq is an acronym that stands for trees, evolution and queries and is pronounced the same as ‘teak’.

Footnote 5: cf. Jxta search [30].

Figure 2: An Example GRUMPSNet

[Diagram: events (GE) from User 1 and User 2 flow through GU objects, each holding an EPO, and are written out as cleaned user data; a Teaq Root Process is shown. The key identifies GUContainer, GU, EPO, GE (Grumps Event), event queue, thread, channel, control channel, inter-GU reference and Teaq parent reference.]

A GRUMPSNet can be seen as an instance of the pipe and filter design pattern where the filtering is performed by

the GU objects. The design of GRUMPS deliberately exposes the complexity of the GU objects to the programmer,

rather than abstracting over them as is done, for example, by [1], although such an abstraction could be provided

on top of the current GRUMPS API. The focus in the GRUMPS work is on the kinds of primitive support that are

necessary to support rapid run-time evolution. The approach in the GRUMPS work to providing rapid update is to

give the software engineer access to the primitive run-time evolution support mechanisms. By taking away the layers

of abstraction that are present in other evolution support mechanisms, such as DRASTIC, the software engineer can

effect an evolution upon a running system more rapidly. The rapidity of the update refers to the time it takes to affect

the running system, not the amount of time it takes to plan and to test the change to the system.

4.3 Discussion of Approaches to Evolution

4.3.1 DRASTIC

Evolution in DRASTIC is a planned, complex and heavily coordinated activity that is supported by a team of soft-

ware engineers who are responsible for the changes to the zones involved. An evolution will make a portion of the

application unavailable for a period of time, and ideally that part of the application will be contained within one zone.

Therefore, evolution in a DRASTIC-based system is a relatively rare event, perhaps only being performed once or

twice a year. Most of the time, a DRASTIC-based system will provide its application functionality, only being interrupted

for scheduled evolutions. Therefore, the DRASTIC evolution model can be seen as phased: long periods of

application activity are interrupted by major phases of application evolution.

In the DRASTIC evolution model, code change is encapsulated inside a zone. Together with that zone’s contracts

and its change absorbers, code elsewhere in the distributed system is not aware that a change has been performed.

This is how DRASTIC separates what is being evolved (the partial system contained in a zone) from how it is evolved,

i.e., the organisation of the system via zones, contracts and change absorbers. However, a DRASTIC programmer

must still program for evolution. After an evolution inside a zone, a method invocation into that zone may fail. This

is because the contract between the two zones may have been updated and a type previously used in the method

invocation may no longer be allowed to cross the zone boundary. An exception will be raised at the calling side and

so the programmer that has written cross-boundary calls must be aware that this is a possibility and be prepared to

handle this case. The DRASTIC evolution architecture allows the programmer to make some decisions at run-time.

For example, the programmer-supplied code that performs the inter-zone object translations contained in the change

absorber will be executed as part of the post-evolution execution of the application. This code can make run-time

decisions to control how the inter-zone call is translated.
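Programming for evolution on the calling side might look like the following Java sketch. The exception type, the Accounts interface and the fallback policy are all hypothetical; the text states only that an exception is raised at the caller when the contract no longer admits the call.

```java
/** Hypothetical caller-side handling of a call invalidated by a zone evolution. */
public final class ProgramForEvolution {
    /** Invented stand-in for the exception raised when a contract no longer admits a call. */
    public static final class ContractViolationException extends RuntimeException {
        public ContractViolationException(String message) {
            super(message);
        }
    }

    /** Invented view of a service offered by another zone. */
    public interface Accounts {
        String invoiceStatus(String invoiceId);
    }

    /** Calls into the other zone and degrades gracefully if the contract has changed. */
    public static String statusOrUnknown(Accounts remote, String invoiceId) {
        try {
            return remote.invoiceStatus(invoiceId);
        } catch (ContractViolationException e) {
            // The call was valid before the evolution but is not any more:
            // report the problem and carry on rather than failing outright.
            return "UNKNOWN (" + e.getMessage() + ")";
        }
    }

    public static void main(String[] args) {
        Accounts beforeEvolution = id -> "PAID";
        Accounts afterEvolution = id -> {
            throw new ContractViolationException("type no longer allowed across the zone boundary");
        };
        System.out.println(statusOrUnknown(beforeEvolution, "inv-7"));
        System.out.println(statusOrUnknown(afterEvolution, "inv-7"));
    }
}
```

What the caller should do on failure is an application-level decision; returning a placeholder value is only one possible policy.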

At times, it may not be possible to completely encapsulate a change within a single zone. The change proposed to

the system may be sufficiently wide-ranging that performing an update to more than one zone is the most appropriate

course of action. In this case, the software engineers responsible for the affected zones would have to mutually

agree what system-wide changes were to be performed. This may require a change to a number of contracts and

possibly some change absorbers to allow any modified inter-zone interactions to take place. The DRASTIC approach

to run-time evolution acknowledges that such an evolution is necessary and the model and implementation support it.

However, DRASTIC assumes that it is a relatively rare event. For more discussion on this issue, see [7].

4.3.2 GRUMPS

In comparison to DRASTIC, run-time evolution in GRUMPS is a more ad-hoc and rapidly occurring activity. A user

of a GRUMPS-based system may turn around an evolution of their distributed application quite promptly. A query

is issued to find a remote GUContainer of interest. The returned control channel is then used to send a control event

to the remote GUContainer which would replace a particular GU’s EPO object by executing the event code in the

context of the remote container. The time it takes to perform these operations is much shorter than the time it takes

to carry out the evolution of a DRASTIC zone. A GRUMPS evolution is a light-weight activity, typically replacing

a single object within a process. In this way, there is no attempt to hide the effects of an evolution from other parts

of the system and there is no need to impose the level of planning and coordination that are required in DRASTIC.

Evolution in GRUMPS is a relatively common event that would be performed several times a day. As a result, there is

no notion of evolution phases in the GRUMPS evolution model; multiple parts of a GRUMPS distributed system may

be undergoing some form of change. However, in both systems, it is hard to see how such systems could be built and

subsequently evolved without the application of an evolution model (principle 1). In addition, the differences between

the two models highlight the need for evolution models that are tailored to the specific environment the application

will be executing in (principle 2).

The GRUMPS evolution model divides what is being evolved (the GUContainers, GU and EPO objects) from how

it is evolved via the application of control events (supporting principle 6). An executing GRUMPS-based, distributed

application is evolved by applying an open-ended collection of control events to it. This increases the flexibility of

the system: objects that were only designed and implemented after the system has been started can be installed at

run-time and given a location within the system inside a GUContainer; and new control events can be sent into the

system to update it in a way most appropriate to its current contents. The running GRUMPS system can accommodate

new control events which themselves can embody completely new decisions in their code. This allows the GRUMPS

programmer to make as many decisions as are appropriate at run-time (principle 10).

In this way, the GRUMPS evolution model itself may be evolved, adapting to the ever changing needs of the

distributed system (principle 7). Programmers must program for evolution in this model (principle 8) as an evolution

may remove a part of the system. For example, due to a run-time change, a GU object may be removed and the

object that was previously successfully sending it GRUMPS events will have to find a replacement GU object. This

is done by executing a query to find another compatible GU. GRUMPS supports the programmer in making as many

decisions as are appropriate at run-time by providing control events and the query approach to programming. A control

event can contain the code necessary to decide whether it should apply itself to a particular GUContainer. Similarly,

a programmer may write a query that is based on information only available at run-time. The GRUMPS evolution

architecture makes available a collection of default control events to automate common activities in the system, such

as the installation and removal of GU objects. This reduces the number of run-time decisions that the programmer has

to make while simultaneously providing them with the necessary power to change the system in application-specific

ways by allowing them to author their own control events.

4.3.3 Both Systems

Within DRASTIC and GRUMPS systems, at times, it may be most convenient to shut down part of the system to

perform an evolution to it. If a new version of the underlying evolution support architecture was to be deployed, for

example, it could make sense to terminate the entire system. Neither the DRASTIC nor GRUMPS approaches to

run-time evolution address the hard problem of being able to evolve the implementation of the evolution support layer.

This work addresses the problem of how to evolve the system that makes use of this layer. To evolve the layer itself, some form

of version control may be required. This would allow the distributed evolution support layer to track the version of

the various parts of itself that were being used at any one time. Such an approach is similar to that used in database

schema versioning [27].

It is assumed in both the DRASTIC and GRUMPS systems that there is a single system to be evolved and those

involved in the change understand and agree how the change will be performed. The DRASTIC and GRUMPS

systems are not targeted to the form of evolution where there are a number of complete systems which their users

want to update in several different ways, using third party components that are sourced from a number of different

vendors. The DRASTIC and GRUMPS approaches assume what is being changed has been written for, and tested

within, the single evolution framework in which it will be deployed.

5 Run-time Architecture

This section describes the DRASTIC and GRUMPS run-time support architectures and how they are used to perform

an evolution within the context of the evolution models described above. For detailed examples of how the DRASTIC

architecture is used, see [9, 11], and for GRUMPS, see [8, 12].

5.1 DRASTIC Run-Time Architecture

The DRASTIC run-time architecture implements the zones, contracts and change absorbers introduced in section 4.1.

These concepts exist at design-time and they become part of the run-time evolution infrastructure available to the

DRASTIC programmer (supporting principle 9). Figure 3 shows a snap-shot of a post-evolution executing application

that has been divided across three zones (Purchasing, Personnel and Accounts).

The Registry, EvolverMgr and the PASManager are processes that are used by the software engineer when

they want to evolve the application. The Registry is the lowest-level name server used in the DRASTIC system

and it is used by all the other non-application-level processes to contact each other. The description for all of the

zones is contained in the persistent application system manager (PASManager). This process contains information

on the application’s contracts as well as references into technology within each zone called the zone specific process

manager (ZspmDaemon) which manages the processes within a single zone (footnote 6). The EvolverMgr process is used by

the software engineer when they want to evolve a zone. This process holds the current set of contracts that are being

edited by the software engineer responsible for the zone being changed.

Footnote 6: Only one ZspmDaemon in zone Accounts is shown on the diagram to keep it clear. In a real system, each zone would contain such a daemon.

Figure 3: The System-Wide DRASTIC Run-Time Architecture

[Diagram: the Personnel, Purchasing and Accounts zones with ZBP processes at the zone boundaries; User Process 1 holds n:N and User Process 2 holds m:M; change absorbers labelled tn, xt and xm lie on the inter-zone reference; the Registry, EvolverMgr, PASManager and ZSPMDaemon processes are also shown.]

When an evolution is to be performed, the software engineer responsible for the zone to be changed first of all

has to ascertain what effect the change will have on the rest of the zone. This requires them to calculate the effect on

the current set of contracts that are held with other zones. The DRASTIC system assumes that the software engineer

has performed this, that they have written the new contracts and that they have provided the necessary set of change

absorbers. Once this has been performed, the software engineer for the zone being evolved uses the PASManager to

inform the ZspmDaemon in the zone being suspended that it should start an evolution.

5.1.1 Performing an Evolution

Assume that the evolution is to user process 2 in zone Accounts. The PASManager informs the ZspmDaemon

in zone Accounts that an evolution will take place. This information is forwarded to the two processes that are

responsible for enforcing the currently defined pair-wise contract, the ZoneBoundaryProcess (ZBP). These processes

temporarily suspend new incoming invocations (e.g., into zone Accounts) and they wait for currently ongoing

executions to finish. Once this has happened the zone boundary is considered to be frozen and the ZspmDaemon informs

all processes that they should terminate. This causes each process to execute programmer supplied code that will

terminate the process promptly in a consistent fashion. To evolve the contents of the zone, any persistent stores that have

been used by the processes are processed to translate instances from the old type to become instances of a new type.

While the processes are being evolved, the software engineer, via the PASManager, can cause the new collection of

change absorbers to be installed within the appropriate ZBP processes. Which change absorber is installed on which

inter-zone reference chain is driven by information provided by the software engineer in the contract. Once this has

been done and the processes have been updated, this concludes the evolution of the zone. The updated processes, and

any new ones, are then restarted and the software engineer uses the PASManager to inform the ZBP processes (via

the ZspmDaemon) to allow the resumption of the inter-zone method invocations.
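The ordering of these steps can be summarised in a short Java sketch. The Boundary and Processes interfaces and their method names are hypothetical stand-ins for the ZBP and ZspmDaemon roles, and the sketch serialises the installation of change absorbers, which the text allows to happen while the processes are being evolved.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical ordering of a zone evolution; all names are invented for illustration. */
public final class ZoneEvolutionSketch {
    /** Stand-in for the zone boundary processes (ZBPs) of the zone being evolved. */
    public interface Boundary {
        void suspendIncoming();
        void awaitOngoingCalls();
        void installChangeAbsorbers();
        void resume();
    }

    /** Stand-in for the ZspmDaemon's control over the processes inside the zone. */
    public interface Processes {
        void terminateAll();
        void transformPersistentStores();
        void restartAll();
    }

    /** Performs the steps in the order described in the text. */
    public static void evolve(Boundary boundary, Processes processes) {
        boundary.suspendIncoming();
        boundary.awaitOngoingCalls();          // the zone boundary is now frozen
        processes.terminateAll();              // programmer-supplied shutdown code runs here
        processes.transformPersistentStores(); // old-type instances become new-type instances
        boundary.installChangeAbsorbers();     // chosen according to the new contracts
        processes.restartAll();
        boundary.resume();                     // inter-zone invocations may proceed again
    }

    /** Runs evolve() against recording stubs and returns the observed order of steps. */
    public static List<String> recordedSteps() {
        final List<String> log = new ArrayList<>();
        Boundary boundary = new Boundary() {
            public void suspendIncoming() { log.add("suspend"); }
            public void awaitOngoingCalls() { log.add("await"); }
            public void installChangeAbsorbers() { log.add("absorbers"); }
            public void resume() { log.add("resume"); }
        };
        Processes processes = new Processes() {
            public void terminateAll() { log.add("terminate"); }
            public void transformPersistentStores() { log.add("transform"); }
            public void restartAll() { log.add("restart"); }
        };
        evolve(boundary, processes);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(recordedSteps());
    }
}
```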

5.1.2 Using the Post-Evolution System

After an evolution has been performed an inter-zone reference exists between the two user processes in zones

Purchasing and Accounts. This reference travels through the pair of ZBP processes. The ZBP contains the contract

and the collection of change absorbers that are currently in force between the two zones. In zone Purchasing, in user

process 1, is a reference that leads to an object in zone Accounts. The reference in user process 1 is called n and this

process believes the reference leads to an object of type N. At the zone Purchasing ZBP the invocation is translated

from N to T. This is because at some point in the past the object being called on to was of type T. However, the call is

actually made on to a change absorber that makes available the T interface at the ZBP for the Accounts zone. Here the

inter-zone reference passes through two change absorbers which translate the method invocation from an invocation

on a T to one on X and then onto the actual type, type M. The last change absorber then calls on to the actual object

in user process 2. Any results from the method invocation are returned back along the same reference with opposite

translations taking place.
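The chain of translations described above (N to T, T to X, X to M, with the opposite translations applied to the result) can be modelled with the Java sketch below. Calls and results are represented as strings that record each translation step; this representation and the Absorber class are assumptions chosen to make the ordering visible, not a model of real method invocations.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model of a chain of change absorbers on one inter-zone reference. */
public final class AbsorberChainSketch {
    /** Translates calls from an exported type to a target type, and results back again. */
    public static final class Absorber {
        private final String exported;
        private final String target;

        public Absorber(String exported, String target) {
            this.exported = exported;
            this.target = target;
        }

        String translateCall(String call) {
            return call + "->" + target;
        }

        String translateResult(String result) {
            return result + "->" + exported;
        }
    }

    /** Sends a call forward through the chain, then returns the result back along it. */
    public static String invoke(List<Absorber> chain, Function<String, String> object, String call) {
        String translated = call;
        for (Absorber absorber : chain) {
            translated = absorber.translateCall(translated);
        }
        String result = object.apply(translated);
        // Results travel back along the same reference, so the chain is walked in reverse.
        for (int i = chain.size() - 1; i >= 0; i--) {
            result = chain.get(i).translateResult(result);
        }
        return result;
    }

    /** Builds the N -> T -> X -> M chain from the text and performs one invocation. */
    public static String demo() {
        List<Absorber> chain = Arrays.asList(
                new Absorber("N", "T"), new Absorber("T", "X"), new Absorber("X", "M"));
        return invoke(chain, received -> "result(" + received + "):M", "call:N");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```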

5.1.3 Discussion

The DRASTIC run-time architecture supports the evolution model described in section 4.1. This has a number of

advantages. As the same set of concepts is used to design the system as is used to manage it at run-time, there is less

burden placed on the software engineer. As the concepts of encapsulation and the use of the translation descriptions

allow the software engineers to plan and to manage the design and ongoing evolution of their system, it makes sense to

provide implementations of these concepts at run-time. In order to evolve a zone it is convenient to be able to suspend

invocations at the zone boundary. Having the concept of the zone exist at run-time allows this to be easily provided

for. It also gives the implementer of the evolution architecture a process to indirect all inter-zone references through,

thus making the management of this part of the implementation easier.

As evolution involves the change to the semantics of the distributed system, it is not possible to completely auto-

mate such an evolution. The DRASTIC approach to evolution acknowledges this and places the software engineer at

the centre of the evolution process. They are provided with the necessary tools and concepts to be able to evolve a

running, distributed system and the engineer is expected to manage the process of evolution. This is because evolving

a DRASTIC-based system is a relatively rare, but major, undertaking, requiring coordination between a group of peo-

ple responsible for the other zones that may be affected due to a system change. By using the tools and concepts, the

programmer engages in programming for evolution (supporting principle 8). This should make changing the system

easier as forethought has gone into it being changed and the tools can reduce the burden on the programmer when

actually updating the system.

5.2 GRUMPS Run-Time Architecture

The GRUMPS run-time architecture implements the concepts introduced in section 4.2. Following the design princi-

ples established within the DRASTIC experiment, the evolution concepts within GRUMPS exist at design-time and

they become part of the run-time system (principle 9). The evolution of a GRUMPS program is performed by writing

a program that creates control events and then sends them to deployed objects that have been found by running

QOP queries. Figure 4 shows two views of an executing GRUMPS-based distributed investigation.

[Figure 4 image omitted. Panels: (a) Application-Level View; (b) Teaq Process-Level View. Labels: User 1 to User 4, user data, cleaned user data, user data collection GU, data cleaning GU, GEs. Key: GU, EPO, GE, Grumps Event, thread, event queue, Teaq process, Teaq root process.]

Figure 4: An Example GRUMPS-based Distributed Investigation

Part (a) of figure 4 is the application or user-level view of the application. It consists of four workstations that have

been instrumented with GRUMPS technology [3] to collect mouse click, window focus and keyboard events. This raw

data is sent to a user data collection GU which converts the data into a timestamped GRUMPS event which is passed

to a data cleaning GU. This GU ensures the data is in a suitable form for writing to the cleaned-data database, from

where it will be analysed at a later stage. Cleaning data at run-time may involve the addition of a sequence number to

an event. This number can be used to more efficiently find the next event within a relational database table, rather than

having to write a query that finds such a row based on a calculation to retrieve the next highest event timestamp (see

[32] for more detail on this). Part (b) shows the equivalent application from the Teaq processes' point of view. The six

GUs (see footnote 7) are divided across four processes which have been formed into a single tree that is rooted at the process at the

top right-hand corner.
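
The sequence-number cleaning step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the CleanedEvent and Cleaner classes and their fields are hypothetical and are not taken from the GRUMPS code base.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SequenceCleanerSketch {
    /** Hypothetical cleaned event: the raw data plus a sequence number. */
    static final class CleanedEvent {
        final long sequence;
        final long timestampMillis;
        final String payload;
        CleanedEvent(long sequence, long timestampMillis, String payload) {
            this.sequence = sequence;
            this.timestampMillis = timestampMillis;
            this.payload = payload;
        }
    }

    /** Hypothetical cleaning step: numbers events in the order it sees them. */
    static final class Cleaner {
        private final AtomicLong next = new AtomicLong(1);
        CleanedEvent clean(long timestampMillis, String payload) {
            return new CleanedEvent(next.getAndIncrement(), timestampMillis, payload);
        }
    }

    public static void main(String[] args) {
        Cleaner cleaner = new Cleaner();
        CleanedEvent a = cleaner.clean(1000L, "click");
        CleanedEvent b = cleaner.clean(1000L, "focus");
        // The successor of an event is simply sequence + 1, even though
        // both events in this run carry the same timestamp.
        System.out.println(a.sequence + " -> " + b.sequence);   // prints "1 -> 2"
    }
}
```

With such a number stored alongside each row, finding the next event becomes an equality lookup on sequence + 1 rather than a search for the smallest timestamp greater than the current one.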

5.2.1 Performing an Evolution

Assume that the event processing object in the data cleaning GU for users one and two is to be replaced. To perform

an update to this GU’s EPO object, code would be written that initiated a query to find the containing GUContainer.

Such code could be part of a generalised object-finding tool which would be another process attached to the tree. The

query would be expressed in terms of finding the GUContainer that contained the GU object of interest, to distinguish

it from the other GU objects within the system. The query is propagated across the tree by the Teaq run-time system

and is executed once inside each process against the collection of registered objects. A match will be found in the

Teaq root process and a control event channel object will be sent back to the query initiator. The tool user could then

send a default EPOUpdate object to replace the EPO in the GU with a new one, or they could use their own kind of

control event to perform more application-specific EPO processing.
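
The steps above can be sketched in Java. The names GUContainer, GU, EPO and EPOUpdate come from the text, but every field and method below is an assumption made for illustration; the real Teaq and QOP interfaces are not reproduced here. In particular, the propagation of the query across the tree is reduced to a scan of one in-memory list, and the control event is delivered directly to the container instead of through a control event channel object.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class EpoUpdateSketch {
    /** Event processing object: the replaceable application logic of a GU. */
    interface Epo { String process(String event); }

    /** A GU whose EPO can be swapped while events continue to arrive. */
    static final class Gu {
        final String name;
        private volatile Epo epo;   // volatile: a swap is visible to other threads
        Gu(String name, Epo epo) { this.name = name; this.epo = epo; }
        String handle(String event) { return epo.process(event); }
        void install(Epo replacement) { this.epo = replacement; }
    }

    /** A control event carries behaviour that is applied to its target. */
    interface ControlEvent { void applyTo(GuContainer target); }

    static final class GuContainer {
        final Gu gu;
        GuContainer(Gu gu) { this.gu = gu; }
        void deliver(ControlEvent event) { event.applyTo(this); }
    }

    /** Stand-in for the default EPOUpdate: replaces the EPO of the contained GU. */
    static final class EpoUpdate implements ControlEvent {
        private final Epo replacement;
        EpoUpdate(Epo replacement) { this.replacement = replacement; }
        @Override public void applyTo(GuContainer target) { target.gu.install(replacement); }
    }

    /** Stand-in for a query: here a scan of one process's registered objects. */
    static Optional<GuContainer> find(List<GuContainer> registered,
                                      Predicate<GuContainer> query) {
        return registered.stream().filter(query).findFirst();
    }

    public static void main(String[] args) {
        List<GuContainer> registered = Arrays.asList(
            new GuContainer(new Gu("clean-3-4", String::trim)),
            new GuContainer(new Gu("clean-1-2", String::trim)));
        GuContainer target = find(registered, c -> c.gu.name.equals("clean-1-2"))
            .orElseThrow(() -> new IllegalStateException("no matching GUContainer"));
        System.out.println(target.gu.handle(" click "));       // prints "click"
        target.deliver(new EpoUpdate(e -> "v2:" + e.trim()));   // perform the evolution
        System.out.println(target.gu.handle(" click "));       // prints "v2:click"
    }
}
```

A user-defined control event would be another implementation of the hypothetical ControlEvent interface, which is how more application-specific EPO processing could be expressed in this sketch.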

5.2.2 Using the Post-Evolution System

In the example, the post-evolution system is structurally the same as the pre-evolution system, but the system’s func-

tionality has been changed as a new EPO has been installed which processes incoming events in a new way. The

structure of the system could be changed during an evolution as a control event can cause the topology of the system

to be altered.

In contrast to the DRASTIC approach to evolution, the GRUMPS approach has no requirement to stop any portion

of the ongoing investigation. Therefore, there is no real notion of using the system after an evolution has been

performed as there are no distinguished phases of pure application computation, separated by discrete periods of

system evolution.

Footnote 7: The diagram in figure 4 omits the GUContainers for clarity.

In DRASTIC it is possible to detect at run-time that an inter-zone invocation has failed. However, in GRUMPS,

it is not possible to programmatically detect that a change in the system has caused an event to be processed in an

incorrect way. The two evolution programming models differ in this respect because the DRASTIC approach to

evolution is applied to Java types, whereas in GRUMPS the evolution of the system will affect a single canonical

type, the event. Therefore, in GRUMPS, there is not enough information to be able to detect that an event has been

processed in an erroneous way. The error will only be noticed downstream when an event object is received that

contains state that was unexpected. In GRUMPS, it is assumed that this can be programmed at the application level as

the detection of the error is a semantic issue for the application implementer.
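
As an illustration of such an application-level check, the sketch below shows a downstream consumer that records events whose state is unexpected. The Event and CheckingConsumer classes, and the two rules they enforce, are assumptions made for this example; what counts as unexpected state is for each application to define.

```java
import java.util.ArrayList;
import java.util.List;

final class DownstreamCheckSketch {
    /** Hypothetical event carrying a sequence number and a kind. */
    static final class Event {
        final long sequence;
        final String kind;
        Event(long sequence, String kind) { this.sequence = sequence; this.kind = kind; }
    }

    /** Application-level consumer that knows what a well-formed event looks like. */
    static final class CheckingConsumer {
        private long lastSequence = 0;
        final List<String> problems = new ArrayList<>();

        void accept(Event e) {
            if (e.kind == null || e.kind.isEmpty()) {
                problems.add("event " + e.sequence + ": missing kind");
            }
            if (e.sequence <= lastSequence) {
                problems.add("event " + e.sequence + ": out of order");
            }
            lastSequence = Math.max(lastSequence, e.sequence);
        }
    }

    public static void main(String[] args) {
        CheckingConsumer consumer = new CheckingConsumer();
        consumer.accept(new Event(1, "click"));
        consumer.accept(new Event(2, ""));       // an upstream EPO dropped the kind
        consumer.accept(new Event(2, "focus"));  // an upstream EPO reused a number
        System.out.println(consumer.problems);
    }
}
```

The run-time system plays no part in this check; the consumer only notices the problem after the faulty event has arrived, which matches the downstream detection described above.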

5.2.3 Discussion

The level of encapsulation provided in GRUMPS exists at the GUContainer and GU levels. The design approach for

GRUMPS was to provide a minimal mechanism for managing change within the run-time system, in order to promote

the rapid change of the system. This helps to ensure the system is responsive, simple to use and thus easy to explain

to anyone wanting to use it. Richer kinds of evolution support could be built on top of the GRUMPS system, such as

persistent message queues for the inter-GU data.

The design of the DRASTIC run-time architecture takes the opposite view. How a software engineer evolves a

DRASTIC-based system is heavily prescribed by the DRASTIC evolution model and the approach it takes to evolving

a zone. This is an appropriate approach due to the complexity of changing such a system. The DRASTIC evolution

model, architecture and run-time implementation need to be able to ensure a coherent approach to run-time evolution.

If the software engineer was free to perform certain operations in their own way (for example, some may choose to not

suspend invocations at a zone boundary), other parts of the system may fail as they rely, not only on these operations

being performed, but on them being performed in a certain order. Giving the software engineer too much freedom in a

complex evolutionary system such as DRASTIC could be counter-productive. Thus, the more heavy-weight approach

helps to ensure that the process of applying an evolution to a running system is a well understood operation that is

appreciated by the group performing the change.

The advantage of the DRASTIC approach is that there are standard solutions to particular common problems. In

GRUMPS, it is possible for the same EPO object to be simultaneously replaced by two concurrently executing control

events. In the current implementation, the control event to be executed last will succeed. The GRUMPS run-time

system provides the mechanism to support this. However, it does not provide any coordination technology to ensure

a particular policy is maintained within the system. The DRASTIC system ensures policy is enforced at the zone

boundary, within the context of the contract and change absorbers. The GRUMPS approach to this is to embody policy

in tools and to imbue the run-time architecture with the minimum amount of policy (on top of a flexible mechanism)

to ensure any changes can be done correctly. This promotes an increase in flexibility as the system being changed, the

system performing the change, and the policy that controls the correctness of that change are all separate, allowing for

a change within one to minimally affect either of the other two.
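
The separation of mechanism from policy can be illustrated with a small Java sketch. It is not taken from the GRUMPS implementation: EpoSlot stands in for the mechanism (an unconditional replace, so the last writer wins) and GuardedUpdater is a hypothetical tool that layers one possible policy on top, namely rejecting an update that was prepared against a stale view of the system. EPOs are represented by strings for brevity.

```java
final class PolicyInToolSketch {
    /** Mechanism only: an unconditional replace, so the last writer wins. */
    static final class EpoSlot {
        private volatile String epo;
        EpoSlot(String initial) { this.epo = initial; }
        String current() { return epo; }
        void replace(String replacement) { this.epo = replacement; }
    }

    /** Hypothetical tool-level policy: refuse an update based on a stale view. */
    static final class GuardedUpdater {
        private final EpoSlot slot;
        private int version;   // incremented on every accepted update
        GuardedUpdater(EpoSlot slot) { this.slot = slot; }

        synchronized int version() { return version; }

        synchronized boolean replace(int seenVersion, String replacement) {
            if (seenVersion != version) {
                return false;              // another update was accepted first
            }
            slot.replace(replacement);     // the policy delegates to the mechanism
            version++;
            return true;
        }
    }

    public static void main(String[] args) {
        EpoSlot slot = new EpoSlot("epo-v1");
        slot.replace("epo-A");
        slot.replace("epo-B");
        System.out.println(slot.current());                   // prints "epo-B"

        GuardedUpdater tool = new GuardedUpdater(slot);
        int seenByA = tool.version();
        int seenByB = tool.version();
        System.out.println(tool.replace(seenByA, "epo-A2"));  // prints "true"
        System.out.println(tool.replace(seenByB, "epo-B2"));  // prints "false"
        System.out.println(slot.current());                   // prints "epo-A2"
    }
}
```

The policy only holds if every updater goes through the tool, since the mechanism underneath still accepts any replacement. A different policy could be substituted without touching EpoSlot.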

The advantages stated in section 5.1.3 for having the model and design-time concepts present at system run-time

and during an evolution also apply in the GRUMPS system. The main evolution management concepts are the model of

containment and object replacement through the use of control events. The concepts are used at design-time and these

same ideas are reused when the system is evolved. The concepts are also familiar to object-oriented programmers,

reducing the complexity of what they have to learn (adhering to principle 5).

The evolution of both the DRASTIC and GRUMPS systems is a complex endeavour. The DRASTIC and GRUMPS

evolution models and implementations take away some of the complexity of designing an evolvable system and ac-

tually performing the change at run-time. Both approaches focus the software engineer’s attention on the need to be

able to change the system at run-time. In this way, the software engineer is encouraged to consider run-time evolution

to be an integral part of an application’s design (principle 3).

5.3 Comparison of Approach

In a DRASTIC system the kinds of change that could be carried out are the replacement of code for objects that have

had references to them passed outside the zone they exist in. An appropriate change absorber would be placed on

the necessary inter-zone reference chains to handle this update. In terms of the architecture of a DRASTIC-based

system, such an object would provide access to the functionality within a zone. In terms of the Accounts zone in

figure 3, object m would receive information that the code in the Purchasing zone had bought a good. To ensure the

autonomy of the two zones, the contract between them would allow the change to the type of m to be made without

having to alter the code that wishes to call into the Accounts zone. In this sense, the zone contract and the change

absorbers allow the design idiom of design by contract to be practised as the exported change absorber ensures that

the pre-existing design contract has been adhered to.

In comparison, a change in GRUMPS does not affect the type of an object after evolution. GUs communicate in

terms of generic events and the interface for GU reception does not change after an evolution. In GRUMPS it is how

the events are processed that can be changed at run-time (which may make use of new code) and this change is effected

without the need for the heavy-weight suspension of the zone and without the desire to promote domain autonomy. The

GRUMPS approach makes no distinction between the phases of application computation and application evolution, as

DRASTIC does during the suspension of zone activity, and therefore there is no need for a concept like the DRASTIC

zone. The DRASTIC approach is most effective for systems where evolution is a complex endeavour which, to get

right, requires many aspects of the system to be changed during a single evolution phase. Such a complex evolution

requires an approach where one part of the system may be isolated from other parts while the complex evolution

process is applied to it. Once it has been performed, the other parts of the system are then allowed to use it. This kind

of evolution would be the replacement of code within a bank or a hospital system, where the correct replacement of a

part of the system must be performed before the rest of the system may use it.

By contrast, GRUMPS is designed to support the application of many, much smaller kinds of evolution to how data

is processed as it moves across a distributed system. GRUMPS is suited to the rapid change to an already executing

system that needs to process data in a way that may not have been identified when the application was started. In

addition to this constraint, another is that one part of a GRUMPS system may not be able to wait while another part

of it is updated, thus the suspension approach of DRASTIC is not appropriate. These two constraints require an

architecture that can be updated simply and quickly and in a way that may be performed while the application is

executing, without the need to suspend or take a part of it down.

6 Architecture Implementation

This section describes the key aspects of the implementation of the two architectures that have made run-time evo-

lution easier to support. Section 6.1 starts with a discussion of generally applicable implementation techniques and

then follows in sections 6.2 and 6.3 with details specific to DRASTIC and GRUMPS respectively. For in-depth

implementation details see [8, 9, 11].

6.1 Generally Applicable Implementation Techniques

The generally applicable implementation techniques can be divided into five points: the ability to apply and remove the

evolution system to and from the running system (section 6.1.1); the need for symmetric operations (section 6.1.2);

how to support the requirement of run-time decisions (section 6.1.3); and why tool support (section 6.1.4) and the

choice of implementation language (section 6.1.5) are so important.

6.1.1 Applying and Removing the Evolution System

The approach to evolution described in this paper calls for a separation between the application system and the evolu-

tion system (principle 6). Implementing these as two separate, but interoperable, systems has a number of advantages.

Separating them allows the development of the two systems to proceed more independently than they could if they

were combined. Only those parts of the application system that need to interoperate with the evolution system will be

inter-dependent with any changes made in the evolution system. The software for the rest of the application system is

independent of the rest of the evolution system and so changes made at the application side will not affect the imple-

mentation of the evolutionary aspects. As the design of the whole system is divided into two parts that separate what

is being evolved from how it is being evolved, reflecting this in the implementation helps to reinforce this division.

If the evolution system is only ever attached when an evolution must be performed, the run-time overhead that

the evolution system presents is kept to a minimum. The application system only needs to be able to support the

attachment of the evolution system. This part of the system implementation can be optimised as it is that part of

the run-time infrastructure that must be present in the application system to support its own evolution. Any parts of

the evolution system implementation that only need to be executed during an evolution can be kept outside of the

application implementation, thus ensuring the application is not slowed down by executing code it does not need to

for its core task, i.e., providing the functionality of the application to its users.

The application system discussed above can be further divided into two parts: the code dedicated to performing

the functionality of the application; and the code to integrate the rest of the application with the run-time evolution

architecture. Ideally, all code that interfaces the application with the underlying evolution system should be tightly

encapsulated and present in as few places as possible. This is made possible if there is a clear distinction between the

various parts of the application system.

In DRASTIC, code that embodies run-time evolution support (such as the change absorbers) becomes part of the

post-evolution system. Change absorbers are installed on inter-zone references which remain after an evolution has

been performed. Application-level inter-zone references are redirected via the DRASTIC platform in each process.

Therefore, in DRASTIC, the places at which the application code and run-time evolution support architecture interface

are tightly encapsulated, but they are present in the architecture at several points. This is because in the DRASTIC

architecture there are several distinct points along the inter-zone reference chain where evolution support may be

added. The advantage of keeping the number of such points to a minimum was something that was learned later on in the

GRUMPS project.

The GRUMPS-based application support system consists of the Teaq and QOP subsystems and the GUContainers

and GU objects. The functionality of the application is embodied in the event processing objects and in any code that

they call. Thus, in GRUMPS, the distinction between the pure application code and that part of the application that

is interfaced with the evolution support system is made clear to the programmer at the level of the event processing

object. It is in this object that the boundaries of the two systems inter-mingle. The interface between the two systems

is restricted to a single concept in the system and as such it is clearer for the programmer to understand the inter-

dependencies that exist between these parts of the application. Knowing this allows them to make better decisions

about where parts of their application code should reside. If they are placed into the event processing object the

programmer needs to be aware that this code is subject to GRUMPS-supported replacement. If the code is not held

here, it therefore does not interface with the evolutionary support provided by the GRUMPS system and so cannot be

replaced.

In GRUMPS a programmer may further encapsulate the evolution system by placing it into a separate process.

This process can attach itself to the spanning-tree and can then monitor the ongoing activity in the system by inspecting

the GUContainer and GU objects. After the evolution has been performed, it can be detached and from this point on

the evolution of the system will present no run-time overhead.

The two different approaches described above highlight another difference between the DRASTIC and GRUMPS

approaches to run-time evolution. In DRASTIC objects are added to the executing system that remain after the

evolution has been performed and these objects bring additional semantic meaning to the application. By contrast,

evolution in GRUMPS is performed by sending an object into the system which will have some semantic-altering

effect on it (principle 4), but after the effect has been applied, this object is removed from the system. The advantage

for GRUMPS is that the object used to perform the update does not present a run-time overhead in the way that the

change absorbers in DRASTIC do. However, the advantage of the change absorbers is that the software engineer has

an explicit list of objects that inter-zone interactions must travel through. A by-product of this explicit list is that the

software engineer has a history of the evolutions that were performed and the order in which they were applied.

6.1.2 Symmetric Operations

When implementing an evolution support system it is important to provide a mechanism that can support symmetric

operations. A symmetric operation is one that may be applied as well as removed. If only one operation is provided,

this is referred to as asymmetric. It is important to provide symmetric operations as an evolution operation that has

been applied to a system may need to be removed at a later stage.

When evolving a running system, a side effect may occur that the removal operation may not be able to undo.

This is an issue for the software engineer to deal with. It is typically not something that the implementation of the

underlying evolution support system can deal with. The point here is that if symmetric operations are not provided,

certain operations on the whole system cannot be (easily) performed. The GRUMPS system provides symmetric

operations by providing support for objects to be added and also removed from a GUContainer. DRASTIC supports

this by allowing change absorbers to be added and removed from inter-zone reference chains.
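
A symmetric operation can be sketched in Java as an object that offers both directions. The EvolutionOp interface, the AddAbsorber operation and the Evolver class below are hypothetical names introduced for this illustration; a reference chain is modelled as a list of absorber names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class SymmetricOpsSketch {
    /** An evolution operation that can be applied and later removed. */
    interface EvolutionOp {
        void apply(List<String> chain);
        void remove(List<String> chain);
    }

    /** Adds a named absorber to a reference chain; removal takes it off again. */
    static final class AddAbsorber implements EvolutionOp {
        private final String name;
        AddAbsorber(String name) { this.name = name; }
        @Override public void apply(List<String> chain) { chain.add(name); }
        @Override public void remove(List<String> chain) { chain.remove(name); }
    }

    /** Remembers what was applied so that the most recent change can be undone. */
    static final class Evolver {
        private final List<String> chain = new ArrayList<>();
        private final Deque<EvolutionOp> applied = new ArrayDeque<>();

        void apply(EvolutionOp op) {
            op.apply(chain);
            applied.push(op);
        }

        boolean undoLast() {
            EvolutionOp op = applied.poll();   // most recently applied, or null
            if (op == null) {
                return false;
            }
            op.remove(chain);
            return true;
        }

        List<String> chain() { return new ArrayList<>(chain); }
    }

    public static void main(String[] args) {
        Evolver evolver = new Evolver();
        evolver.apply(new AddAbsorber("T->X"));
        evolver.apply(new AddAbsorber("X->M"));
        System.out.println(evolver.chain());   // prints "[T->X, X->M]"
        evolver.undoLast();
        System.out.println(evolver.chain());   // prints "[T->X]"
    }
}
```

As noted above, the removal only reverses the structural change. Any side effect that the applied operation had on application state remains the responsibility of the software engineer.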

When implementing a software system, opportunities arise for its optimisation. For example, Java defines an

optimised set of bytecodes. Typically, when implementing an optimisation for a system, the need to be able to

subsequently perform an evolution within the context of the optimised system is not always obvious. This can be

because the person who is implementing the optimised operation is not focussed on the need for evolution and so

there may be no way to reverse the effects of the optimisation. This can lead to an asymmetric design which can make

evolution very difficult to perform. If it is not possible to undo an optimisation, the implementer of the evolution

system has to provide two versions, one for the unoptimised system and one for the optimised version. The two

versions may also be quite different at the implementation level due to the nature of the optimisation, making the

implementation of the evolution support system more complicated. For these reasons, it is preferable to be able to

undo the effects of the optimisation so that the system may be evolved in its unoptimised state. This then only requires

one version of the evolution system.

6.1.3 Supporting Run-time Decision Making

To evolve an executing system those responsible for updating it must be able to make as many system-changing

decisions as are appropriate at run-time. If important aspects of the executing system are fixed at some other stage and

these cannot be changed at run-time, the only way to change them is to stop that part of the system. Their redesign and

reimplementation may occur in parallel with the currently executing system, but to deploy the new code, some part of

the system will have to be shut down. However, it may not be feasible to perform any of these operations. For example,

the executing system may not be able to survive having parts of it removed. If this is required, such a fundamental

requirement should be considered at the design stage of the evolution support system, as it was with the DRASTIC

project. It may be convenient to be able to make certain decisions before the system is deployed and executed. If those

decisions cannot be changed once the system is running, the limited situation above arises. If these decisions may be

changed at run-time, those using the system are given more choice in how they evolve and change the system during

its lifetime.

However, in the DRASTIC system it was convenient to separate and make distinct the phases of application

computation from the discrete periods of application evolution. This design was chosen as a simplifying assumption

to help manage the level of complexity when evolving a single zone. It would be possible to design a system where

more decisions could be made at run-time; however, the engineering of that system would be more complicated. For

example, rather than suspending all calls into a zone, a zone could be evolved on a per-process basis. This would result

in some processes executing post-evolution code while others would be simultaneously running pre-evolution code.

The software engineer would then have to handle the potentially complex interaction between executing processes

on both sides of the evolution wave. The disadvantages of handling this interaction would outweigh the advantages

that could be derived from performing evolution concurrently with application execution in that zone. This is because

the application programmer would have to manage the interaction of pre- and post-evolution processes as this is

a semantic operation that only they can understand and, therefore, manage (which is an example of the need for

principle 4). The DRASTIC evolution run-time system provides the mechanism to support this interaction, but it

cannot help in establishing a meaningful and safe inter-process interaction. Therefore, although it is preferable to

provide support to make as many decisions at run-time as is appropriate, there may be sound engineering reasons why

support for certain kinds of run-time decision is not provided.

In contrast, the design of the GRUMPS run-time evolution system encourages the software engineer to make as

many evolution decisions as are appropriate at run-time. This was done to maximise the likelihood that an entire

GRUMPS-based application system could stay running while it was also being updated. A control event can be sent

to a deployed object to change the way that it deals with its incoming events. This does not require the process

that the object exists in to be brought down. However, providing this kind of approach to run-time evolution has

implications for other parts of the system. The evolution of a GRUMPS system is less controlled and less managed

than that of a DRASTIC system. Therefore, those evolving a GRUMPS system have to be careful that a change does not have

knock-on effects throughout the whole system, which is something that the DRASTIC system explicitly managed.

The current implementation of GRUMPS provides the basic set of mechanisms to allow an application system to be

updated at run-time. If additional kinds of evolution support are necessary, these can be provided above the system,

using a standardised set of control events to express this.

Providing support to perform as many evolutionary changes as are appropriate while the system is executing does

lead to a flexible run-time environment. However, this level of power is not always appropriate in every situation.

For this reason, those implementing run-time evolution systems should take care to identify those cases where pro-

viding such support could be a disadvantage due to the implications it has for other parts of the evolution system’s

implementation.

6.1.4 Tool Support

Typically, before a system is changed, the software engineer will want to perform a preliminary inspection of the

running system to check its current state. This will allow them to judge whether the approach they want to take to

changing the system is appropriate, e.g., to ensure no invariants are broken (see section 3.3). To be able to effectively

do this within an executing, distributed system requires tool support. The design and implementation of such tools

was not the central focus of either of the projects reported here. However, they would have proved useful towards

the latter parts of the systems’ design and implementation to ensure a test evolution proceeded in a well-known and

predictable manner.

Such tools should support the person evolving the system in a number of ways. They should allow them to

perform coordinated changes to multiple parts of the distributed system. For example, within GRUMPS, being able

to evolve two GU objects that reside within different processes is not something that the evolution support system

directly supports. It would be appropriate to place such support into a tool that made use of the existing evolution

mechanisms. Tools to visualise the current state of the distributed system would also be valuable. Before performing

an evolution, those changing the system need to understand its current state. Once the current system is understood,

support for performing “what-if” analysis on the system would be extremely useful. A software engineer could pose

questions to the tool that embodied hypotheses the engineer had about particular ways of changing the system. The

tool would then respond with results that indicated the kinds of effect a particular approach to evolution would have

on the executing system. This information would be used by the software engineer to modify their approach to an

evolution.

6.1.5 Implementation Language

The choice of implementation language will have an impact on the kinds of evolution that may be performed. Both

the DRASTIC and GRUMPS systems are programmed in Java and both evolution support systems are implemented in

the same language as that used to implement the application system. Using the same language for both the evolution

and application system makes evolving the application system easier as language-level entities may be freely passed

between the two systems. Java can include code written in other languages (typically C); however, providing access to

the Java-based evolution framework from another language would create practical engineering problems. For example,

when freezing a DRASTIC zone, any processes that made use of any non-Java code would have to ensure this state did

not interfere with the evolution process. This code, which exists outside the evolution framework, may rely on types

within the framework that are being evolved. To effectively address this requires technology to track and manage the

evolving state within and outside the evolution framework. This is a complex problem in its own right and would

require extensive additional research to address.

The programming language defines the most primitive kinds of evolution that may be performed. For example,

Java makes the replacement of code behind a Java interface easy so this approach was used for the GRUMPS project

where such a replacement was appropriate. However, the DRASTIC project wanted to support evolution operations

that were not compatible with the Java type system (for example, the ability to remove a method from an evolved

type). Therefore, as Java was being used, a different approach (based on change absorbers) was used. If the Java

language definition supported method removal (as other object-oriented languages such as Self [5] and Beta [22] do),

the design and implementation of the change absorber mechanism would have been different as there would have been

more direct support from the underlying programming language.

6.2 DRASTIC Architectural Implementation

In terms of supporting the evolution of a run-time, long-lived, persistent, distributed system, the DRASTIC project

highlights two implementation techniques that have made the provision of such evolution easier. The use of indirection

is discussed in section 6.2.1 and section 6.2.2 describes the advantages to be gained by a separation of functional roles

within the run-time system.

6.2.1 Using Indirection

The DRASTIC run-time architecture shown in figure 3 shows the rerouting of an inter-zone reference via the two

ZBP processes responsible for enforcing the contract between the two zones. This indirection is advantageous for

two reasons. Firstly, as the inter-zone reference chain is redirected via the DRASTIC platform, the platform may

manipulate the inter-zone reference chain to provide the evolutionary facilities described in section 4.1. Secondly,

adding objects to the reference chain allows those responsible for evolution to change the semantics of an inter-zone

method invocation, without the invoker or invokee being aware of this change.

The DRASTIC evolution platform was written in such a way that the objects that are interposed onto the inter-zone

reference chain can be added and removed such that both the invoker and the invokee are unaware of their presence.

A reference to an object may be handed out to a process in another zone, but the software engineer who is responsible

for the zone where the object resides can still retain some control over how invocations to that object are handled.

This is useful in a system where the evolution of objects can occur at any time within a zone. As objects may be

placed onto the reference chain to affect its operation after the reference has been handed out, the process that holds

the invocation end of the chain does not have to be updated when an evolution within the zone takes place. This is

a classical application of indirection. Being able to affect the semantics of an inter-zone invocation is particularly

useful in a distributed system where it can be difficult to control how a reference is used once it has been handed out

to another process. The receiving process may reside within another zone which is at some arbitrary point within the

distributed system that the handing-out process has no knowledge of and no control over.
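
The effect of interposition can be sketched, under simplifying assumptions, within a single Java process. In DRASTIC the reference chain passes through the ZBP processes; in this sketch a hypothetical ChainHead object stands in for the platform and Stock is an invented application interface. The invoker holds only the Stock reference and is not updated when links are added or removed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Invented application interface used by the invoker.
interface Stock {
    int level(String item);
}

// Stands in for the platform end of an inter-zone reference chain.
// The reference handed out to the invoker is the ChainHead itself.
final class ChainHead implements Stock {
    private final Stock invokee;
    private final List<UnaryOperator<Stock>> interposed = new ArrayList<>();
    private volatile Stock effective;

    ChainHead(Stock invokee) {
        this.invokee = invokee;
        this.effective = invokee;
    }

    // Places a new object onto the chain, in front of the existing ones.
    synchronized void add(UnaryOperator<Stock> link) {
        interposed.add(link);
        rebuild();
    }

    // Takes a previously added object off the chain again.
    synchronized void remove(UnaryOperator<Stock> link) {
        interposed.remove(link);
        rebuild();
    }

    // Wraps the invokee with each interposed link, oldest first.
    private void rebuild() {
        Stock chain = invokee;
        for (UnaryOperator<Stock> link : interposed) {
            chain = link.apply(chain);
        }
        effective = chain;
    }

    @Override
    public int level(String item) {
        return effective.level(item);
    }
}
```

A link here is simply a function from the rest of the chain to a wrapped Stock, so a change absorber that renames an argument, or any other object that alters the semantics of the call, can be expressed as one link.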

An overuse of indirection can add an unwanted amount of run-time overhead to an inter-zone method invocation.

This can be reduced by combining some logically separate operations into single objects. For example, the software

engineer could combine a number of distinct change absorbers into a single change absorber, reducing the overhead

of calling through the chain and, hopefully, optimising the translations that are required. Separating logical operations

is a useful approach to building evolvable systems; however, indirection on application-level references should not be

overused.
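
The combination of change absorbers can be illustrated with a deliberately simplified absorber that only renames methods. The RenameAbsorber class below is hypothetical and far less general than a DRASTIC change absorber; it is intended only to show how two logically separate translations can be folded into one object so that an invocation is translated in a single step.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified absorber: maps old method names to new ones.
final class RenameAbsorber {
    private final Map<String, String> renames;

    RenameAbsorber(Map<String, String> renames) {
        this.renames = new HashMap<>(renames);
    }

    // Names without an entry pass through unchanged.
    String translate(String methodName) {
        return renames.getOrDefault(methodName, methodName);
    }

    // Returns one absorber equivalent to applying this absorber and then
    // `later`, so each name is mapped directly to its final form.
    RenameAbsorber combine(RenameAbsorber later) {
        Map<String, String> merged = new HashMap<>(later.renames);
        for (Map.Entry<String, String> entry : renames.entrySet()) {
            merged.put(entry.getKey(), later.translate(entry.getValue()));
        }
        return new RenameAbsorber(merged);
    }
}
```

The combined absorber performs one lookup per invocation rather than one per link in the chain, at the cost of losing the separation between the two original translations.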

6.2.2 Separation of Functional Roles

The DRASTIC run-time architecture separates the different functional roles within a distributed system, abstracting

them into separate processes. For example, the role of evolving a zone is given to the EvolverMgr and the job of

handling the persistent meta-data for the whole system is delegated to the PASManager (section 5.1). This separation

leads to a clean implementation of the evolution mechanism as different functional facilities are encapsulated into

separate processes. Separating the different functional roles in this way can lead to some additional fault tolerance

as system functionality is not contained within a single process. However, it can be tedious (and thus possibly

error-prone) to boot such a system as a number of processes have to be managed and started in a particular order if there are

any inter-dependencies between them.

6.3 GRUMPS Architectural Implementation

The main aspect of implementing the GRUMPS run-time evolution architecture was choosing the appropriate tradeoff

between supporting run-time decision making and allowing those who have programmed the system the opportunity

to enforce their own decisions.

This point is best illustrated by considering the implementation of the query-oriented programming mechanism.

The QOP mechanism allows a GRUMPS programmer to find objects within the distributed system that match a query.

The query is propagated to each process and the QOP run-time mechanism translates the query string into a single

Java class, storing it as source code. To perform the query, the source code is first compiled by the GRUMPS run-time

platform. The class implements an interface that provides a well-known method which is called by the QOP system to

execute the query. This has the advantage that the query is strongly typed and the loaded class is subject to the usual

Java checks, such as bytecode verification. Making the compiler available at run-time allows the programmer to write

queries based on information that is only available at run-time, allowing the programmer to defer some decisions until

application execution.
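
The compile-and-load step can be sketched with the standard javax.tools API. This is an illustration under stated assumptions, not the GRUMPS implementation: java.util.function.Predicate stands in for the GRUMPS interface with its well-known method, the class RuntimeQueryCompiler is hypothetical, and the query is assumed to be a Java boolean expression over a variable named o. A JDK must be present at run-time, and the query text is trusted.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Predicate;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper: turns a query expression into a compiled, loaded class.
final class RuntimeQueryCompiler {

    @SuppressWarnings("unchecked")
    static Predicate<Object> compile(String className, String condition) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("no Java compiler available; a JDK is required");
        }
        // The query becomes a single class, stored as source code.
        String source =
            "public class " + className
            + " implements java.util.function.Predicate<Object> {\n"
            + "    public boolean test(Object o) { return " + condition + "; }\n"
            + "}\n";
        try {
            Path dir = Files.createTempDirectory("qop");
            Path file = dir.resolve(className + ".java");
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));

            // Compile the source; type errors in the query are rejected here.
            int status = compiler.run(null, null, null, "-d", dir.toString(), file.toString());
            if (status != 0) {
                throw new IllegalArgumentException("query failed to compile: " + condition);
            }

            // Load the class; it is subject to the usual checks on loading.
            // The loader is left open for the lifetime of the query object.
            URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() });
            Class<?> queryClass = Class.forName(className, true, loader);
            return (Predicate<Object>) queryClass.getDeclaredConstructor().newInstance();
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException("could not build query class " + className, e);
        }
    }
}
```

A caller would then invoke the well-known method on each registered object, for example compile("LongString", "o instanceof String && ((String) o).length() > 3").test(candidate). A query that is not well typed fails at the compilation step rather than during matching.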

Objects that are registered for remote query may be instances of an interface type called Replaceable. The

Replaceable interface defines a single method (called Replace) that takes one object as an argument and returns an object

as the result. If a query result contains an instance of this type, the QOP system, via its own object-serialization

mechanism, passes a reference to the matched instance to this method. The implementer of the interface may then

return an object of a different type which will become part of the query result that is passed back to the query initiator.

This mechanism is used by the GRUMPS run-time system to translate matches to GUContainer objects into objects

that refer to their control channels (section 4.2).
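
The shape of this mechanism can be sketched as follows. Only the Replaceable interface and its single method come from the description above; the method is spelled replace here to follow Java naming conventions, and Container, ControlChannel and QueryResults are hypothetical stand-ins rather than GRUMPS classes. Serialization and distribution are omitted.

```java
import java.util.ArrayList;
import java.util.List;

// One method: takes the matched object and returns the object to report.
interface Replaceable {
    Object replace(Object matched);
}

// Hypothetical stand-in for a reference to a container's control channel.
final class ControlChannel {
    final String containerName;

    ControlChannel(String containerName) {
        this.containerName = containerName;
    }
}

// Hypothetical stand-in for a container registered for remote query.
final class Container implements Replaceable {
    final String name;

    Container(String name) {
        this.name = name;
    }

    @Override
    public Object replace(Object matched) {
        // The implementer decides what the query initiator receives.
        return new ControlChannel(((Container) matched).name);
    }
}

final class QueryResults {
    // Builds the result passed back to the query initiator.
    static List<Object> postProcess(List<Object> matches) {
        List<Object> result = new ArrayList<>();
        for (Object match : matches) {
            if (match instanceof Replaceable) {
                result.add(((Replaceable) match).replace(match));
            } else {
                result.add(match);
            }
        }
        return result;
    }
}
```

In this sketch the decision about what is returned lies entirely with the class that implements Replaceable; the query has no means of influencing it.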

In the current implementation of the QOP mechanism the implementer of the interface controls how its instances

are managed should any one of them be matched in a query. There is no facility in the current implementation to be

able to override how such an instance is processed. Once these objects are deployed, the user of the system may

want to process them in a different way that has only come to light after the code was written

and after the objects were created and sent into the system. For certain GRUMPS system-management operations it

would be convenient to be able to specify in the query how a matched object was processed. See section 7.3 for more

information.

7 Conclusions

There are three main conclusions from the DRASTIC and GRUMPS work in addition to the four core ideas and the

formulation of the ten principles of evolution: an evolution model and its implementation are crucial to successful

distributed system run-time evolution; application systems built using them must be programmed for evolution; and

allowing decisions to be performed at run-time is a powerful and flexible approach to building both the run-time

evolution support systems and the applications that make use of them.

7.1 The Evolution Model and its Architecture

The DRASTIC and GRUMPS work has shown that in order to support the run-time evolution of a large, long-lived,

distributed system an evolution model and its implementation should be made available to those responsible for designing

and evolving the distributed application. This is because the support for the evolution of a distributed application

should be handled by a distinct part of the system. Without an evolution model and its architecture, it would be harder

to cleanly design the application and separate those parts of the system that were and were not subject to run-time

evolution. This would confuse the design and would lead to different (possibly incompatible) implementation approaches

being taken to run-time change.

Both the DRASTIC and GRUMPS systems make available technology to support the design and implementation

of a distributed application that could be evolved during run-time. Two language-level libraries, documented APIs

and two different evolution-supporting component architectures were provided. In addition, tools and methodologies

were provided to support the software engineer in preparing for, performing and subsequently monitoring the

post-evolution system. Both of these technologies were provided in Java with the software engineer using this language to

solve their domain-specific problem, within the context of, and by receiving support from, the evolution technology.

An evolution model and its implementation bridge the semantic gap between the evolution facilities provided

at the programming language and component framework levels and any higher-level facilities that other approaches

have made available, e.g., architectural-evolution systems. When viewed in this way, this kind of run-time

evolution support can be seen as evolution middleware. The evolution platform provides a standard set of evolution

facilities for higher-level code by abstracting over the particular support provided at the lower level. A

middleware-oriented approach is useful even if there is no higher-level evolution functionality as the resulting standardisation helps

the programmer to design and build an application system that may be more easily understood and changed by others

in the future.

In the same way that there is no one universal programming language and no single globally-applicable middleware

solution, there is also no single evolution model. The DRASTIC and GRUMPS work shows that the evolution model

and its implementation can be tailored to integrate with the computing environment within which evolution needs to

be performed. The DRASTIC and GRUMPS evolution models both suit their environments and their implementations

can make appropriate tradeoffs based on this environment. For example, within DRASTIC it was appropriate to design

and implement the system to interpose evolution technology on inter-zone, inter-object references. However, such a

solution was not appropriate within the GRUMPS system, because of the differences in the kinds of application that

the two systems were aimed at. In GRUMPS it was more appropriate to apply evolution by replacing the functionality

of an object, rather than abstract over a change made in one part of the system.

7.1.1 Creating an Evolution Model

The evolution model and its implementation must support two different aspects of run-time evolution. The evolution

model and any tools should support the design of an evolvable system, and the programmable component must support

the software engineer in performing the change to the running system. As the software engineer wants to produce a

system that may be changed at run-time, the evolution model should support them in designing a system to be changed

in such a way. The advantage of the evolution model in this area is that it focuses the attention of the software engineer

on the need for run-time change at the very start of the development of their code, making change a central part of the

architecture of their system.

When implementing an evolution model it is important both that a rich set of evolution facilities is made available

to the application programmer, and that the facilities provided by the implementation language are exploited. Both

DRASTIC and GRUMPS make available a richer set of evolution primitives than those directly provided by the Java

language: DRASTIC supports subtractive change to an evolved Java type, and GRUMPS supports the programmer

in providing default, but extensible, support for simultaneously updating a number of objects within a single

GUContainer. GRUMPS-based run-time evolution support is made much easier by exploiting the underlying language’s

ability to easily replace one class with another, via object reference update. Exploiting the language makes it easier to

explain to the software engineer how evolution may be performed in the system as it is just a particular application of

the programming model they are already familiar with.

Once the design of the evolution model has been completed, the implementation of the framework is relatively

straightforward. The implementation of the two run-time evolution frameworks described here represents approximately

five man-years of work. Both implementations are small by modern standards: the core of the DRASTIC

implementation is just over 9,500 lines of code (LoC) and GRUMPS is just under 12,500 LoC (see footnote 8). These figures are not

to suggest that the construction of a run-time evolution framework is trivial or can be quickly implemented, but that it

is possible to provide such a framework. Implementing a framework for the run-time evolution of a distributed system

is similar in scope to the construction of other kinds of framework. For example, Sun Microsystems’ J2EE enterprise

programming framework is a large and complex piece of software. However, it benefits those that use it by providing

standard programming solutions, reducing the amount of work they have to do and increasing the likelihood that their

software can be successfully integrated with other J2EE-based software solutions.

7.2 Programming for Evolution

The second outcome of this work is that systems must be programmed in a way that is sympathetic to the possibility

that they may be changed at run-time. This is made easier if an evolution model is provided.

The maintenance of software systems typically consumes between 40 and 80 percent of project costs [13]. If a

system cannot be taken down in its entirety to perform a change, it is inevitable that some form of run-time evolution

will be required. In this context, the evolution of a running system is necessary and inevitable. By acknowledging that

evolution will be necessary, provision for it can be made at the start of the software lifecycle, addressing the needs of

evolution as an integral part of the design and implementation of the application system. The work reported in this

paper has shown that it is not always desirable to abstract over all forms of change and trying to hide all change from

other parts of the system can be counter-productive (see section 3.3).

Changing the implementation of an executing program changes its semantics. As a result, run-time evolution

cannot be fully automated; however, this work shows that it can be effectively supported. The software engineer can

increase the effectiveness of this support by programming for evolution and by acknowledging that not all evolutionary

change can be abstracted over. Such an approach to programming requires the software engineer to ask additional

[Footnote 8: These figures do not take into account any tools or other management software, e.g., DRASTIC’s PASManager.]

questions during the design and implementation of their application code. For example, in DRASTIC it is possible for

an inter-zone method invocation to fail after an evolution has been performed. The programmer of the invoking code

has to be aware that an invocation can fail for evolutionary reasons, as well as for other well-established reasons, such as

those due to distribution.
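
The distinction the invoking code has to draw can be sketched as follows. The exception types and the InterZoneCall helper are hypothetical and do not reflect the DRASTIC API; the sketch assumes that a failure caused by distribution may be transient, whereas a failure caused by evolution will recur until the caller does something different.

```java
import java.util.function.Supplier;

// Hypothetical failure raised for classical distribution reasons.
class DistributionFailure extends RuntimeException {
    DistributionFailure(String message) {
        super(message);
    }
}

// Hypothetical failure raised because the invoked type has been evolved.
class EvolutionFailure extends RuntimeException {
    EvolutionFailure(String message) {
        super(message);
    }
}

final class InterZoneCall {
    // Retries on distribution failures, up to `attempts` tries in total.
    // On an evolution failure it does not retry, because repeating the same
    // invocation cannot succeed; the caller-supplied handler runs instead.
    static <T> T call(Supplier<T> invocation, int attempts, Supplier<T> onEvolved) {
        for (int attempt = 1; ; attempt++) {
            try {
                return invocation.get();
            } catch (EvolutionFailure e) {
                return onEvolved.get();
            } catch (DistributionFailure e) {
                if (attempt >= attempts) {
                    throw e;
                }
            }
        }
    }
}
```

The handler passed as onEvolved is where application-specific knowledge is needed, for example to invoke a replacement operation or to report that the facility is no longer offered.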

To support run-time change to an executing application it is crucial that those performing the change can make as

many decisions as are appropriate at run-time. The evolution model and its implementation should be flexible enough

that any decisions that have been made at a time before application deployment and execution can be changed once the

application has been started. However, there is a tradeoff to be made here, as supporting run-time change to

every component within an executing system can leave its implementers with no control over aspects of it that should

not be changed.

Working in a distributed environment can aid as well as hinder the ability of a software engineer to change an

application system at run-time. Distribution can be a hindrance because of its classical features, e.g., partial failure. It

can aid in evolution, however, as processes and machines provide a natural boundary at which to perform evolutionary

change. This was exploited in the DRASTIC system in providing the majority of the evolution support in the ZBP

processes.

At the source code level, the use of the evolution model and its API permeates many different parts of the

implementation of a software system. Therefore, it is currently difficult to see how the code that implements the process of

performing an evolution and the application code that reacts to a change (e.g., the DRASTIC code to handle the failure

of a post-evolution method invocation) can be abstracted into aspects [20]. Evolution support is a fundamental part

of the implementation of an application system, with the evolution API providing a standardised approach to using

that support. Such support is as fundamental to a program as is the level of support for distribution and the handling

of concurrency. It is an open question as to the degree to which aspects could be usefully exploited in the design and

implementation of an evolution model (see section 8.1.3).

7.3 Run-time Decision Making

Section 6.3 described a particular instance of the more general problem of choosing an appropriate tradeoff between

supporting flexibility via run-time decision making and the need for programmers to be able to enforce the decisions

they have made in their code.

In the current implementation of the QOP query mechanism, a software engineer can fix how a matched instance

is handled by QOP. The programmer enforces this by providing the code to process the matched instance as part of

that instance’s Java class. However, another choice is possible, which is to allow the processing decision to be made

by the query that found the instance.

If the decisions could be performed by the query, any decision that the author of the class had made would be

ignored. The challenge is finding the appropriate balance between these two approaches. Allowing the implementer

of the interface to make the decision gives them control over how their instances are treated by the rest of the system.

However, given the current GRUMPS run-time infrastructure, once the software engineer has created the Java class

to do this, the choices that it embodies are fixed. The alternative of placing the decision-making code in the query is

more flexible as the decision can be performed at run-time. This can be useful when updating a system as code fixed

in the Java class may not be suitable for use in an evolving context. The implementer of the code cannot possibly

anticipate its every use. In addition, the implementer cannot write code to handle an evolutionary case that has yet to

be defined. Therefore, it would be useful to be able to specify this new code at run-time, as part of the query. However,

without some form of object access model, the author of the Java class has no control over how their pre-evolution

instances will be treated by the rest of the system.
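
One possible middle ground between the two approaches can be sketched as follows. This is speculative and is not part of the current GRUMPS implementation: the MatchPolicy interface and MatchProcessor class are hypothetical, and the single boolean permission is only a placeholder for a proper object access model.

```java
import java.util.function.UnaryOperator;

// Hypothetical interface: the class author fixes how a matched instance is
// processed and states whether a query is permitted to override that choice.
interface MatchPolicy {
    Object processMatch();

    default boolean queryMayOverride() {
        return false;
    }
}

final class MatchProcessor {
    // queryProcessor is null when the query does not specify any processing.
    static Object process(Object matched, UnaryOperator<Object> queryProcessor) {
        if (matched instanceof MatchPolicy) {
            MatchPolicy policy = (MatchPolicy) matched;
            if (queryProcessor != null && policy.queryMayOverride()) {
                // Decision made at run-time, by the query.
                return queryProcessor.apply(matched);
            }
            // Decision fixed in the class by its author.
            return policy.processMatch();
        }
        // Objects without a policy are open to whatever the query specifies.
        return queryProcessor != null ? queryProcessor.apply(matched) : matched;
    }
}
```

With such a scheme the author of a class retains control by default, while instances whose authors allow it can be processed by code that was written after they were deployed.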

8 Open Issues and Future Work

This section discusses the open issues for the support of run-time evolution in general (section 8.1) and then goes on

to discuss particular improvements to the GRUMPS work in section 8.2.

8.1 Open Issues

There are four main open issues facing those working in the field of run-time evolution: where in the computing

hierarchy should particular types of run-time evolution be supported (section 8.1.1); what is the exact set of run-time

evolution primitives (section 8.1.2); whether aspects may benefit the use of a run-time evolution framework

(section 8.1.3); and how object security is reconciled with a desire for flexible run-time change (section 8.1.4).

8.1.1 Placement of Run-time Evolution Support

The DRASTIC and GRUMPS work has benefited from providing evolution support above the level of component

frameworks as programmers can make use of the evolutionary facilities as if they were a part of a more comprehensive

component framework. However, within the field of run-time evolution, it is not clear where in the software

architecture hierarchy certain facilities could be most effectively placed. This is possibly because a comprehensive

survey of these issues has not been conducted and systems that are implemented (including DRASTIC and GRUMPS)

tend to remain within one layer of the hierarchy. As distributed application systems become more complex, the need

for run-time evolution support at particular points within the hierarchy will become more pressing because developers

of systems at higher levels will not want to reimplement facilities that could be more reliably and more powerfully

implemented below.

8.1.2 Defining a Useful Set of Run-time Evolution Primitives

The work on DRASTIC and GRUMPS addresses some of the issues of supporting run-time evolution within two specific

computing environments. Defining a generally useful set of run-time evolution primitives within a particular evolution

model is an open issue. As stated in section 7.1, one universal evolution model probably cannot be defined. In

addition, the primitives that are provided by the DRASTIC and GRUMPS systems are targeted towards those domains;

no attempt has been made to generalise the support to other application domains. It would be a useful exercise

to consider run-time evolution support within the context of a commercially available component framework. An

empirical evaluation of the approaches described in this paper could now be usefully performed, given that the initial

research into the evolution models and their architectures has been completed.

8.1.3 The Use of Aspects

It is an open issue whether the use of aspects would benefit the design of the evolutionary parts of a large, long-lived,

distributed application. This issue is open because the work on evolution models as defined in this paper and the work

on AOP have been kept separate. The work of Kienzle and Guerraoui [21] suggests that some uses of aspects may be

difficult when that aspect is heavily associated with the phenomenon that the underlying object is modelling, such as

concurrency or distribution. The objects that are to be evolved are intertwined with other objects (that are possibly not

being evolved) as well as other characteristics of the application, such as the execution of threads of control. Due to

this intertwining, it is not clear at this stage if aspects could be used to fruitfully abstract over the evolutionary parts

of an application.

8.1.4 Run-time Evolution and Object Security

Section 7.3 describes a particular instance of a wider issue, which is the desire to provide a flexible

evolution mechanism while also allowing those programming the system to specify what may not be changed,

e.g., for reasons of security. Those making an object available within a system may want to specify that access to it is

restricted in some way and this desire may impede someone else’s ability to change the system, e.g., the inability to

access part of an object’s state when moving data between different versions of an object.

To address this problem some form of object access protocol or underlying security model would be required

so that the flexibility possible with the ability to perform a run-time decision can be controlled by the implementer

should they wish. This is an area that the designers and implementers of evolution support systems should consider

as a fundamental part of the evolution model as it will have a large impact on the programming model (cf. the Java

security model and its impact on the Java language and its core classes).

8.2 Future Work

The DRASTIC and GRUMPS systems represent an initial attempt at defining, designing and implementing two

evolution models and their architectures for distributed systems.

Future work for the GRUMPS project (see footnote 9) can be divided into two main areas. One fruitful area of future work would

be to assess the impact of these evolution models and architectures on the production and subsequent evolution of a distributed system that others relied on every

day. The initial evolution design and implementation work has been performed and now would be an appropriate time

to take the proof of concept systems forward, reconsidering them within the context of a more constraining real-world,

commercial setting.

The DRASTIC and GRUMPS work has not addressed the area of checking for update safety because investigating

the basic evolution model support was the central focus of both of these projects. However, a security model would

make the application of an evolution model safer, especially within the context of a distributed system. Such work

has been considered by Hicks in his application of verifiably safe native code [16] and such an approach within the

context of DRASTIC or GRUMPS may be of value.

9 Summary

This paper has described the outcome of designing and implementing two different run-time evolution models. The

work on the DRASTIC and GRUMPS systems has demonstrated that it is possible to support the run-time evolution

of distributed systems by making available an evolution model and its architecture, and expressing it to a software

[Footnote 9: The DRASTIC project finished in 1998.]

engineer via a combination of library code, tools and methodology support. This work has also shown it is possible to

use an application system even though part of it may be being changed at run-time.

The lessons learned have been distilled into four generally applicable core ideas and the ten principles that may

be used when considering the design and implementation of an evolution model. It has been shown that an evolution

model can be easily designed and implemented, although it must be tailored to the environment in which evolution is

to be performed.

It has been argued that there is a lack of run-time evolution support within the software architecture hierarchy,

in between component models and any higher-level evolution support or application-level systems. The DRASTIC

and GRUMPS work is a first step towards providing the necessary run-time evolution model and implementation to

address the open issues within this gap. More work needs to be conducted in this area and the provision of an evolution

model that could be supported within the context of J2EE or .NET would be a fruitful area of research.

10 Acknowledgements

The author gratefully acknowledges the support provided by the GRUMPS team and the UK’s EPSRC research council

for providing the funding for both the DRASTIC (grant GR/J99285) and GRUMPS (grant GR/N38114) projects. The

author also thanks Ray Welland, Peter Dickman, Michael Dales and Gareth P. McSorley for reading earlier drafts of

this paper.

References

[1] Mehmet Aksit, Ken Wakita, Jan Bosch, Lodewijk Bergmans, and Akinori Yonezawa. Abstracting Object Interactions Using Composition Filters. In Rachid Guerraoui, Oscar Nierstrasz, and Michel Riveill, editors, Proceedings of the ECOOP’93 Workshop on Object-Based Distributed Programming, volume 791, pages 152–184. Springer-Verlag, 1994.

[2] J. Armstrong, M. Williams, and R. Virding. Concurrent Programming in Erlang. Prentice-Hall, Englewood Cliffs, NJ, 1993.

[3] Malcolm Atkinson, Margaret Brown, Julie Cargill, Murray Crease, Steve Draper, Huw Evans, Philip Gray, Christopher Mitchell, Martin Ritchie, and Richard Thomas. GRUMPS Summer Anthology, 2001. Technical Report TR-2001-96, Department of Computing Science, Glasgow University, September 2001.

[4] Toby Bloom and Mark Day. Reconfiguration and Module Replacement in Argus: Theory and Practice. Software Engineering Journal, 8(2):102–108, March 1993.

[5] Craig Chambers, David Ungar, and Elgin Lee. An Efficient Implementation of Self, a Dynamically-Typed Object-Oriented Language Based on Prototypes. ACM SIGPLAN Notices, 24(10):49–70, October 1989. OOPSLA ’89 Conference Proceedings, Norman Meyerowitz (ed), New Orleans, Louisiana.

[6] Huw Evans. Query-Oriented Programming. Unpublished paper, available from http://www.dcs.gla.ac.uk/~huw/, 2003.

[7] Huw Evans. Run-Time Evolution in Distributed Systems. PhD thesis, University of Glasgow, November 2003 (in preparation).

[8] Huw Evans, Malcolm Atkinson, Margaret Brown, Julie Cargill, Murray Crease, Steve Draper, Phil Gray, and Richard Thomas. The Pervasiveness of Evolution in GRUMPS Software. Software: Practice and Experience, 33(2), February 2003.

[9] Huw Evans and Peter Dickman. DRASTIC: A Run-Time Architecture for Evolving, Distributed, Persistent Systems. In Mehmet Aksit and Satoshi Matsuoka, editors, Proceedings of the European Conference on Object-Oriented Programming (ECOOP ’97), volume 1241 of LNCS, pages 243–275, Jyväskylä, Finland, June 1997. Springer.

[10] Huw Evans and Peter Dickman. Supporting Software Evolution in a Distributed Persistent System. In Andre Schiper and Marc Shapiro, editors, Proceedings of the 2nd European Research Seminar on Advances in Distributed Systems (ERSADS ’97), volume 2, pages 147–152, Zinal, Switzerland, March 1997. EPFL.

[11] Huw Evans and Peter Dickman. Zones, Contracts and Absorbing Change: An Approach to Software Evolution. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages and Applications (OOPSLA ’99), volume 34 of SIGPLAN Notices, pages 415–434, Denver, Colorado, USA, October 1999. ACM.

[12] Huw Evans and Peter Dickman. Peer-to-peer Programming with Teaq. In Enrico Gregori, Ludmila Cherkasova, Gianpaolo Cugola, Fabio Panzieri, and Gian P. Picco, editors, Workshop on Web Engineering and Peer-to-Peer Computing, pages 289–294. Networking 2002, Springer Verlag, LNCS 2376, May 2002.

[13] Robert L Glass. Facts and Fallacies of Software Engineering. Addison-Wesley, 2003.

[14] Adele Goldberg and David Robson. Smalltalk-80: The Language and its Implementation. Addison-Wesley, 1983.

[15] The Grumps Project Website, 2001. http://grumps.dcs.gla.ac.uk/.

[16] Michael Hicks. Dynamic Software Updating. PhD thesis, University of Pennsylvania, 2001.

[17] Robert Hirschfeld, Matthias Wagner, and Kris Gybels. Assisting System Evolution: A Smalltalk Retrospective. In Gunter Kniesel, Joost Noppen, Tom Mens, and Jim Buckley, editors, The First Workshop on Unanticipated Software Evolution (USE2002) in ECOOP2002 Workshop Reader, LNCS 2548, Malaga, Spain, 2002. Springer.

[18] J2EE 1.4 Architecture Specification. http://java.sun.com/j2ee/1.4/docs/.

[19] Gregor Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, 1991.

[20] Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and John Irwin. Aspect-Oriented Programming. In Mehmet Aksit and Satoshi Matsuoka, editors, Proceedings of the European Conference on Object-Oriented Programming (ECOOP ’97), volume 1241 of LNCS, pages 220–242, Jyväskylä, Finland, June 1997. Springer.

[21] Jorg Kienzle and Rachid Guerraoui. AOP - Does It Make Sense? The Case of Concurrency and Failures. In Boris Magnusson, editor, 16th European Conference on Object-Oriented Programming (ECOOP 2002), LNCS (Lecture Notes in Computer Science), Malaga, Spain, 2002. Springer Verlag. Also available as Technical Report IC No 2002/016.

[22] Ole Lehrmann Madsen, Birger Moller-Pedersen, and Kristen Nygaard. Object-Oriented Programming in the BETA Programming Language. Addison-Wesley, Reading, 1993.

[23] Jeff Magee, Naranker Dulay, and Jeff Kramer. Regis: A Constructive Development Environment for Distributed Programs. Distributed Systems Engineering Journal, 1(5):304–312, 1994.

[24] Microsoft Corporation. Microsoft C# Language Specifications. Microsoft Press, 2001.

[25] Microsoft .NET Home Page, 2002. http://www.microsoft.com/net/.

[26] Misha Dmitriev, 2003. http://www.experimentalstuff.com/Technologies/HotSwapTool/.

[27] Erik Odberg. A Framework for Managing Schema Versioning in Object-Oriented Databases. In Database and Expert Systems Applications - DEXA ’92, Valencia, Spain, September 1992.

[28] Peyman Oreizy, Nenad Medvidovic, and Richard N. Taylor. Architecture-Based Runtime Software Evolution. Technical Report ICS-TR-97-38, University of California, Irvine, Department of Information and Computer Science, September 1997.

[29] Alan Pope.The CORBA Reference Guide. Addison Wesley, 1998. ISBN 0-201-63386-8.

[30] Streve Waterhouse, 2003.http://search.jxta.org/.

[31] Clemens Szyperski.Component Software: Beyond Object-Oriented Programming. ACM Press and Addison-

Wesley, New York, N.Y., 1998.

[32] Richard Thomas, Gregor Kennedy, Steve Draper, Rebecca Mancy, Murray Crease, Huw Evans, and Phil Gray.

Generic Usage Monitoring of Programming Students. In20th Annual Conference of the Australasian Society

for Computers in Learning in Tertiary Education (ASCILITE 2003), Dec 2003.

[33] W3C Web Services Activity. http://www.w3.org/2002/ws/.

[34] Gio Wiederhold, Peter Wegner, and Stefano Ceri. Toward megaprogramming. Communications of the ACM, 35(11):89–99, November 1992.

List of Figures

1 An Inter-zone method Invocation via the Zone Contract

2 An Example GRUMPSNet

3 The System-Wide DRASTIC Run-Time Architecture

4 An Example GRUMPS-based Distributed Investigation

5 An Inter-zone method Invocation via the Zone Contract

6 An Example GRUMPSNet

7 The System-Wide DRASTIC Run-Time Architecture

8 An Example GRUMPS-based Distributed Investigation

Figure 5: An Inter-zone method Invocation via the Zone Contract

(Diagram; labels: Zone Purchasing, Zone Accounts, User Process 1, User Process 2, m:M, n:N, Contract, change absorber.)

Figure 6: An Example GRUMPSNet

(Diagram; labels: User 1, User 2, cleaned user data, Teaq Root Process. Key: Thread, Event queue, Channel, Control, Container, EPO, GE (Grumps Event), GU.)

Figure 7: The System-Wide DRASTIC Run-Time Architecture

(Diagram; labels: Personnel Zone, Purchasing Zone, Accounts Zone, User Process 1, User Process 2, m:M, n:N, ZBP.)

Figure 8: An Example GRUMPS-based Distributed Investigation

(Diagram with two panels: (a) Application-Level View, (b) Teaq Process-Level View. Labels: User 1, User 2, User 3, User 4, cleaned user data, data cleaning GU, GEs. Key: Thread, Event queue, EPO, GE (Grumps Event), GU, Teaq Root Process, Teaq process.)
