DRASTIC and GRUMPS:
The Design and Implementation of
Two Run-Time Evolution Frameworks
Huw Evans
Department of Computing Science
The University of Glasgow
Glasgow, Scotland, G12 8RZ
Abstract
This paper describes two different approaches to supporting and managing the run-time evolution of distributed
applications. The evolution models and implementations of the DRASTIC and GRUMPS projects are presented and
contrasted. Within the context of related work, the paper argues that there is too little support for software engineers in
constructing distributed applications that may be evolved at run-time. DRASTIC and GRUMPS address this problem
by providing programming language layers that place the support for run-time change at the centre of an application’s
design. The paper goes on to present four core ideas that are generally applicable for the construction of run-time
support layers. In addition, the lessons learned from conducting this research are discussed. The core ideas and
the lessons learned are used to derive ten principles that may be exploited when designing other run-time evolution
support systems. The paper ends with a discussion of the open issues that face those working in the field of distributed
run-time evolution support.
1 Introduction
The distributed software systems in use today are highly complex and have become increasingly important to
our everyday lives. These kinds of system, such as those used to manage telephone networks and hospitals, require
ongoing maintenance and upgrades to help ensure their continued availability. However, it is not viable to bring the
entire system down to perform a change; therefore, for this class of system, it must be possible to change them while
they are executing. In this paper, run-time evolution is defined to be the ability of a distributed application system
to support change to its functionality while it is executing. Unanticipated software evolution is defined to be the
ability to support changes to an executing application that were not anticipated when the application was
either originally designed or first started. The work reported in this paper supports unanticipated software evolution
as post-design and post-execution application updates may be applied to a running system.
This paper describes the lessons learned from the design and implementation of two different evolution-support
software layers that were produced as part of the DRASTIC and GRUMPS projects. These different systems support
the unanticipated run-time evolution of distributed applications. The DRASTIC and GRUMPS projects both take the
view that to successfully design and implement an evolvable distributed system requires that evolution be considered
from the very start of the application’s design. This is achieved by expressing the support for run-time evolution in
terms of an evolution model. The evolution model provides a framework within which the design of an evolvable
application is performed. The model makes available an evolution architecture that defines how a distributed appli-
cation should be implemented in order to evolve it at run-time. The evolution architecture makes available the basic
programming units that may be changed at run-time as well as controlling how and when a software engineer may
apply an update to an executing system. The evolution models are made available as Java-based programming layers
which embody those parts of the models that are required at run-time.
The DRASTIC and GRUMPS systems take different approaches to the kinds of run-time evolution that are to be
supported. DRASTIC takes the view that evolution is performed relatively infrequently, but that when it does occur
it is a major event in the lifetime of the system. To support this, DRASTIC focuses on managing the application of
evolution and controlling its effects on the post-evolution system. By contrast, the GRUMPS system takes an almost
opposite view as evolution is seen as a relatively common operation that should be performed promptly and may occur
many times in quick succession. To support this, the focus in the GRUMPS approach is on separating what is being
evolved from how it is to be changed in order to retain as much run-time flexibility as possible.
The work on DRASTIC and GRUMPS represents the first attempts at evolution models and evolution architectures for
large, long-lived, distributed systems. As such, both systems are proof of concept and provide an initial investigation
and exploration of the ideas. Performing a large empirical study of the use of the two approaches was not the focus of
the work. The work has centred on the design and implementation of the two evolution models which have both been
evaluated by the author. The DRASTIC platform has been used by University graduate students in their project work
(see [11]) and GRUMPS by other members of the GRUMPS team [32].
The rest of this paper is organised as follows. Section 2 describes the related work to place this work into context.
Section 3 discusses the core ideas that have been learned during the DRASTIC and GRUMPS projects and section
4 describes the main aspects of the two run-time evolution models. Section 5 then details their respective run-time
architectures which is followed by a discussion of their implementations in section 6. Section 7 describes the con-
clusions of the DRASTIC and GRUMPS research work. Section 8 discusses the open issues for the field of run-time
evolution support. Lastly, section 9 summarises the paper.
2 Related Work
This section discusses alternative approaches to supporting run-time evolution to place the DRASTIC and GRUMPS
work into context. Support for the run-time evolution of systems can be divided into four broad categories: program-
ming language-level support, both research (section 2.1) and commercial (section 2.2); component-level, e.g., J2EE
and .NET (section 2.3); and architectural, such as Darwin and ArchStudio (section 2.4). Section 2.5 concludes this
section by arguing that there is a serious gap in programmer support which exists above the language and component
frameworks, but below the architectural approaches to run-time evolution.
2.1 Research-based Language-level Support
Research into programming language-level support for the run-time evolution of systems has been ongoing for
decades. Dynamically typed programming languages, such as Self [5] and Smalltalk [14], define powerful mech-
anisms that allow their programs to be evolved at run-time. In this class of languages a program can be written that
manipulates itself, and so these types of language provide rich evolution mechanisms [17].
Other language-level research, represented here by the work on Erlang [2], Argus [4] and the recent work of Hicks
on the use of dynamic updates using verifiable native code [16], demonstrates that there are generally applicable
approaches to supporting the run-time evolution of a program. Language-level support typically: defines the unit of
replacement (e.g., modules in Erlang); provides the necessary mechanisms to allow the units to be safely replaced at
run-time; and also provides support so that state may be transferred from its old format into its new format. Some
approaches also provide tools (e.g., Argus) so that the run-time evolution of a number of modules may be coordinated
by a human user.
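The three ingredients listed above (a unit of replacement, a mechanism for replacing it safely, and a transfer of state from the old format to the new) can be sketched in Java. This is a minimal illustration only and is not taken from Erlang, Argus or any other system cited here; the names ReplaceableUnit, UnitSlot, CounterV1 and CounterV2 are hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical unit of replacement: a unit that can hand over its state. */
interface ReplaceableUnit<S> {
    S exportState();
    String describe();
}

/** Old version: stores its count as an int. */
class CounterV1 implements ReplaceableUnit<Integer> {
    private final int count;
    CounterV1(int count) { this.count = count; }
    @Override public Integer exportState() { return count; }
    @Override public String describe() { return "v1:" + count; }
}

/** New version: a different state format (a long plus a label). */
class CounterV2 implements ReplaceableUnit<String> {
    private final long count;
    private final String label;
    CounterV2(long count, String label) { this.count = count; this.label = label; }
    @Override public String exportState() { return label + "=" + count; }
    @Override public String describe() { return "v2:" + label + "=" + count; }
}

/** Callers reach the unit only through the slot, so it can be swapped underneath them. */
class UnitSlot {
    private final AtomicReference<ReplaceableUnit<?>> current;
    UnitSlot(ReplaceableUnit<?> initial) { current = new AtomicReference<>(initial); }

    ReplaceableUnit<?> get() { return current.get(); }

    /**
     * Builds the successor from the expected unit's exported state and installs it
     * atomically. Returns false if the expected unit is no longer installed.
     */
    <S> boolean replace(ReplaceableUnit<S> expected,
                        Function<S, ReplaceableUnit<?>> stateTransfer) {
        ReplaceableUnit<?> next = stateTransfer.apply(expected.exportState());
        return current.compareAndSet(expected, next);
    }
}
```

The state-transfer function is supplied by whoever performs the update, which mirrors the point that converting old state into its new format is a per-change decision rather than something the mechanism can derive.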
The work discussed above provides useful mechanisms to allow the run-time evolution of a program to be per-
formed. If software systems are to be effectively evolved at run-time, the programming languages and environments
that they are constructed from must make available a basic set of primitives that may be used to express and manage
change. It is advantageous if these are expressed to the programmer as part of the programming model, as is the case
with CLOS (as well as Smalltalk and Self) which defines a powerful behavioural reflection mechanism [19]. Building
the basic change primitives into a language ensures that a programmer can exploit them to construct an evolution
model. The provision of such primitives is a first step towards the design and implementation ofstandardevolu-
tion models and architectures that may be expressed within a particular language. Multiple standards are required
to address the differences that exist within the various programming domains, for example, real time programming,
embedded programming and general-purpose programming. No one approach will fit all these programming models.
However, without a standard approach, programmers are forced to continually reinvent evolution mechanisms,
models and strategies. Such an approach leads to a number of separately developed and incompatible models. Due
to its ubiquity, a standard model is much more powerful; its universal presence may be usefully exploited by all
programmers.
2.2 Commercial Programming Languages
Modern statically-typed commercial programming languages such as Java and C# [24] encourage programmers,
through various concepts such as typing, encapsulation and polymorphism, to write code that should be easier to
maintain and evolve. However, focus is placed on the non run-time issues of reusing program source code and trying
to make it easier to manipulate the codebase of a particular application. Neither of these languages directly addresses
the issues of run-time evolution by defining an evolution model or architecture.
It is possible to dynamically update both Java and C# programs. In Java, this can be done either by using an
application specific classloader, or by using an API that supports the run-time manipulation of Java bytecode to gain
certain Smalltalk-like meta-programming effects. However, these are ad-hoc approaches that lie outside a standardised
evolution model and run-time architecture.
Statically typed languages tend to have a boundary in the design of the language that firmly separates the compile-
time decisions from the run-time decisions, and it is typical that if a decision has been made at compile-time it cannot
be changed later. The designers of such languages have not anticipated the need of programmers to be able to build
systems that can be flexibly manipulated at run-time. However, there is a need for this, and the Java virtual machine
is moving towards support for this, e.g., through the introduction of the Java Platform Debugger Architecture and the
HotSwap functionality [26] available in the Java HotSpot VM, starting from JDK 1.4.
2.3 Component Frameworks
Component frameworks such as J2EE [18], .NET [25] and CORBA [29] support the software engineer in the produc-
tion of enterprise-scale architectures. J2EE and .NET are primarily focussed on the complex problem of providing
large-scale systems that can interoperate with legacy systems and with others that are available across the Internet,
via the Web Services suite of standards [33]. Such architectures are useful because they do support some run-time
evolutionary operations, e.g., components may be replaced during application execution. The use of such architectures
is also advantageous as they can make the production of complex software easier by encouraging a certain style of
programming that is likely to lead to smoother component integration and replacement. However, these component
models do not define a general evolution model and run-time evolution is not considered to be a design priority, so the
problems that apply to programming languages also apply here.
However, component frameworks offer a natural place in the software hierarchy within which to add support for
run-time evolution, preferably building on standard language-level support. The kinds of large, long-lived systems
constructed using J2EE and .NET are likely to benefit from run-time evolution. Such an approach would provide
the industry with a standard for enterprise-level run-time change. By being constructed on top of a set of evolution
standards, separately developed corporate-critical systems could be more successfully integrated with one another as
an evolution in one would be less likely to adversely affect another. Such a standard would also benefit the Web
Services initiative as this community is defining standards for how such enterprise-level systems can inter-operate
across the Web.
2.4 Architectural-Level Support
At a higher level than component frameworks, some research systems allow software designers to design and
manipulate their systems in terms of their architecture. Systems such as Darwin [23] and ArchStudio [28] abstract away
from language-level and component-level details, preferring to encourage software engineers to think in terms of the
major architectural abstractions that their systems are composed from. The architecture of these kinds of system is
typically constructed from a collection of components that are connected together with a number of (possibly typed)
connectors, and data is moved from one component to another, via the connectors. Each component processes the
data in a particular way, before forwarding it to another connector or, for example, writing it to stable storage. To
assist the designer of such a system an architectural description language, together with tool support, is provided that
may be used to describe and perform an update to the topology and functionality of a system at run-time. In these
kinds of system, run-time evolution is usually expressed at the level of the component and connector, e.g., start/stop a
component and connect/disconnect a connector.
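The component and connector operations named above (start/stop a component, connect/disconnect a connector) can be illustrated with a small Java sketch. It is not the API of Darwin or ArchStudio; the Component and Connector classes and their method names are assumptions made for illustration, and the data items are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical connector: forwards data to whichever component is attached. */
class Connector {
    private Component target;

    void connect(Component component) { target = component; }
    void disconnect() { target = null; }

    void send(String datum) {
        if (target == null) throw new IllegalStateException("connector is not attached");
        target.accept(datum);
    }
}

/** Hypothetical component: processes a datum, then forwards or stores it. */
class Component {
    private final String name;
    private final UnaryOperator<String> step;
    private boolean running;
    private Connector out;                       // null: this component stores its results
    final List<String> stored = new ArrayList<>();

    Component(String name, UnaryOperator<String> step) {
        this.name = name;
        this.step = step;
    }

    void start() { running = true; }
    void stop() { running = false; }
    void setOut(Connector connector) { out = connector; }

    void accept(String datum) {
        if (!running) throw new IllegalStateException(name + " is stopped");
        String result = step.apply(datum);
        if (out != null) {
            out.send(result);                    // forward along the connector
        } else {
            stored.add(result);                  // stands in for writing to stable storage
        }
    }
}
```

A topology change then amounts to stopping the upstream component, disconnecting the connector, connecting it to a different downstream component and starting the upstream component again.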
ArchStudio also defines a set of rules that govern, among other things, how components may communicate with
one another. Each component has a top and bottom domain: the top domain specifies the set of notifications to which
a component responds, and the set of requests it emits up an architecture; and the bottom domain specifies the set
of notifications that a component emits down an architecture and the set of requests to which it responds. This style
depends on a principle that ensures a component within the architecture is only aware of components above it, i.e., a
component is unaware of the components that are beneath it.
These types of system are a move towards a domain-specific evolution model and architecture in that systems for
which the component and connector architecture is appropriate may be evolved. The work reported in this paper is
conceptually below that of an architecture-oriented approach, but above that of a component framework. An archi-
tectural approach, such as ArchStudio, could make use of the kinds of run-time evolution support provided by the
DRASTIC and GRUMPS systems. In turn, either of these two systems could make use of the underlying evolutionary
facilities that were provided by a component framework that is itself making use of the more primitive language-level
support.
2.5 Discussion
Research results and commercially available solutions to the problem of run-time evolution tend to be focussed at
either the relatively low level of the language or component framework, or at the high architectural level. Dividing
the focus into two groups in this way has led to improvements in support for run-time evolution as the issues may be
separated from one another within the layers. The language and component frameworks provide the mechanisms to
be able to evolve the contents of (typically) a single address space. The architectural approaches, such as Darwin and
ArchStudio, focus more on the design and run-time management of a component and connector-oriented distributed
system. At this higher level, evolution is performed by replacing individual components and by altering the topology
of the system.
As these two approaches have focussed on the issues that are largely within their own layers, a gap between the
two has emerged. This paper argues that in bridging this gap the advantages of both approaches to run-time evolution
may be exploited. One way to provide such a bridge is to define an evolution model and evolution architecture. At
the lower-level, the model and its architecture exploit and abstract over the evolution mechanisms that are provided
by the underlying programming language and component framework. To provide the bridge to the higher-level, the
evolution architecture makes available an API that embodies the semantics of the evolution model. This API may then
be used to construct a distributed system that can be evolved at run-time, as well as to provide tools to support this.
The evolution model supports the software engineer in designing a distributed system that may be evolved. The
evolution architecture provides support for the software engineer in performing a run-time change. In turn, the evolution
model and architecture are brought together by the evolution API which supports the software engineer in
constructing an evolvable system.
Systems such as Darwin and ArchStudio have, inevitably, addressed some of these issues. However, the approach
advocated here suggests that making these facilities available as part of an explicit evolution model that is available at
run-time will lead to distributed systems that have been designed and implemented with their evolution in mind from
the beginning. In the same way that software engineers have to consider the failure model of a distributed system, with
the approach described here, they are also required to consider the evolution model as well. The software engineer
is made aware of the need to program for evolution and by using the evolution architecture they build in support for
the run-time evolution of their system, making it an explicit part of its design and implementation. The evolution
architecture also takes away from the software engineer some of the complexity of performing and managing the
evolution at run-time. Such an evolution model and implementation is made available in the DRASTIC and GRUMPS
systems.
3 Core Ideas
This section describes the four core ideas that are the main results of the work summarised in this paper. The main
idea is the need for an evolution model (section 3.1). The requirement to separate what is being evolved from how it
is evolved is discussed in section 3.2. Section 3.3 then motivates why programmers should program for evolution, and
section 3.4 explains the reasons behind wanting to make as many decisions as possible at run-time.
The ideas introduced in this section are summarised in place, turning them into ten principles for supporting
run-time evolution. These principles are returned to throughout the paper to reinforce the contribution that has been
made.
3.1 The Need for Evolution Models and Evolution Architectures
An evolution model defines for a programmer that aspect of the programming model which specifies how a software
system may be designed and implemented for run-time change, as well as how the change may be performed and
supported during system execution. An evolution model can be divided into two separate, but equally important, parts:
the first is the evolution model proper, which focuses on the design of the distributed system and the implications of changing it at
run-time; the second is the evolution architecture that, together with the evolution API, handles the details of how a
run-time evolution is performed (see footnote 1).
A typed programming model ensures the programmer does not try to perform an operation that is deemed unwise
or unsafe by those that have defined the semantics of the type system. Within the context of run-time evolution, a type
system can be seen as a means of ensuring call-level compatibility between separately defined and developed parts of
a system. Typing provides a foundation on top of which an evolution framework can be built.
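One way to picture call-level compatibility between separately developed parts is a check, made before a replacement is accepted, that the candidate offers every operation the existing callers rely on. The sketch below uses standard Java reflection for this; the Billing contract, the two candidate classes and the Compatibility helper are hypothetical and do not come from DRASTIC or GRUMPS.

```java
import java.lang.reflect.Method;

/** Hypothetical contract that existing callers were compiled against. */
interface Billing {
    long total(int quantity, long unitPrice);
}

/** Separately developed candidate that offers the required call. */
class BillingV2 {
    public long total(int quantity, long unitPrice) { return quantity * unitPrice; }
}

/** Candidate whose signature has drifted: callers would break. */
class BrokenBilling {
    public long total(int quantity) { return quantity; }
}

class Compatibility {
    /**
     * True if the candidate has a public method matching each contract method by
     * name and parameter types, with a return type the callers can accept.
     */
    static boolean callCompatible(Class<?> contract, Class<?> candidate) {
        for (Method required : contract.getMethods()) {
            try {
                Method offered = candidate.getMethod(required.getName(),
                                                     required.getParameterTypes());
                if (!required.getReturnType().isAssignableFrom(offered.getReturnType())) {
                    return false;
                }
            } catch (NoSuchMethodException e) {
                return false;                    // a required operation is missing
            }
        }
        return true;
    }
}
```

Such a check only establishes that calls will resolve. Whether the new behaviour is acceptable is a semantic question that the type system cannot answer.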
The design- and implementation-time evolution model should support the software engineer in understanding
those aspects of their problem that require run-time change. Among other things, an evolution model should define
what can and cannot be evolved, what the unit of evolution is, and what the main programming abstractions are that
a programmer may use to build an evolvable system. By following this approach, the software engineer builds an
application system that contains support for its run-time change. The application not only contains the application
system, but also the evolution system, which is capable of evolving the application system at run-time. This brings us
to the first principle of support for run-time evolution.
An evolution model is crucial if run-time evolution is to be used and performed effectively.
Principle 1: Evolution Models are Crucial.
Without such a model, it is possible that a design or implementation decision that may have an impact on how the
system could be updated at run-time may not be considered by the software engineer. An evolution model helps the
software engineer to consider other system issues within the context and constraints of run-time evolution.
Footnote 1: The phrase “evolution model” is used in this paper to refer to the model, the architecture and the API.
In addition to the above, no one model is suitable in all circumstances. There are many different types of en-
vironment that programmed systems are deployed into, such as embedded systems and general-purpose business
applications. No one standard evolution model and evolution architecture is suitable across such a range of systems
that have radically different requirements. Rather, a handful of standards will be required, where each standard ad-
dresses the issues pertinent to run-time evolution within the particular environment. As examples of this focus on
differing environments, the DRASTIC and GRUMPS systems address the issues of run-time evolution within their
respective types of distributed system.
An evolution model needs to be tailored to the environment in which it is being used.
Principle 2: A Number of Evolution Models are Required.
3.1.1 The Evolution Architecture
The run-time realisation of the evolution model is its evolution architecture and it must support the programmer in
effectively updating an executing application. This means providing support to manage and safely update the running
system, helping it to correctly move between evolution states without generating errors at the application-level. The
evolution architecture may also provide some support to automate certain operations to remove the need for human
input.
As the evolution architecture is responsible for effecting the changes to the executing program, the choice of
implementation language is important. When designing and implementing an evolution architecture a free choice of
implementation language may not be possible; one may be imposed as other parts of the system may already use
that language. The implementation language will have a big impact on how an evolution may be performed. This is
because certain languages are more flexible in allowing changes to be performed to the components the program is
constructed from, e.g., more readily allowing for the replacement of code or the run-time reinterpretation of it.
The evolution model should reflect and be compatible with the other models used within the system, both the
computational models, such as that for distributed computation, and the non-functional models, such as the business
model that the computer system helps to support. Evolving an executing system will affect some aspects of it, such
as its availability, which in turn may affect the business. Therefore, given this level of importance, and the desire to
be able to change the application system at run-time, the evolution model should become a central part of the overall
design and implementation of that application.
A software engineer should consider run-time evolution to be an
integral part of an application’s design.
Principle 3: Support for Run-time Evolution is a Central Issue.
Changing the behaviour of a program at run-time involves altering its semantics. In the general case, only a
human being has the necessary information and knowledge to know what the implications of a particular change may
be. Therefore, it is not possible to fully automate the run-time change of a system. Even though full automation is
not possible, it is possible to support the human in their design and implementation of the system and its subsequent
run-time change. Such support should make the task of changing a system at run-time easier and thus less error-prone.
The run-time evolution of a program is a semantic issue
which, in the general case, cannot be fully automated.
Principle 4: Run-time Evolution is a Semantic Issue.
In designing the evolution model and its architecture it can be advantageous to reuse the concepts that are defined
by the underlying programming language and component frameworks, and make them available at the evolution model
and architectural level. This increases the software engineer’s familiarity with the model and its implementation and
decreases the number of concepts they have to simultaneously manage and tradeoff against each other. This reduces
the complexity of the evolution model which is a good general design principle.
The design of the evolution model should be sympathetic to the
underlying component and programming models.
Principle 5: Exploit Design Familiarity.
3.2 Separating What is Evolved from How it is to be Evolved
When designing and implementing a system to support run-time evolution it is important to separate what is being
evolved from how that evolution will be performed. What is being evolved is typically an executing system. It is
possible to evolve such a system in one of a number of ways. By separating these two issues, the designers and users
of the framework are given more flexibility in choosing the most appropriate approach to the evolution of their system.
The use of separation leads to an evolution model where change is applied to a system, and to an approach where a
single evolution model may be applied to the run-time update of a number of different application systems. Mixing
them leads to a single, monolithic system that is harder to manipulate as the two parts become intertwined.
An evolution model can help the software engineer separate what is being evolved
from how it should be evolved.
Principle 6: Separation of Concerns is Important.
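The separation of what is evolved from how it is evolved can be sketched as two independent types: one for the application system and one for the way a change is applied to it. This is an illustrative sketch under assumed names (ApplicationSystem, EvolutionStrategy, StopTheWorld, Immediate); it is not the design of either DRASTIC or GRUMPS.

```java
import java.util.ArrayList;
import java.util.List;

/** "What": a hypothetical application system with a version and an event log. */
class ApplicationSystem {
    final List<String> log = new ArrayList<>();
    private String version;

    ApplicationSystem(String version) { this.version = version; }

    String version() { return version; }
    void pause() { log.add("paused"); }
    void resume() { log.add("resumed"); }
    void install(String newVersion) {
        version = newVersion;
        log.add("installed " + newVersion);
    }
}

/** "How": a strategy that applies a change to any application system. */
interface EvolutionStrategy {
    void apply(ApplicationSystem target, String newVersion);
}

/** Careful strategy: suspend the application around the change. */
class StopTheWorld implements EvolutionStrategy {
    @Override public void apply(ApplicationSystem target, String newVersion) {
        target.pause();
        target.install(newVersion);
        target.resume();
    }
}

/** Prompt strategy: install the change without suspending anything. */
class Immediate implements EvolutionStrategy {
    @Override public void apply(ApplicationSystem target, String newVersion) {
        target.install(newVersion);
    }
}
```

Because the strategy is a separate object, the same application can be updated in different ways on different occasions, and one strategy can be applied to several applications.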
By separating the system being evolved from the system that will help perform the change, and by encapsulating
both of these into different models, those updating the system are given support in handling the complexity of the
change. The separation of concerns increases the ease with which the individual pieces may be inspected and under-
stood by those responsible for its change. This is because one element of the system is not unnecessarily confused
with another. In addition, not only is it possible to change the running application system by applying the evolution
system to it, but it is also possible to change the evolution system itself. As the evolution system is just another system,
it is possible to define a meta-evolution system that is capable of changing the evolution system. In the same way that
support is required to be able to change running application systems, support to be able to change the evolution system
is also necessary. If the running application system is changed, the evolution system that was used to perform that
change may also need to be adapted to ensure the newly changed application system can be changed in the most ap-
propriate manner in the future. Therefore, there is a feedback cycle between the application system and the evolution
system. A change in one leads to the need to change the other, which repeats itself.
The designer of a system cannot foresee all future uses of it and so an update should be applied in a way that
is most appropriate to the current as well as future use, state and architecture of the system. This applies equally
to the application system as well as the evolution system. A change performed to the application system may also
have an effect on a subsequent use of the run-time evolution system, and vice versa. This is because the application
system and the evolution system are both present at run-time and, even though the two systems are separate, they are
inevitably inter-related as they must interact in order to manage an update. A change to one system may change the
other. Therefore, it is advantageous if the evolution model is flexible enough to support evolved uses of itself over
time, in order to help support change to the on-going evolution of the application system.
It is advantageous if the evolution model and architecture can support evolution of itself.
Principle 7: The Evolution Model should be Evolvable.
3.3 Programming for Evolution
Application programmers must program for evolution. The approach to evolution advocated here involves an evo-
lutionary process being applied to an executing system. This implies that normal application computation could be
interrupted during the evolution process. The executing application needs to be able to tolerate changes being per-
formed to it. To support this the run-time evolution architecture should present an API to the programmer so that a
meaningful dialogue between the application program and the evolution support system is possible.
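The dialogue between an application and its evolution support might take the following shape in Java. The sketch assumes a simple listener protocol in which the application is asked whether it can tolerate a change and is told once the change has been applied; EvolutionListener, EvolutionSupport and OrderService are hypothetical names and not the API of either system described in this paper.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback interface implemented by application code. */
interface EvolutionListener {
    boolean readyToEvolve(String change);   // returning false vetoes the change for now
    void evolved(String change);            // called after the change has been applied
}

/** Hypothetical evolution support layer that mediates every update. */
class EvolutionSupport {
    private final List<EvolutionListener> listeners = new ArrayList<>();

    void register(EvolutionListener listener) { listeners.add(listener); }

    /** Applies the update only if every registered listener agrees. */
    boolean request(String change, Runnable update) {
        for (EvolutionListener listener : listeners) {
            if (!listener.readyToEvolve(change)) return false;
        }
        update.run();
        for (EvolutionListener listener : listeners) {
            listener.evolved(change);
        }
        return true;
    }
}

/** Application code that dedicates part of itself to handling evolution. */
class OrderService implements EvolutionListener {
    private int openTransactions;
    private String pricingRule = "flat";
    final List<String> notices = new ArrayList<>();

    void begin() { openTransactions++; }
    void commit() { openTransactions--; }
    void setPricingRule(String rule) { pricingRule = rule; }
    String pricingRule() { return pricingRule; }

    @Override public boolean readyToEvolve(String change) { return openTransactions == 0; }
    @Override public void evolved(String change) { notices.add(change); }
}
```

Here the application refuses a change while a transaction is open, which is one simple way of tolerating interruption without breaking its own assumptions.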
When implementing systems, programmers make assumptions about the environment they are dealing with in
order to simplify their solution. Evolving a system at run-time can cause such assumptions to break which can result
in the failure of a previously working application. As the programmer is working within the context of an evolution
model and evolution architecture, they must be aware that their program’s behaviour may change due to an evolution.
In the same way that distributed programs can fail in ways that are semantically different to purely centralized solu-
tions, programs executing within the context of an evolution model can fail in ways that are semantically different to
programs that are executing outside such a context. As a result of this, programmers must be aware that a system can
behave differently due to the evolutionary context it is executing within and should, therefore, dedicate part of their
program to handling the effects of its possible evolution.
Those that use evolution models and their APIs need to program for evolution.
Principle 8: Program for Evolution.
In the general case, it is not possible, nor is it always advisable, to hide from the programmer or the executing
system that an evolution is taking place. Hiding this fact is generally not possible because the system being changed
is the same as the executing system and so it may be able to detect a change to itself. In addition, trying to hide such
a change may be bad for the long term maintenance of the program. Trying to hide the change may require a solution
that is artificial with respect to the rest of the system’s design. In the future, a new change may be required; however,
it may not be easy to apply the new change due to the artificiality of the previous change. The software engineer
could be forced to perform the new change within a less than ideal context, one that has been brought about due to a
misguided (in the long term) desire to encapsulate a change from the rest of the system.
As such kinds of change cannot be hidden from the programmer, the rest of the change should be expressed to
them through the run-time evolution architecture. In this way, the software engineer is assisted in performing the
change by giving them access to the current run-time state of the system.
As much design-time information as possible should be retained
and made available to the run-time system.
Principle 9: Information Loss should be Minimised.
The programmer should be able to ascertain the current state of the run-time architecture of both the application
system and the evolution architecture. If no information is lost when moving from the design stage through to the
execution of the system, then the executing application is its design and the executing evolution architecture is the
evolution model. Having access to this kind of information at run-time can be invaluable when performing a run-time
evolution. A proposed application evolution may break a design invariant, such as no cycles being allowed between
run-time components. Having this detailed information available, at several levels of abstraction, at run-time can
ensure the system may be tested and the evolution architecture can alert those performing the evolution that there may
be problems with certain kinds of change.
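The invariant check mentioned above can be sketched in Java as follows. This is a minimal illustration and not part of DRASTIC or GRUMPS: the class ComponentGraph and its methods are hypothetical names, and the sketch assumes that the retained design information is a set of directed references between named run-time components. A proposed evolution that adds a reference is tested against the "no cycles" invariant before it is applied.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical run-time model of component references (not a DRASTIC or GRUMPS API).
class ComponentGraph {
    private final Map<String, Set<String>> refs = new HashMap<>();

    // Record a directed reference without checking any invariant.
    void addReference(String from, String to) {
        refs.computeIfAbsent(from, k -> new HashSet<>()).add(to);
        refs.computeIfAbsent(to, k -> new HashSet<>());
    }

    // True if 'target' can be reached from 'start' by following references.
    boolean reaches(String start, String target) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.push(start);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            if (current.equals(target)) {
                return true;
            }
            if (seen.add(current)) {
                for (String next : refs.getOrDefault(current, Collections.emptySet())) {
                    pending.push(next);
                }
            }
        }
        return false;
    }

    // A new reference from -> to closes a cycle exactly when 'from' is reachable from 'to'.
    boolean wouldCreateCycle(String from, String to) {
        return reaches(to, from);
    }

    // Apply the proposed evolution only if the design invariant still holds.
    boolean tryAddReference(String from, String to) {
        if (wouldCreateCycle(from, to)) {
            return false;
        }
        addReference(from, to);
        return true;
    }
}
```

An evolution architecture built along these lines could call tryAddReference before installing a new inter-component reference and alert the software engineer when it returns false, rather than silently applying the change.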
3.4 Delaying Decisions until Run-time
The run-time evolution of a system requires as many decisions about that evolution as are appropriate to be performed
at run-time. This is necessary for two reasons. Firstly, certain information relevant to the evolution may only be
available at run-time. For example, it may not be possible to evolve a system when certain important pieces of code
are executing. Being able to either detect when such code is executing or will be executing in the future will allow an
evolution to be scheduled appropriately. Secondly, the most accurate description of a running system is the running
system itself. It would be possible to retain a static view of the current system and update it after an evolution has been
performed. However, it is possible that the static description will become out of date with respect to the executing
system. Rather than do this, the evolution architecture can track the evolutions that have been applied to the running
system by inspecting the system state that is pertinent to evolution before and after the change has been applied. To be
able to do this, both the application and evolution architecture need to be supported in making decisions at run-time.
In addition, run-time evolution requires the evolution architecture to be very flexible. If a human user needs to
update the functionality of an executing system, the flexibility required to do this demands that as many decisions as
appropriate can be made and performed at run-time. Information from the run-time system needs to be accessed at
run-time so that the most appropriate decision can be made. However, this level of access can have consequences for
the enforcement of decisions that have been made before system run-time (section 7).
Run-time evolution demands that as many decisions as are
appropriate can be made at run-time.
Principle 10: Run-time Evolution requires Run-time Decisions to be Supported.
4 Experiments
The results presented in this paper are based on the outcome of two major experiments conducted as part of the
author’s PhD work in the area of support for the run-time evolution of distributed systems [7]. There are many
different kinds of distributed system possible and many different approaches to the support of run-time evolution
within such an environment. The DRASTIC and GRUMPS experiments are two data points within the space of
all possible approaches to the run-time evolution of distributed systems. This work focuses on supporting run-time
evolution when the application system has been designed and built with this in mind. The equally important problem
of adding support for run-time change to a pre-existing application falls outside the scope of the DRASTIC and
GRUMPS work and so is not discussed in this paper.
Sections 4.1 and 4.2 introduce the DRASTIC and GRUMPS evolution models, before discussing in section 4.3 the
kinds of evolution each approach supports, how they make use of the four core ideas introduced above and the main
differences between the two approaches.
4.1 DRASTIC
The DRASTIC project [9, 10, 11] investigated support for the run-time evolution of a large, long-lived, persistent
system. The kind of distributed application system being considered would be, for example, one used to manage the
day-to-day business of a medium-sized company that may have a purchasing department and an accounts department
that would need to interact. The software managing this company is considered too important to be brought down in its
entirety; however, it has been decided that identified parts of the software can be made unavailable for periods of time.
To support this, the DRASTIC evolution model and application architecture divide the distributed application across a
number of disjoint, semi-autonomous regions called zones. The organisation of the zones follows the organisation of
the business, thus the DRASTIC project assumes there would be a zone for each of the above departments. By doing
this, the DRASTIC evolution model is tailored to the environment in which it is to be used (see principle 2).
A zone contains the software necessary to support that part of the business’ functionality. A zone is a collection of
executing processes and typed object-oriented language-level objects that may communicate with other objects that
are contained either within the same zone or within other zones. The DRASTIC communication model, therefore,
supports inter-zone method invocations.
To evolve the contents of a zone, a language type would be updated, perhaps changing its public interface, and
possibly adding or removing functionality from it. During a zone evolution, as a simplifying assumption, all instances
of a particular type are updated.
Clemens Szyperski in [31] defines a component as a set of normally simultaneously deployed atomic components
where an atomic component is a code module and a set of non-compiler-generated resources, e.g., configuration files.
A DRASTIC zone would contain many such Szyperski components as well as the inter-zone contracts, executing
change absorbers and type transformers, and the tools to manipulate the zone at run-time. Therefore, a DRASTIC
zone is a much larger, more complex and more dynamic entity than a Szyperski component. In this sense, a DRASTIC
zone has more in common with the Megaprogramming modules of Wiederhold, Wegner and Ceri, as defined in [34].
Zones are only semi-autonomous as their code will interact with code in other zones to perform the functionality
of the system. Interaction between objects in different zones is controlled on a pair-wise basis by a zone contract.
A single zone may hold several contracts with a number of other zones. A contract defines the types of objects that
may be exchanged between the two zones and how inter-zone method invocations should be handled. Code in one
zone may be expressed in terms of a pre-evolution type; however, the call may actually be handled by an instance of
a post-evolution type. Programmer-supplied code is then installed at the zone boundary inside objects called change
absorbers. A change absorber translates the inter-zone call between the two differently evolved objects. An evolution
of a single zone involves temporarily suspending activity between the zone being evolved and the other zones that it
holds contracts with. Therefore, a DRASTIC-based distributed application has to be designed and implemented by the
software engineer to tolerate part of that application not being present when that part is evolved. The processes within
that zone are terminated and any persistent state is updated to make use of the evolved programming types. Processes
within the zone are then restarted and inter-zone method invocations are redirected by the DRASTIC run-time system
via the newly installed change absorbers.
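The role of a change absorber can be sketched in Java as an adapter between a pre-evolution and a post-evolution type. This is an illustrative sketch only: the method names, the business logic and the class NToMAbsorber are assumptions made for the example, and the real DRASTIC absorbers are installed at the zone boundary by the run-time system rather than constructed directly by the caller.

```java
// Pre-evolution type, still used by code in the calling zone (hypothetical interface).
interface N {
    int totalInPence(String orderId);
}

// Post-evolution type in the evolved zone: the method was renamed and takes a currency.
class M {
    long total(String orderId, String currency) {
        // Stand-in for real business logic.
        return orderId.length() * 100L;
    }
}

// Change absorber: exports the old type N and forwards each call to an instance of M.
class NToMAbsorber implements N {
    private final M target;

    NToMAbsorber(M target) {
        this.target = target;
    }

    @Override
    public int totalInPence(String orderId) {
        // Programmer-supplied translation: supply the new argument and narrow the result.
        long result = target.total(orderId, "GBP");
        return Math.toIntExact(result);
    }
}
```

Code in the calling zone continues to be written against N, for example `N n = new NToMAbsorber(new M());`, while the object that actually handles the call is an instance of M.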
Before the evolution has been performed, code in an external zone may be successfully calling code within the
zone to be evolved. After the evolution has taken place, the call may no longer be successful. This is because a change
to the pair-wise contract may have made that call invalid. The software engineer that wrote the calling code needs to
be aware that this kind of failure is possible so that they can program their system to handle this possibility. This is an
example application of principle 8, the need to program for evolution. As run-time evolution is a semantic issue which
cannot be fully automated in the general case (principle 4), the software engineer is forced to program for evolution.
Principle 8 is, therefore, a consequence of the existence of principle 4.
Zones, contracts and change absorbers are concepts that exist at system design-time that become application-level
run-time objects. When a software engineer interacts with a DRASTIC-based application they are given access to
objects that represent these design-time abstractions. In this way, DRASTIC supports principles 5 and 9. The concepts
at both design- and run-time are the same, reducing the complexity of the system, and the design-time information
about these concepts is made available during system execution via their run-time objects. This allows the run-time
system to inspect design-time information which may be used to check whether certain evolutionary changes should
be performed on the system. Thus, the DRASTIC system embodies principle 4, by supporting a human being in
changing a run-time system.
By making available run-time objects that model their design-time counterparts, the DRASTIC software engineer
is given support in considering run-time evolution to be an integral part of an application’s design (principle 3). The
design of an application is partly composed from the concepts that are used to model its change and to also perform
that change during application execution.
Figure 1 shows two zones (zone Purchasing and zone Accounts) with one process in each zone. Process 2
in zone Accounts contains an object called m (see footnote 2) that is of type M. This object is referred to from process 1 in zone
Purchasing. However, process 1 believes it holds a reference to an object called n of type N. At some point in the
past, process 1 did hold such a reference to an object of type N in zone Accounts. However, zone Accounts has
subsequently evolved its N type into type M. In order to encapsulate the effects of this change within zone Accounts,
a change absorber has been made available on the inter-zone reference. This change absorber exports the N type so
that process 1 may invoke along the chain. The invocation is translated into one on the evolved type M and the method
call is directed through the contract-containing process in zone Accounts.
The design- and run-time manipulation of the kind of system described above is a complex endeavour. The
DRASTIC evolution model handles some of this complexity for the software engineer by embodying a particular
approach to run-time change within its model and run-time architecture. Certain decisions on how to perform a run-
time change have been made by the provider of the evolution model and so the software engineer does not have to
be concerned with these. In this way, the job of the software engineer is made easier, so that they may focus on their
application, not on the details of how to change its semantics at run-time. DRASTIC, therefore, supports principles
1 and 6: an evolution model has been provided to support run-time application evolution; and a clear separation has
been made between what is being evolved and how it is to be evolved.
Footnote 2: It is called m in the sense that this is the name of the reference in the Java source code.
[Figure 1 diagram labels: Zone Purchasing with User Process 1 (n:N); Zone Accounts with User Process 2 (m:M); two Contract processes; change absorber; types N and M.]
Figure 1: An Inter-zone method Invocation via the Zone Contract
The DRASTIC evolution model and architecture support a weak form of principle 7 in that an evolution may be
modified by a subsequent one by deploying new change absorbers. The formulation of this principle and its advantages
only became apparent when designing the GRUMPS approach to run-time evolution.
4.2 GRUMPS
The GRUMPS project [3, 6, 8, 12, 15] is developing techniques and software to collect and manage large collections
of user actions from distributed investigations. An investigator typically has a hypothesis they wish to investigate
and, together with a software engineer, a distributed system would be deployed to test that hypothesis. During the
collection of the user actions, new questions may come to light that the investigator wants to explore by changing the
implementation of the currently executing investigation. To support this, the GRUMPS program architecture allows
the functionality of a system to be changed at run-time.
The GRUMPS approach to constructing a distributed investigation divides the application code across a number
of distributed, inter-connected GRUMPS Unit (GU) objects. GU objects communicate with one another using uni-
directional event channels. GRUMPS events are sent through event channels and the GRUMPS event encapsulates
application data and executable code. A GU object contains an event processing object (EPO) which receives incoming
events, processing them in some way before (optionally) sending them to another GU, or possibly long-term storage.
Evolution is performed in GRUMPS by replacing the EPO objects inside GU objects. As an EPO object may be
replaced, the way an event at a particular GU object is processed may be changed. GU objects are placed inside
GUContainer objects.
A GUContainer object acts as a place-holder for a number of related GU objects, giving them a well-known
location and allowing the programmer to manipulate them as a single entity. GUContainers make available a special
event channel called a control event channel. A control event is a special kind of GRUMPS event that carries code
and a payload with it. A control event is passed to a (typically remote) GUContainer and the GRUMPS run-time
system passes a reference to the container to the control event. Code in the control event is then executed which will
call the public methods on the GUContainer. This mechanism is used to install GU objects inside a GUContainer and
to update EPO objects within a GU. To replace an EPO within a GU, a control event is sent to the GU’s containing
GUContainer. The control event code then retrieves from the container a reference to the GU object of interest. This
GU then has its current EPO removed and the new EPO installed (see footnote 3).
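The EPO replacement mechanism described above can be sketched in Java as follows. The classes below are simplified, hypothetical stand-ins for the GRUMPS concepts and are not the GRUMPS API: events are reduced to strings, the control event channel is reduced to a local method call (deliver), and the names EPO, GU, GUContainer, ControlEvent and ReplaceEpoEvent are reused only to mirror the terminology of the text.

```java
import java.util.HashMap;
import java.util.Map;

// The event processing object: transforms one incoming event (simplified to a string).
interface EPO {
    String process(String event);
}

// A GRUMPS Unit holding a replaceable EPO.
class GU {
    private EPO epo;

    GU(EPO epo) {
        this.epo = epo;
    }

    // Swap the EPO; later events are handled by the replacement.
    synchronized void replaceEpo(EPO replacement) {
        this.epo = replacement;
    }

    synchronized String handle(String event) {
        return epo.process(event);
    }
}

// Code carried by a control event, run with a reference to the receiving container.
interface ControlEvent {
    void applyTo(GUContainer container);
}

class GUContainer {
    private final Map<String, GU> units = new HashMap<>();

    synchronized void install(String name, GU unit) {
        units.put(name, unit);
    }

    synchronized GU find(String name) {
        return units.get(name);
    }

    // Stand-in for the control event channel: hand the container to the event's code.
    void deliver(ControlEvent event) {
        event.applyTo(this);
    }
}

// A control event that looks up a GU by name and installs a new EPO in it.
class ReplaceEpoEvent implements ControlEvent {
    private final String unitName;
    private final EPO replacement;

    ReplaceEpoEvent(String unitName, EPO replacement) {
        this.unitName = unitName;
        this.replacement = replacement;
    }

    @Override
    public void applyTo(GUContainer container) {
        GU unit = container.find(unitName);
        if (unit == null) {
            throw new IllegalStateException("no GU named " + unitName);
        }
        unit.replaceEpo(replacement);
    }
}
```

In this sketch the container exposes only general-purpose methods, and the decision about what to change travels with the control event. This is the property that lets new kinds of control event be introduced after the system has been started.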
GRUMPS processes form a spanning-tree and the interaction of a process with the tree is managed for the programmer
by the Teaq subsystem (see footnote 4). Objects (typically GUContainer and GU objects) are located within the distributed
system by sending OQL-like queries across the tree [6, 8]. Within a single process, objects that are to be found by
incoming queries are registered with the GRUMPS run-time system. A newly received query will be run against the
current collection of registered objects. If a match is found, either a copy of the matching object is returned to the
query initiator, or a proxy object is created and that is returned to the query originator from where it may interact with
the (remote) matching object. If a query matches a particular GUContainer, the result object that is passed back is a
reference to the control channel which may be used to send control events to the container. The approach to programming
with queries is referred to as query-oriented programming (QOP; see footnote 5) [12], which is one component of Teaq. The
distributed group of objects is collectively known as a GRUMPSNet.
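The way a query is run against registered objects and propagated across the process tree can be sketched in Java as follows. This is a deliberately reduced model built on stated assumptions: a query is represented by a java.util.function.Predicate rather than an OQL-like expression, all processes live in one JVM, and the result is a list of object names rather than copies, proxies or control channel references. The classes Registered and TreeProcess are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// A registered object, reduced to the two attributes the example queries use.
class Registered {
    final String kind;
    final String name;

    Registered(String kind, String name) {
        this.kind = kind;
        this.name = name;
    }
}

// One process in the spanning tree (a hypothetical stand-in for a Teaq process).
class TreeProcess {
    private final List<Registered> registered = new ArrayList<>();
    private final List<TreeProcess> children = new ArrayList<>();

    TreeProcess addChild(TreeProcess child) {
        children.add(child);
        return this;
    }

    TreeProcess register(Registered object) {
        registered.add(object);
        return this;
    }

    // Run the query against the local registrations, then forward it to each child.
    List<String> query(Predicate<Registered> query) {
        List<String> matches = new ArrayList<>();
        for (Registered object : registered) {
            if (query.test(object)) {
                matches.add(object.name);
            }
        }
        for (TreeProcess child : children) {
            matches.addAll(child.query(query));
        }
        return matches;
    }
}
```

A caller attached at the root could then locate every container with `root.query(o -> o.kind.equals("GUContainer"))`, without knowing in advance which process hosts it.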
Figure 2 shows a simple GRUMPSNet that consists of two Teaq processes with a single GUContainer each, the
left container holds two GU objects and the right container holds a single GU. Data is collected from two instrumented
computers (see section 5.2) which is sent to the left-most GU objects. These objects translate the data into GRUMPS
events and these events are sent to the third GU which cleans the data before writing it to stable storage. The two
GUContainers each have a control channel which may be used to update the contents of the container. A GUContainer
is discovered at run-time by attaching a process to the GRUMPSNet and propagating a QOP query across the process
graph. The result of such a query is typically a reference to the discovered GUContainer’s control channel.
A GRUMPSNet can be seen as an instance of the pipe and filter design pattern where the filtering is performed by
the GU objects. The design of GRUMPS deliberately exposes the complexity of the GU objects to the programmer,
rather than abstracting over them as is done, for example, by [1], although such an abstraction could be provided
on top of the current GRUMPS API. The focus in the GRUMPS work is on the kinds of primitive support that are
necessary to support rapid run-time evolution. The approach in the GRUMPS work to providing rapid update is to
give the software engineer access to the primitive run-time evolution support mechanisms. By taking away the layers
of abstraction that are present in other evolution support mechanisms, such as DRASTIC, the software engineer can
effect an evolution upon a running system more rapidly. The rapidity of the update refers to the time it takes to affect
the running system, not the amount of time it takes to plan and to test the change to the system.
Footnote 3: GUContainers and GU objects may also be manipulated, e.g., removing a GUContainer from a process or a GU object from its container.
Footnote 4: The Teaq subsystem manages the processes running within the GRUMPS distributed system, placing them into the spanning-tree and reconnecting them should their parent process crash or be taken away. Teaq is an acronym that stands for trees, evolution and queries and is pronounced the same as ‘teak’.
Footnote 5: cf. Jxta search [30].
[Figure 2 diagram labels: User 1; User 2; cleaned user data; Teaq Root Process; GUContainer; GU; EPO; GE (GRUMPS event). Key: inter-GU reference; Teaq parent reference; thread; event queue; channel; control channel.]
Figure 2: An Example GRUMPSNet
4.3 Discussion of Approaches to Evolution
4.3.1 DRASTIC
Evolution in DRASTIC is a planned, complex and heavily coordinated activity that is supported by a team of soft-
ware engineers who are responsible for the changes to the zones involved. An evolution will make a portion of the
application unavailable for a period of time, and ideally that part of the application will be contained within one zone.
Therefore, evolution in a DRASTIC-based system is a relatively rare event, perhaps only being performed once or
twice a year. Most of the time, a DRASTIC-based system will provide its application functionality, only being
interrupted for scheduled evolutions. Therefore, the DRASTIC evolution model can be seen as phased: long periods of
application activity are interrupted by major phases of application evolution.
In the DRASTIC evolution model, code change is encapsulated inside a zone. Together with that zone’s contracts
and its change absorbers, code elsewhere in the distributed system is not aware that a change has been performed.
This is how DRASTIC separates what is being evolved (the partial system contained in a zone) from how it is evolved,
i.e., the organisation of the system via zones, contracts and change absorbers. However, a DRASTIC programmer
must still program for evolution. After an evolution inside a zone, a method invocation into that zone may fail. This
is because the contract between the two zones may have been updated and a type previously used in the method
invocation may no longer be allowed to cross the zone boundary. An exception will be raised at the calling side and
so the programmer that has written cross-boundary calls must be aware that this is a possibility and be prepared to
handle this case. The DRASTIC evolution architecture allows the programmer to make some decisions at run-time.
For example, the programmer-supplied code that performs the inter-zone object translations contained in the change
absorber will be executed as part of the post-evolution execution of the application. This code can make run-time
decisions to control how the inter-zone call is translated.
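Programming for evolution at the calling side can be sketched in Java as follows. The text states only that an exception is raised at the caller when a contract no longer permits the call; the exception name ContractViolationException, the AccountsService interface and the fallback policy are all assumptions made for this illustration.

```java
// Hypothetical exception raised at the calling side when the contract rejects a call.
class ContractViolationException extends RuntimeException {
    ContractViolationException(String message) {
        super(message);
    }
}

// Hypothetical view of a service in another zone, as seen by the calling zone.
interface AccountsService {
    String invoiceStatus(String invoiceId);
}

class PurchasingClient {
    private final AccountsService accounts;

    PurchasingClient(AccountsService accounts) {
        this.accounts = accounts;
    }

    // The caller anticipates that an evolution of the other zone may invalidate the call.
    String statusOrFallback(String invoiceId) {
        try {
            return accounts.invoiceStatus(invoiceId);
        } catch (ContractViolationException e) {
            // Application-specific recovery: here the failure is reported as a status value.
            return "UNKNOWN (" + e.getMessage() + ")";
        }
    }
}
```

What the recovery should be is an application decision; the point of the sketch is only that the cross-boundary call site contains code dedicated to the possibility that the called zone has evolved.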
At times, it may not be possible to completely encapsulate a change within a single zone. The change proposed to
the system may be sufficiently wide-ranging that performing an update to more than one zone is the most appropriate
course of action. In this case, the software engineers responsible for the affected zones would have to mutually
agree what system-wide changes were to be performed. This may require a change to a number of contracts and
possibly some change absorbers to allow any modified inter-zone interactions to take place. The DRASTIC approach
to run-time evolution acknowledges that such an evolution is necessary and the model and implementation support it.
However, DRASTIC assumes that it is a relatively rare event. For more discussion on this issue, see [7].
4.3.2 GRUMPS
In comparison to DRASTIC, run-time evolution in GRUMPS is a more ad-hoc and rapidly occurring activity. A user
of a GRUMPS-based system may turn around an evolution of their distributed application quite promptly. A query
is issued to find a remote GUContainer of interest. The returned control channel is then used to send a control event
to the remote GUContainer which would replace a particular GU’s EPO object by executing the event code in the
context of the remote container. The time it takes to perform these operations is much shorter than the time it takes
to carry out the evolution of a DRASTIC zone. A GRUMPS evolution is a light-weight activity, typically replacing
a single object within a process. In this way, there is no attempt to hide the effects of an evolution from other parts
of the system and there is no need to impose the level of planning and coordination that are required in DRASTIC.
Evolution in GRUMPS is a relatively common event that would be performed several times a day. As a result, there is
no notion of evolution phases in the GRUMPS evolution model; multiple parts of a GRUMPS distributed system may
be undergoing some form of change. However, in both systems, it is hard to see how such systems could be built and
subsequently evolved without the application of an evolution model (principle 1). In addition, the differences between
the two models highlight the need for evolution models that are tailored to the specific environment the application
will be executing in (principle 2).
The GRUMPS evolution model divides what is being evolved (the GUContainers, GU and EPO objects) from how
it is evolved via the application of control events (supporting principle 6). An executing GRUMPS-based, distributed
application is evolved by applying an open-ended collection of control events to it. This increases the flexibility of
the system: objects that were only designed and implemented after the system has been started can be installed at
run-time and given a location within the system inside a GUContainer; and new control events can be sent into the
system to update it in a way most appropriate to its current contents. The running GRUMPS system can accommodate
new control events which themselves can embody completely new decisions in their code. This allows the GRUMPS
programmer to make as many decisions as are appropriate at run-time (principle 10).
In this way, the GRUMPS evolution model itself may be evolved, adapting to the ever changing needs of the
distributed system (principle 7). Programmers must program for evolution in this model (principle 8) as an evolution
may remove a part of the system. For example, due to a run-time change, a GU object may be removed and the
object that was previously successfully sending it GRUMPS events will have to find a replacement GU object. This
is done by executing a query to find another compatible GU. GRUMPS supports the programmer in making as many
decisions as are appropriate at run-time by providing control events and the query approach to programming. A control
event can contain the code necessary to decide whether it should apply itself to a particular GUContainer. Similarly,
a programmer may write a query that is based on information only available at run-time. The GRUMPS evolution
architecture makes available a collection of default control events to automate common activities in the system, such
as the installation and removal of GU objects. This reduces the number of run-time decisions that the programmer has
to make while simultaneously providing them with the necessary power to change the system in application-specific
ways by allowing them to author their own control events.
4.3.3 Both Systems
Within DRASTIC and GRUMPS systems, at times, it may be most convenient to shut down part of the system to
perform an evolution to it. If a new version of the underlying evolution support architecture were to be deployed, for
example, it could make sense to terminate the entire system. Neither the DRASTIC nor GRUMPS approaches to
run-time evolution address the hard problem of being able to evolve the implementation of the evolution support layer.
This work addresses only the problem of how to evolve the system that makes use of this layer. To evolve the layer itself, some form
of version control may be required. This would allow the distributed evolution support layer to track the version of
the various parts of itself that were being used at any one time. Such an approach is similar to that used in database
schema versioning [27].
It is assumed in both the DRASTIC and GRUMPS systems that there is a single system to be evolved and those
involved in the change understand and agree how the change will be performed. The DRASTIC and GRUMPS
systems are not targeted to the form of evolution where there are a number of complete systems which their users
want to update in several different ways, using third party components that are sourced from a number of different
vendors. The DRASTIC and GRUMPS approaches assume what is being changed has been written for, and tested
within, the single evolution framework in which it will be deployed.
5 Run-time Architecture
This section describes the DRASTIC and GRUMPS run-time support architectures and how they are used to perform
an evolution within the context of the evolution models described above. For detailed examples of how the DRASTIC
architecture is used, see [9, 11], and for GRUMPS, see [8, 12].
5.1 DRASTIC Run-Time Architecture
The DRASTIC run-time architecture implements the zones, contracts and change absorbers introduced in section 4.1.
These concepts exist at design-time and they become part of the run-time evolution infrastructure available to the
DRASTIC programmer (supporting principle 9). Figure 3 shows a snap-shot of a post-evolution executing application
that has been divided across three zones (Purchasing, Personnel and Accounts).
The Registry, EvolverMgr and the PASManager are processes that are used by the software engineer when
they want to evolve the application. The Registry is the lowest-level name server used in the DRASTIC system
and it is used by all the other non-application-level processes to contact each other. The description for all of the
zones is contained in the persistent application system manager (PASManager). This process contains information
on the application’s contracts as well as references into technology within each zone called the zone specific process
manager (ZspmDaemon) which manages the processes within a single zone (see footnote 6). The EvolverMgr process is used by
the software engineer when they want to evolve a zone. This process holds the current set of contracts that are being
edited by the software engineer responsible for the zone being changed.
Footnote 6: Only one ZspmDaemon in zone Accounts is shown on the diagram to keep it clear. In a real system, each zone would contain such a daemon.
[Figure 3 diagram labels: Purchasing Zone; Personnel Zone; Accounts Zone; User Process 1 (n:N); User Process 2 (m:M); ZBP processes; labels tn, xt and xm; Registry; EvolverMgr; PASManager; ZSPMDaemon.]
Figure 3: The System-Wide DRASTIC Run-Time Architecture
When an evolution is to be performed, the software engineer responsible for the zone to be changed first of all
has to ascertain what effect the change will have on the rest of the zone. This requires them to calculate the effect on
the current set of contracts that are held with other zones. The DRASTIC system assumes that the software engineer
has performed this, that they have written the new contracts and that they have provided the necessary set of change
absorbers. Once this has been performed, the software engineer for the zone being evolved uses the PASManager to
inform the ZspmDaemon in the zone being suspended that it should start an evolution.
5.1.1 Performing an Evolution
Assume that the evolution is to user process 2 in zone Accounts. The PASManager informs the ZspmDaemon
in zone Accounts that an evolution will take place. This information is forwarded to the two processes that are
responsible for enforcing the currently defined pair-wise contract, the ZoneBoundaryProcess (ZBP) processes. These processes
temporarily suspend new incoming invocations (e.g., into zone Accounts) and they wait for currently ongoing
executions to finish. Once this has happened the zone boundary is considered to be frozen and the ZspmDaemon informs
all processes that they should terminate. This causes each process to execute programmer-supplied code that will
terminate the process promptly in a consistent fashion. To evolve the contents of the zone, any persistent stores that have
been used by the processes are processed to translate instances from the old type to become instances of a new type.
While the processes are being evolved, the software engineer, via the PASManager, can cause the new collection of
change absorbers to be installed within the appropriate ZBP processes. Which change absorber is installed on which
inter-zone reference chain is driven by information provided by the software engineer in the contract. Once this has
been done and the processes have been updated, the evolution of the zone is concluded. The updated processes, and
any new ones, are then restarted and the software engineer uses the PASManager to inform the ZBP processes (via
the ZspmDaemon) to allow the resumption of the inter-zone method invocations.
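The freezing of a zone boundary described above can be sketched in Java as a small gate that counts in-flight invocations. This is an assumption-laden illustration rather than the DRASTIC implementation: the class ZoneBoundaryGate is hypothetical, and new invocations are refused while the boundary is frozen, whereas a real zone boundary process might queue them until the zone resumes.

```java
// Hypothetical sketch of the freeze/resume behaviour of a zone boundary process.
class ZoneBoundaryGate {
    private boolean frozen = false;
    private int inFlight = 0;

    // Called when an inter-zone invocation arrives; refused while the boundary is frozen.
    synchronized boolean tryEnter() {
        if (frozen) {
            return false;
        }
        inFlight++;
        return true;
    }

    // Called when an invocation that entered has finished executing.
    synchronized void exit() {
        if (inFlight == 0) {
            throw new IllegalStateException("exit without matching enter");
        }
        inFlight--;
        notifyAll();
    }

    // Stop accepting new invocations; ongoing ones are allowed to finish.
    synchronized void freeze() {
        frozen = true;
    }

    // The boundary is frozen once no new calls are accepted and none are in progress.
    synchronized boolean isQuiescent() {
        return frozen && inFlight == 0;
    }

    // Freeze and block the caller until every ongoing invocation has finished.
    synchronized void awaitQuiescence() throws InterruptedException {
        frozen = true;
        while (inFlight > 0) {
            wait();
        }
    }

    // Allow inter-zone invocations again after the evolution has completed.
    synchronized void resume() {
        frozen = false;
    }
}
```

In terms of the sequence in the text, awaitQuiescence corresponds to the point at which the boundary is considered frozen; process termination, store translation and change absorber installation would follow before resume is called.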
5.1.2 Using the Post-Evolution System
After an evolution has been performed an inter-zone reference exists between the two user processes in zones
Purchasing and Accounts. This reference travels through the pair of ZBP processes. The ZBP contains the contract
and the collection of change absorbers that are currently in force between the two zones. In zone Purchasing, in user
process 1, is a reference that leads to an object in zone Accounts. The reference in user process 1 is called n and this
process believes the reference leads to an object of type N. At the zone Purchasing ZBP the invocation is translated
from N to T. This is because at some point in the past the object being called on to was of type T. However, the call is
actually made on to a change absorber that makes available the T interface at the ZBP for the Accounts zone. Here the
inter-zone reference passes through two change absorbers which translate the method invocation from an invocation
on a T to one on X and then on to the actual type, type M. The last change absorber then calls on to the actual object
in user process 2. Any results from the method invocation are returned back along the same reference with the opposite
translations taking place.
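The chain of translations from N through T and X to M, with the opposite translations applied to the result, can be sketched in Java as follows. The sketch makes strong simplifying assumptions: an invocation and its result are modelled as strings, each change absorber is modelled as a pair of java.util.function.Function values, and the classes AbsorberStep and AbsorberChain are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// One change absorber, modelled as a request translation and a result translation.
class AbsorberStep {
    final Function<String, String> forward;
    final Function<String, String> backward;

    AbsorberStep(Function<String, String> forward, Function<String, String> backward) {
        this.forward = forward;
        this.backward = backward;
    }
}

// The absorbers installed along one inter-zone reference, in call order.
class AbsorberChain {
    private final List<AbsorberStep> steps = new ArrayList<>();

    AbsorberChain then(Function<String, String> forward, Function<String, String> backward) {
        steps.add(new AbsorberStep(forward, backward));
        return this;
    }

    // Translate the request through every step, call the target, then undo in reverse order.
    String invoke(String request, Function<String, String> target) {
        String translated = request;
        for (AbsorberStep step : steps) {
            translated = step.forward.apply(translated);
        }
        String result = target.apply(translated);
        for (int i = steps.size() - 1; i >= 0; i--) {
            result = steps.get(i).backward.apply(result);
        }
        return result;
    }
}
```

With three steps registered in the order N to T, T to X and X to M, a request expressed against N reaches the target expressed against M, and the result is translated back from M to X, X to T and T to N before it reaches the caller.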
5.1.3 Discussion
The DRASTIC run-time architecture supports the evolution model described in section 4.1. This has a number of
advantages. As the same set of concepts are used to design the system as are used to manage it at run-time, there is less
burden placed on the software engineer. As the concepts of encapsulation and the use of the translation descriptions
allow the software engineers to plan and to manage the design and ongoing evolution of their system, it makes sense to
provide implementations of these concepts at run-time. In order to evolve a zone it is convenient to be able to suspend
invocations at the zone boundary. Having the concept of the zone exist at run-time allows this to be easily provided
for. It also gives the implementer of the evolution architecture a process to indirect all inter-zone references through,
thus making the management of this part of the implementation easier.
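As an illustration of why a run-time zone boundary makes suspension easy to provide, the following sketch routes every inter-zone invocation through a single gate. It is an assumption about one possible mechanism, not a description of the DRASTIC implementation; the class ZoneBoundaryGate and its methods are hypothetical names. Invocations share a read lock, while an evolution takes the write lock, so it waits for in-flight calls to finish and holds back new ones until it completes.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical gate placed at a zone boundary.
final class ZoneBoundaryGate {
    // Fair mode so that a waiting evolution is not starved by a stream of invocations.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);

    // An inter-zone invocation: many may run concurrently.
    <R> R invoke(Supplier<R> call) {
        lock.readLock().lock();
        try {
            return call.get();
        } finally {
            lock.readLock().unlock();
        }
    }

    // An evolution: runs alone, with all inter-zone invocations suspended.
    void evolve(Runnable evolution) {
        lock.writeLock().lock();
        try {
            evolution.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```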
As evolution involves the change to the semantics of the distributed system, it is not possible to completely auto-
mate such an evolution. The DRASTIC approach to evolution acknowledges this and places the software engineer at
the centre of the evolution process. They are provided with the necessary tools and concepts to be able to evolve a
running, distributed system and the engineer is expected to manage the process of evolution. This is because evolving
a DRASTIC-based system is a relatively rare, but major, undertaking, requiring coordination between a group of peo-
ple responsible for the other zones that may be affected due to a system change. By using the tools and concepts, the
programmer engages in programming for evolution (supporting principle 8). This should make changing the system
easier as forethought has gone into it being changed and the tools can reduce the burden on the programmer when
actually updating the system.
5.2 GRUMPS Run-Time Architecture
The GRUMPS run-time architecture implements the concepts introduced in section 4.2. Following the design princi-
ples established within the DRASTIC experiment, the evolution concepts within GRUMPS exist at design-time and
they become part of the run-time system (principle 9). The evolution of a GRUMPS program is performed by writing
a program that creates control events and then sends them to deployed objects that have been found by running
QOP queries. Figure 4 shows two views of an executing GRUMPS-based distributed investigation.
[Figure 4 (diagram not reproduced): panel (a) Application-Level View; panel (b) Teaq Process-Level View. The panels show Users 1 to 4, user data collection GUs, data cleaning GUs and cleaned user data. Key: GU, GE, EPO, thread, event queue, GRUMPS event, inter-GU reference, Teaq parent reference, Teaq process, Teaq root process.]
Figure 4: An Example GRUMPS-based Distributed Investigation
Part (a) of figure 4 is the application or user-level view of the application. It consists of four workstations that have
been instrumented with GRUMPS technology [3] to collect mouse click, window focus and keyboard events. This raw
data is sent to a user data collection GU which converts the data into a timestamped GRUMPS event which is passed
to a data cleaning GU. This GU ensures the data is in a suitable form for writing to the cleaned-data database, from
where it will be analysed at a later stage. Cleaning data at run-time may involve the addition of a sequence number to
an event. This number can be used to more efficiently find the next event within a relational database table, rather than
having to write a query that finds such a row based on a calculation to retrieve the next highest event timestamp (see
[32] for more detail on this). Part (b) shows the equivalent application from the Teaq processes' point of view. The six
GUs (see footnote 7) are divided across four processes which have been formed into a single tree that is rooted at the
process at the top right-hand corner.
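The sequence-number cleaning step mentioned above might look like the following sketch. The classes CleanedEvent and SequenceNumberingCleaner, their field names and the whitespace trimming are assumptions for illustration and are not taken from the GRUMPS code. With such a number in place, the next event after sequence s can be looked up as sequence s + 1 rather than by searching for the smallest timestamp greater than the current one.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical cleaned event: the raw data plus a sequence number.
final class CleanedEvent {
    final long timestampMillis;
    final String payload;
    final long sequence;

    CleanedEvent(long timestampMillis, String payload, long sequence) {
        this.timestampMillis = timestampMillis;
        this.payload = payload;
        this.sequence = sequence;
    }
}

// Hypothetical data cleaning step that numbers events in arrival order.
final class SequenceNumberingCleaner {
    private final AtomicLong next = new AtomicLong(1);

    CleanedEvent clean(long timestampMillis, String rawPayload) {
        // Two events may share a timestamp, but never a sequence number.
        return new CleanedEvent(timestampMillis, rawPayload.trim(), next.getAndIncrement());
    }
}
```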
5.2.1 Performing an Evolution
Assume that the event processing object in the data cleaning GU for users one and two is to be replaced. To perform
an update to this GU’s EPO object, code would be written that initiated a query to find the containing GUContainer.
Such code could be part of a generalised object-finding tool which would be another process attached to the tree. The
query would be expressed in terms of finding the GUContainer that contained the GU object of interest, to distinguish
it from the other GU objects within the system. The query is propagated across the tree by the Teaq run-time system
and is executed once inside each process against the collection of registered objects. A match will be found in the
Teaq root process and a control event channel object will be sent back to the query initiator. The tool user could then
send a default EPOUpdate object to replace the EPO in the GU with a new one, or they could use their own kind of
control event to perform more application-specific EPO processing.
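The steps above (query for the containing GUContainer, then send a control event that replaces the EPO) can be sketched as follows. The names GUContainer, GU, EPO and EPOUpdate come from the text, but every signature here is hypothetical: the real Teaq and QOP interfaces are not shown in this paper, the query is reduced to a predicate over one process's registered objects, and the control event channel is simplified to a direct method call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical event processing object: one event in, one event out.
interface Epo { String process(String event); }

final class Gu {
    private volatile Epo epo;
    Gu(Epo epo) { this.epo = epo; }
    String onEvent(String event) { return epo.process(event); }
    void installEpo(Epo replacement) { this.epo = replacement; }
}

final class GuContainer {
    final String name;
    final Gu gu;
    GuContainer(String name, Gu gu) { this.name = name; this.gu = gu; }
}

// A control event carries the change to be applied to a deployed object.
interface ControlEvent { void applyTo(GuContainer target); }

// Default update: swap the EPO inside the GU for a new one.
final class EpoUpdate implements ControlEvent {
    private final Epo replacement;
    EpoUpdate(Epo replacement) { this.replacement = replacement; }
    @Override public void applyTo(GuContainer target) { target.gu.installEpo(replacement); }
}

// Stand-in for the per-process collection of registered objects that a query runs against.
final class ProcessRegistry {
    private final List<GuContainer> registered = new ArrayList<>();
    void register(GuContainer container) { registered.add(container); }
    Optional<GuContainer> query(Predicate<GuContainer> q) {
        return registered.stream().filter(q).findFirst();
    }
}
```

A tool would run the query in each process of the tree and apply the control event wherever a match is found. An application-specific control event would be another implementation of the hypothetical ControlEvent interface.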
5.2.2 Using the Post-Evolution System
In the example, the post-evolution system is structurally the same as the pre-evolution system, but the system’s func-
tionality has been changed as a new EPO has been installed which processes incoming events in a new way. The
structure of the system could be changed during an evolution as a control event can cause the topology of the system
to be altered.
In contrast to the DRASTIC approach to evolution, the GRUMPS approach has no requirement to stop any portion
of the ongoing investigation. Therefore, there is no real notion of using the system after an evolution has been
performed as there are no distinguished phases of pure application computation, separated by discrete periods of
system evolution.
Footnote 7: This diagram omits the GUContainers for clarity.
In DRASTIC it is possible to detect at run-time that an inter-zone invocation has failed. However, in GRUMPS,
it is not possible to programmatically detect that a change in the system has caused an event to be processed in an
incorrect way. The two evolution programming models differ in this respect because the DRASTIC approach to
evolution is applied to Java types, whereas in GRUMPS the evolution of the system will affect a single canonical
type, the event. Therefore, in GRUMPS, there is not enough information to be able to detect that an event has been
processed in an erroneous way. The error will only be noticed downstream when an event object is received that
contains state that was unexpected. In GRUMPS, it is assumed that this can be programmed at the application level as
the detection of the error is a semantic issue for the application implementer.
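An application-level check of the kind assumed here could be placed in a downstream event processing object. The sketch below is illustrative only: the representation of an event as a map and the field names "timestamp" and "sequence" are assumptions, since the text does not define the event's state.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical downstream check for state the application does not expect.
final class CleanedEventCheck {
    // Returns a description of the problem, or an empty Optional if the event looks as expected.
    static Optional<String> problemWith(Map<String, Object> event) {
        if (!(event.get("timestamp") instanceof Long)) {
            return Optional.of("missing or non-numeric timestamp");
        }
        if (!(event.get("sequence") instanceof Long)) {
            return Optional.of("missing or non-numeric sequence");
        }
        if ((Long) event.get("sequence") < 0) {
            return Optional.of("negative sequence");
        }
        return Optional.empty();
    }
}
```

What counts as unexpected is a semantic decision for the application implementer, which is why the check lives in application code rather than in the run-time system.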
5.2.3 Discussion
The level of encapsulation provided in GRUMPS exists at the GUContainer and GU levels. The design approach for
GRUMPS was to provide a minimal mechanism for managing change within the run-time system, in order to promote
the rapid change of the system. This helps to ensure the system is responsive, simple to use and thus easy to explain
to anyone wanting to use it. Richer kinds of evolution support could be built on top of the GRUMPS system, such as
persistent message queues for the inter-GU data.
The design of the DRASTIC run-time architecture takes the opposite view. How a software engineer evolves a
DRASTIC-based system is heavily prescribed by the DRASTIC evolution model and the approach it takes to evolving
a zone. This is an appropriate approach due to the complexity of changing such a system. The DRASTIC evolution
model, architecture and run-time implementation need to be able to ensure a coherent approach to run-time evolution.
If the software engineer was free to perform certain operations in their own way (for example, some may choose to not
suspend invocations at a zone boundary), other parts of the system may fail as they rely, not only on these operations
being performed, but on them being performed in a certain order. Giving the software engineer too much freedom in a
complex evolutionary system such as DRASTIC could be counter-productive. Thus, the more heavy-weight approach
helps to ensure that the process of applying an evolution to a running system is a well understood operation that is
appreciated by the group performing the change.
The advantage of the DRASTIC approach is that there are standard solutions to particular common problems. In
GRUMPS, it is possible for the same EPO object to be simultaneously replaced by two concurrently executing control
events. In the current implementation, the control event to be executed last will succeed. The GRUMPS run-time
system provides the mechanism to support this. However, it does not provide any coordination technology to ensure
a particular policy is maintained within the system. The DRASTIC system ensures policy is enforced at the zone
boundary, within the context of the contract and change absorbers. The GRUMPS approach to this is to embody policy
in tools and to imbue the run-time architecture with the minimum amount of policy (on top of a flexible mechanism)
to ensure any changes can be done correctly. This promotes an increase in flexibility as the system being changed, the
system performing the change, and the policy that controls the correctness of that change are all separate, allowing for
a change within one to minimally affect either of the other two.
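The distinction between mechanism and policy for concurrent EPO replacement can be sketched as follows. The class EpoSlot is hypothetical and is not the GRUMPS implementation. The replace method is the bare mechanism, under which the last writer wins. The replaceIfUnchanged method is one example of a policy that a tool could layer on top: an update succeeds only if the slot still holds the object that the updater last observed.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the EPO currently installed in a GU.
final class EpoSlot<E> {
    private final AtomicReference<E> current;

    EpoSlot(E initial) { current = new AtomicReference<>(initial); }

    E get() { return current.get(); }

    // Mechanism only: of two concurrent replacements, whichever executes last wins.
    void replace(E next) { current.set(next); }

    // Example policy: refuse the update if another replacement got in first.
    // The comparison is by object identity, as defined by AtomicReference.compareAndSet.
    boolean replaceIfUnchanged(E expected, E next) {
        return current.compareAndSet(expected, next);
    }
}
```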
The advantages stated in section 5.1.3 for having the model and design-time concepts present at system run-time
and during an evolution also apply in the GRUMPS system. The main evolution management concepts are the model of
containment and object replacement through the use of control events. The concepts are used at design-time and these
same ideas are reused when the system is evolved. The concepts are also familiar to object-oriented programmers,
reducing the complexity of what they have to learn (adhering to principle 5).
The evolution of both the DRASTIC and GRUMPS systems is a complex endeavour. The DRASTIC and GRUMPS
evolution models and implementations take away some of the complexity of designing an evolvable system and ac-
tually performing the change at run-time. Both approaches focus the software engineer’s attention on the need to be
able to change the system at run-time. In this way, the software engineer is encouraged to consider run-time evolution
to be an integral part of an application’s design (principle 3).
5.3 Comparison of Approach
In a DRASTIC system the kinds of change that could be carried out are the replacement of code for objects that have
had references to them passed outside the zone they exist in. An appropriate change absorber would be placed on
the necessary inter-zone reference chains to handle this update. In terms of the architecture of a DRASTIC-based
system, such an object would provide access to the functionality within a zone. In terms of the Accounts zone in
figure 3, object m would receive information that the code in the Purchasing zone had bought a good. To ensure the
autonomy of the two zones, the contract between them would allow the change to the type of m to be made without
having to alter the code that wishes to call into the Accounts zone. In this sense, the zone contract and the change
absorbers allow the design idiom of design by contract to be practised as the exported change absorber ensures that
the pre-existing design contract has been adhered to.
In comparison, a change in GRUMPS does not affect the type of an object after evolution. GUs communicate in
terms of generic events and the interface for GU reception does not change after an evolution. In GRUMPS it is how
the events are processed that can be changed at run-time (which may make use of new code) and this change is effected
without the need for the heavy-weight suspension of the zone and without the desire to promote domain autonomy. The
GRUMPS approach makes no distinction between the phases of application computation and application evolution, as
DRASTIC does during the suspension of zone activity, and therefore there is no need for a concept like the DRASTIC
zone. The DRASTIC approach is most effective for systems where evolution is a complex endeavour which, to get
right, requires many aspects of the system to be changed during a single evolution phase. Such a complex evolution
requires an approach where one part of the system may be isolated from other parts while the complex evolution
process is applied to it. Once it has been performed, the other parts of the system are then allowed to use it. This kind
of evolution would be the replacement of code within a bank or a hospital system, where the correct replacement of a
part of the system must be performed before the rest of the system may use it.
By contrast, GRUMPS is designed to support the application of many, much smaller kinds of evolution to how data
is processed as it moves across a distributed system. GRUMPS is suited to the rapid change to an already executing
system that needs to process data in a way that may not have been identified when the application was started. In
addition to this constraint, another is that one part of a GRUMPS system may not be able to wait while another part
of it is updated, thus the suspension approach of DRASTIC is not appropriate. These two constraints require an
architecture that can be updated simply and quickly and in a way that may be performed while the application is
executing, without the need to suspend or take a part of it down.
6 Architecture Implementation
This section describes the key aspects of the implementation of the two architectures that have made run-time evo-
lution easier to support. Section 6.1 starts with a discussion of generally applicable implementation techniques and
then follows in sections 6.2 and 6.3 with details specific to DRASTIC and GRUMPS respectively. For in-depth
implementation details see [8, 9, 11].
6.1 Generally Applicable Implementation Techniques
The generally applicable implementation techniques can be divided into five points: the ability to apply and remove the
evolution system to and from the running system (section 6.1.1); the need for symmetric operations (section 6.1.2);
how to support the requirement of run-time decisions (section 6.1.3); and why tool support (section 6.1.4) and the
choice of implementation language (section 6.1.5) are so important.
6.1.1 Applying and Removing the Evolution System
The approach to evolution described in this paper calls for a separation between the application system and the evolu-
tion system (principle 6). Implementing these as two separate, but interoperable, systems has a number of advantages.
Separating them allows the development of the two systems to proceed more independently than they could if they
were combined. Only those parts of the application system that need to interoperate with the evolution system will be
inter-dependent with any changes made in the evolution system. The software for the rest of the application system is
independent of the rest of the evolution system and so changes made at the application side will not affect the imple-
mentation of the evolutionary aspects. As the design of the whole system is divided into two parts that separate what
is being evolved from how it is being evolved, reflecting this in the implementation helps to reinforce this division.
If the evolution system is only ever attached when an evolution must be performed, the run-time overhead that
the evolution system presents is kept to a minimum. The application system only needs to be able to support the
attachment of the evolution system. This part of the system implementation can be optimised as it is that part of
the run-time infrastructure that must be present in the application system to support its own evolution. Any parts of
the evolution system implementation that only need to be executed during an evolution can be kept outside of the
application implementation, thus ensuring the application is not slowed down by executing code it does not need to
for its core task, i.e., providing the functionality of the application to its users.
The application system discussed above can be further divided into two parts; the code dedicated to performing
the functionality of the application; and the code to integrate the rest of the application with the run-time evolution
architecture. Ideally, all code that interfaces the application with the underlying evolution system should be tightly
encapsulated and present in as few places as possible. This is made possible if there is a clear distinction between the
various parts of the application system.
In DRASTIC, code that embodies run-time evolution support (such as the change absorbers) becomes part of the
post-evolution system. Change absorbers are installed on inter-zone references which remain after an evolution has
been performed. Application-level inter-zone references are redirected via the DRASTIC platform in each process.
Therefore, in DRASTIC, the places at which the application code and run-time evolution support architecture interface
are tightly encapsulated, but they are present in the architecture at several points. This is because in the DRASTIC
architecture there are several distinct points along the inter-zone reference chain where evolution support may be
added. The advantage of keeping the number of points to a minimum was something that was learned later on in the
GRUMPS project.
The GRUMPS-based application support system consists of the Teaq and QOP subsystems and the GUContainers
and GU objects. The functionality of the application is embodied in the event processing objects and in any code that
they call. Thus, in GRUMPS, the distinction between the pure application code and that part of the application that
is interfaced with the evolution support system is made clear to the programmer at the level of the event processing
object. It is in this object that the boundaries of the two systems inter-mingle. The interface between the two systems
is restricted to a single concept in the system and as such it is clearer for the programmer to understand the inter-
dependencies that exist between these parts of the application. Knowing this allows them to make better decisions
about where parts of their application code should reside. If they are placed into the event processing object the
programmer needs to be aware that this code is subject to GRUMPS-supported replacement. If the code is not held
here, it therefore does not interface with the evolutionary support provided by the GRUMPS system and so cannot be
replaced.
In GRUMPS a programmer may further encapsulate the evolution system by placing it into a separate process.
This process can attach itself to the spanning-tree and can then monitor the ongoing activity in the system by inspecting
the GUContainer and GU objects. After the evolution has been performed, it can be detached and from this point on
the evolution of the system will present no run-time overhead.
The two different approaches described above highlight another difference between the DRASTIC and GRUMPS
approaches to run-time evolution. In DRASTIC objects are added to the executing system that remain after the
evolution has been performed and these objects bring additional semantic meaning to the application. By contrast,
evolution in GRUMPS is performed by sending an object into the system which will have some semantic-altering
effect on it (principle 4), but after the effect has been applied, this object is removed from the system. The advantage
for GRUMPS is that the object used to perform the update does not present a run-time overhead in the way that the
change absorbers in DRASTIC do. However, the advantage of the change absorbers is that the software engineer has
an explicit list of objects that inter-zone interactions must travel through. A by-product of this explicit list is that the
software engineer has a history of the evolutions that were performed and the order in which they were applied.
6.1.2 Symmetric Operations
When implementing an evolution support system it is important to provide a mechanism that can support symmetric
operations. A symmetric operation is one that may be applied as well as removed. If only one operation is provided,
this is referred to as asymmetric. It is important to provide symmetric operations as an evolution operation that has
been applied to a system may need to be removed at a later stage.
When evolving a running system, a side effect may occur that the removal operation may not be able to undo.
This is an issue for the software engineer to deal with. It is typically not something that the implementation of the
underlying evolution support system can deal with. The point here is that if symmetric operations are not provided,
certain operations on the whole system cannot be (easily) performed. The GRUMPS system provides symmetric
operations by providing support for objects to be added and also removed from a GUContainer. DRASTIC supports
this by allowing change absorbers to be added and removed from inter-zone reference chains.
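A symmetric operation can be expressed as an interface that pairs each apply with a remove. The sketch below uses hypothetical names (EvolutionOp, AddAbsorber) and models an inter-zone reference chain as a simple queue of absorber labels; it is not taken from either system. As the text notes, removal restores the structure of the chain but cannot undo side effects that occurred while the operation was in force.

```java
import java.util.Deque;

// Hypothetical symmetric evolution operation: whatever apply does, remove takes out again.
interface EvolutionOp<S> {
    void apply(S system);
    void remove(S system);
}

// Adds a change absorber (represented here by a label) to the end of a reference chain.
final class AddAbsorber implements EvolutionOp<Deque<String>> {
    private final String absorber;

    AddAbsorber(String absorber) { this.absorber = absorber; }

    @Override public void apply(Deque<String> chain) { chain.addLast(absorber); }

    // Removes the most recently added occurrence of this absorber, leaving the rest intact.
    @Override public void remove(Deque<String> chain) { chain.removeLastOccurrence(absorber); }
}
```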
When implementing a software system, opportunities arise for its optimisation. For example, Java defines an
optimised set of bytecodes. Typically, when implementing an optimisation for a system, the need to be able to
subsequently perform an evolution within the context of the optimised system is not always obvious. This can be
because the person who is implementing the optimised operation is not focussed on the need for evolution and so
there may be no way to reverse the effects of the optimisation. This can lead to an asymmetric design which can make
evolution very difficult to perform. If it is not possible to undo an optimisation, the implementer of the evolution
system has to provide two versions, one for the unoptimised system and one for the optimised version. The two
versions may also be quite different at the implementation level due to the nature of the optimisation, making the
implementation of the evolution support system more complicated. For these reasons, it is preferable to be able to
undo the effects of the optimisation so that the system may be evolved in its unoptimised state. This then only requires
one version of the evolution system.
6.1.3 Supporting Run-time Decision Making
To evolve an executing system those responsible for updating it must be able to make as many system-changing
decisions as are appropriate at run-time. If important aspects of the executing system are fixed at some other stage and
these cannot be changed at run-time, the only way to change them is to stop that part of the system. Their redesign and
reimplementation may occur in parallel with the currently executing system, but to deploy the new code, some part of
the system will have to be shut down. However, it may not be feasible to perform any of these operations. For example,
the executing system may not be able to survive having parts of it removed. If this is required, such a fundamental
requirement should be considered at the design stage of the evolution support system, as it was with the DRASTIC
project. It may be convenient to be able to make certain decisions before the system is deployed and executed. If those
decisions cannot be changed once the system is running, the limited situation above arises. If these decisions may be
changed at run-time, those using the system are given more choice in how they evolve and change the system during
its lifetime.
However, in the DRASTIC system it was convenient to separate and make distinct the phases of application
computation from the discrete periods of application evolution. This design was chosen as a simplifying assumption
to help manage the level of complexity when evolving a single zone. It would be possible to design a system where
more decisions could be made at run-time; however, the engineering of that system would be more complicated. For
example, rather than suspending all calls into a zone, a zone could be evolved on a per-process basis. This would result
in some processes executing post-evolution code while others would be simultaneously running pre-evolution code.
The software engineer would then have to handle the potentially complex interaction between executing processes
on both sides of the evolution wave. The disadvantages of handling this interaction would outweigh the advantages
that could be derived from performing evolution concurrently with application execution in that zone. This is because
the application programmer would have to manage the interaction of pre- and post-evolution processes as this is
a semantic operation that only they can understand and, therefore, manage (which is an example of the need for
principle 4). The DRASTIC evolution run-time system provides the mechanism to support this interaction, but it
cannot help in establishing a meaningful and safe inter-process interaction. Therefore, although it is preferable to
provide support to make as many decisions at run-time as is appropriate, there may be sound engineering reasons why
support for certain kinds of run-time decision is not provided.
In contrast, the design of the GRUMPS run-time evolution system encourages the software engineer to make as
many evolution decisions as are appropriate at run-time. This was done to maximise the likelihood that an entire
GRUMPS-based application system could stay running while it was also being updated. A control event can be sent
to a deployed object to change the way that it deals with its incoming events. This does not require the process
that the object exists in to be brought down. However, providing this kind of approach to run-time evolution has
implications for other parts of the system. The evolution of a GRUMPS system is less controlled and less managed
than that of a DRASTIC system. Therefore, those evolving a GRUMPS system have to be careful that a change does not have
knock-on effects throughout the whole system, which is something that the DRASTIC system explicitly managed.
The current implementation of GRUMPS provides the basic set of mechanisms to allow an application system to be
updated at run-time. If additional kinds of evolution support are necessary, these can be provided above the system,
using a standardised set of control events to express this.
Providing support to perform as many evolutionary changes as are appropriate while the system is executing does
lead to a flexible run-time environment. However, this level of power is not always appropriate in every situation.
For this reason, those implementing run-time evolution systems should take care to identify those cases where pro-
viding such support could be a disadvantage due to the implications it has for other parts of the evolution system’s
implementation.
6.1.4 Tool Support
Typically, before a system is changed, the software engineer will want to perform a preliminary inspection of the
running system to check its current state. This will allow them to judge whether the approach they want to take to
changing the system is appropriate, e.g., to ensure no invariants are broken (see section 3.3). To be able to effectively
do this within an executing, distributed system requires tool support. The design and implementation of such tools
was not the central focus of either of the projects reported here. However, they would have proved useful towards
the latter parts of the systems’ design and implementation to ensure a test evolution proceeded in a well-known and
predictable manner.
Such tools should support the person evolving the system in a number of ways. They should allow them to
perform coordinated changes to multiple parts of the distributed system. For example, within GRUMPS, being able
to evolve two GU objects that reside within different processes is not something that the evolution support system
directly supports. It would be appropriate to place such support into a tool that made use of the existing evolution
mechanisms. Tools to visualise the current state of the distributed system would also be valuable. Before performing
an evolution, those changing the system need to understand its current state. Once the current system is understood,
support for performing “what-if” analysis on the system would be extremely useful. A software engineer could pose
questions to the tool that embodied hypotheses the engineer had about particular ways of changing the system. The
tool would then respond with results that indicated the kinds of effect a particular approach to evolution would have
on the executing system. This information would be used by the software engineer to modify their approach to an
evolution.
6.1.5 Implementation Language
The choice of implementation language will have an impact on the kinds of evolution that may be performed. Both
the DRASTIC and GRUMPS systems are programmed in Java and both evolution support systems are implemented in
the same language as that used to implement the application system. Using the same language for both the evolution
and application system makes evolving the application system easier as language-level entities may be freely passed
between the two systems. Java can include code written in other languages (typically C); however, providing access to
the Java-based evolution framework from another language would create practical engineering problems. For example,
when freezing a DRASTIC zone, any processes that made use of any non-Java code would have to ensure this state did
not interfere with the evolution process. This code, which exists outside the evolution framework, may rely on types
within the framework that are being evolved. To effectively address this requires technology to track and manage the
evolving state within and outside the evolution framework. This is a complex problem in its own right and would
require extensive additional research to address.
The programming language defines the most primitive kinds of evolution that may be performed. For example,
Java makes the replacement of code behind a Java interface easy so this approach was used for the GRUMPS project
where such a replacement was appropriate. However, the DRASTIC project wanted to support evolution operations
that were not compatible with the Java type system (for example, the ability to remove a method from an evolved
type). Therefore, as Java was being used, a different approach (based on change absorbers) was used. If the Java
language definition supported method removal (as other object-oriented languages such as Self [5] and Beta [22] do),
the design and implementation of the change absorber mechanism would have been different as there would have been
more direct support from the underlying programming language.
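To illustrate the method-removal case, the sketch below shows a change-absorber-style adapter that keeps an old interface available after a method has been removed from the evolved type. All names (AccountV1, AccountV2, V1OverV2Absorber, legacyReport) are hypothetical, and emulating the removed method from the remaining ones is only one possible choice; the text does not say how DRASTIC's change absorbers treat such calls.

```java
// Old interface that code in other zones was written against.
interface AccountV1 {
    long balancePence();
    String legacyReport();   // this method is removed in the evolved type
}

// Evolved type: legacyReport() no longer exists, so it cannot implement AccountV1 directly.
final class AccountV2 {
    private final long pence;
    AccountV2(long pence) { this.pence = pence; }
    long balancePence() { return pence; }
}

// Adapter that continues to export the old interface in front of the evolved object.
final class V1OverV2Absorber implements AccountV1 {
    private final AccountV2 target;

    V1OverV2Absorber(AccountV2 target) { this.target = target; }

    // Method that survived the evolution: forwarded unchanged.
    @Override public long balancePence() { return target.balancePence(); }

    // Removed method: emulated here from what the evolved type still offers (an assumption).
    @Override public String legacyReport() { return "balance=" + target.balancePence(); }
}
```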
6.2 DRASTIC Architectural Implementation
In terms of supporting the evolution of a run-time, long-lived, persistent, distributed system, the DRASTIC project
highlights two implementation techniques that have made the provision of such evolution easier. The use of indirection
is discussed in section 6.2.1 and section 6.2.2 describes the advantages to be gained by a separation of functional roles
within the run-time system.
6.2.1 Using Indirection
The DRASTIC run-time architecture shown in figure 3 shows the rerouting of an inter-zone reference via the two
ZBP processes responsible for enforcing the contract between the two zones. This indirection is advantageous for
two reasons. Firstly, as the inter-zone reference chain is redirected via the DRASTIC platform, the platform may
manipulate the inter-zone reference chain to provide the evolutionary facilities described in section 4.1. Secondly,
adding objects to the reference chain allows those responsible for evolution to change the semantics of an inter-zone
method invocation, without the invoker or invokee being aware of this change.
The DRASTIC evolution platform was written in such a way that the objects that are interposed onto the inter-zone
reference chain can be added and removed such that both the invoker and the invokee are unaware of their presence.
A reference to an object may be handed out to a process in another zone, but the software engineer who is responsible
for the zone where the object resides can still retain some control over how invocations to that object are handled.
This is useful in a system where the evolution of objects can occur at any time within a zone. As objects may be
placed onto the reference chain to affect its operation after the reference has been handed out, the process that holds
the invocation end of the chain does not have to be updated when an evolution within the zone takes place. This is
a classical application of indirection. Being able to affect the semantics of an inter-zone invocation is particularly
useful in a distributed system where it can be difficult to control how a reference is used once it has been handed out
to another process. The receiving process may reside within another zone which is at some arbitrary point within the
distributed system that the handing-out process has no knowledge of and no control over.
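The interposition described above can be illustrated with a minimal Java sketch. It is not the DRASTIC implementation: the names Account, AccountImpl and ReferenceChain are hypothetical, and a single in-process object stands in for the chain that DRASTIC routes through the ZBP processes. The point it shows is that the holder of the reference keeps calling the same object while the zone owner adds and removes stages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical application interface exported from a zone.
interface Account {
    String describe(String request);
}

// The real object, living inside the zone that owns it.
class AccountImpl implements Account {
    public String describe(String request) {
        return "handled:" + request;
    }
}

// Stands in for the inter-zone reference chain (an assumption of this sketch).
// The invoker only ever holds this object, so stages can change unnoticed.
class ReferenceChain implements Account {
    private final Account target;
    private final List<UnaryOperator<String>> stages = new ArrayList<>();

    ReferenceChain(Account target) {
        this.target = target;
    }

    // Called by the zone owner after the reference has been handed out.
    synchronized void interpose(UnaryOperator<String> stage) {
        stages.add(stage);
    }

    synchronized void remove(UnaryOperator<String> stage) {
        stages.remove(stage);
    }

    // Each invocation passes through every interposed stage in order.
    public synchronized String describe(String request) {
        String translated = request;
        for (UnaryOperator<String> stage : stages) {
            translated = stage.apply(translated);
        }
        return target.describe(translated);
    }
}
```

In this sketch the invoker's variable of type Account never changes, which mirrors the observation that the process holding the invocation end of the chain does not have to be updated.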
An overuse of indirection can add an unwanted amount of run-time overhead to an inter-zone method invocation.
This can be reduced by combining some logically separate operations into single objects. For example, the software
engineer could combine a number of distinct change absorbers into a single change absorber, reducing the overhead
of calling through the chain and, hopefully, optimising the translations that are required. Separating logical operations
is a useful approach to building evolvable systems; however, the indirection of application-level references should not be
overused.
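As a rough illustration of combining change absorbers, the following Java sketch contrasts a chain of two absorbers with a single hand-fused one. The ChangeAbsorber interface shown here and the two translations (a unit conversion and a fixed fee) are assumptions made for the example; they are not taken from the DRASTIC API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical absorber shape: translate a value between interface versions.
interface ChangeAbsorber {
    long absorb(long value);
}

class AbsorberChain {
    // Calls through every absorber in turn: one indirection per absorber.
    static long callThrough(List<ChangeAbsorber> chain, long value) {
        for (ChangeAbsorber absorber : chain) {
            value = absorber.absorb(value);
        }
        return value;
    }

    // Two logically separate translations, kept as distinct objects.
    static List<ChangeAbsorber> separate() {
        ChangeAbsorber poundsToPence = v -> v * 100;
        ChangeAbsorber addHandlingFee = v -> v + 50;
        return Arrays.asList(poundsToPence, addHandlingFee);
    }

    // The same translations fused by hand into a single absorber.
    static ChangeAbsorber combined() {
        return v -> v * 100 + 50;
    }
}
```

Both forms compute the same result; the combined absorber trades the clarity of separate steps for a single call on the reference chain.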
6.2.2 Separation of Functional Roles
The DRASTIC run-time architecture separates the different functional roles within a distributed system, abstracting
them into separate processes. For example, the role of evolving a zone is given to the EvolverMgr and the job of
handling the persistent meta-data for the whole system is delegated to the PASManager (section 5.1). This separation
leads to a clean implementation of the evolution mechanism as different functional facilities are encapsulated into
separate processes. Separating the different functional roles in this way can also provide some additional fault
tolerance, as system functionality is not contained within a single process. However, it can be tedious (and thus
possibly error-prone) to boot such a system, as a number of processes have to be managed and, if there are any
inter-dependencies between them, started in a particular order.
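One way to reduce the tedium of booting such a system is to derive the start order from declared dependencies. The sketch below is only an illustration of that idea using a depth-first topological sort; the BootOrder class is hypothetical, and the paper does not state the actual dependencies between the DRASTIC processes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: computes a start order from "depends on" declarations.
class BootOrder {
    static List<String> order(Map<String, List<String>> dependsOn) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> visiting = new HashSet<>();
        // Sorted iteration keeps the output deterministic.
        for (String process : new TreeSet<>(dependsOn.keySet())) {
            visit(process, dependsOn, done, visiting, result);
        }
        return result;
    }

    private static void visit(String process, Map<String, List<String>> dependsOn,
                              Set<String> done, Set<String> visiting, List<String> out) {
        if (done.contains(process)) {
            return;
        }
        if (!visiting.add(process)) {
            throw new IllegalStateException("dependency cycle at " + process);
        }
        // Everything this process depends on must be started first.
        for (String dep : dependsOn.getOrDefault(process, Collections.<String>emptyList())) {
            visit(dep, dependsOn, done, visiting, out);
        }
        visiting.remove(process);
        done.add(process);
        out.add(process);
    }
}
```

Given a map in which, say, a ZBP entry lists a registry and a meta-data manager as dependencies (an invented example), the returned list places those two before the ZBP; a cycle is reported rather than silently ignored.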
6.3 GRUMPS Architectural Implementation
The main aspect of implementing the GRUMPS run-time evolution architecture was choosing the appropriate tradeoff
between supporting run-time decision making and allowing those who have programmed the system the opportunity
to enforce their own decisions.
This point is best illustrated by considering the implementation of the query-oriented programming mechanism.
The QOP mechanism allows a GRUMPS programmer to find objects within the distributed system that match a query.
The query is propagated to each process and the QOP run-time mechanism translates the query string into a single
Java class, storing it as source code. To perform the query, the source code is first compiled by the GRUMPS run-time
platform. The class implements an interface that provides a well-known method which is called by the QOP system to
execute the query. This has the advantage that the query is strongly typed and the loaded class is subject to the usual
Java checks, such as bytecode verification. Making the compiler available at run-time allows the programmer to write
queries based on information that is only available at run-time, allowing the programmer to defer some decisions until
application execution.
Objects that are registered for remote query may be instances of an interface type called Replaceable. The
Replaceable interface defines a single method (called Replace) that takes one object as an argument and returns an object
as the result. If a query result contains an instance of this type, the QOP system, via its own object-serialization
mechanism, passes a reference to the matched instance to this method. The implementer of the interface may then
return an object of a different type which will become part of the query result that is passed back to the query initiator.
This mechanism is used by the GRUMPS run-time system to translate matches to GUContainer objects into objects
that refer to their control channels (section 4.2).
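The replacement step can be sketched in Java as follows. Only the shape of Replaceable (one method, called Replace, taking an object and returning an object) comes from the description above; ContainerStub, ControlChannelRef and QueryResults are hypothetical stand-ins, and the real QOP system performs this step through its own serialization mechanism rather than a simple loop.

```java
import java.util.ArrayList;
import java.util.List;

// Shape taken from the text: one method, one object in, one object out.
interface Replaceable {
    Object Replace(Object matched);
}

// Hypothetical stand-in for an object that refers to a control channel.
class ControlChannelRef {
    final String containerName;

    ControlChannelRef(String containerName) {
        this.containerName = containerName;
    }
}

// Hypothetical stand-in for a GUContainer registered for remote query.
class ContainerStub implements Replaceable {
    final String name;

    ContainerStub(String name) {
        this.name = name;
    }

    // The implementer decides what is returned to the query initiator.
    public Object Replace(Object matched) {
        return new ControlChannelRef(((ContainerStub) matched).name);
    }
}

class QueryResults {
    // Builds the result list, letting Replaceable matches substitute themselves.
    static List<Object> collect(List<Object> matches) {
        List<Object> out = new ArrayList<>();
        for (Object match : matches) {
            if (match instanceof Replaceable) {
                out.add(((Replaceable) match).Replace(match));
            } else {
                out.add(match);
            }
        }
        return out;
    }
}
```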
In the current implementation of the QOP mechanism, the implementer of the interface controls how its instances
are managed should any one of them be matched in a query. There is no facility in the current implementation to
override how such an instance is processed. Once these objects are deployed, however, a user of the system may want
to process them in a different way, one that has only come to light after the code was written
and after the objects were created and sent into the system. For certain GRUMPS system-management operations it
would be convenient to be able to specify in the query how a matched object is processed. See section 7.3 for more
information.
7 Conclusions
There are three main conclusions from the DRASTIC and GRUMPS work in addition to the four core ideas and the
formulation of the ten principles of evolution: an evolution model and its implementation are crucial to successful
distributed system run-time evolution; application systems built using them must be programmed for evolution; and
allowing decisions to be performed at run-time is a powerful and flexible approach to building both the run-time
evolution support systems and the applications that make use of them.
7.1 The Evolution Model and its Architecture
The DRASTIC and GRUMPS work has shown that in order to support the run-time evolution of a large, long-lived,
distributed system an evolution model and its implementation should be made available to those responsible for design-
ing and evolving the distributed application. This is because the support for the evolution of a distributed application
should be handled by a distinct part of the system. Without an evolution model and its architecture, it would be harder
to cleanly design the application and separate those parts of the system that were and were not subject to run-time evo-
lution. This would confuse the design and would lead to different (possibly incompatible) implementation approaches
being taken to run-time change.
Both the DRASTIC and GRUMPS systems make available technology to support the design and implementation
of a distributed application that could be evolved during run-time. Two language-level libraries, documented APIs
and two different evolution-supporting component architectures were provided. In addition, tools and methodologies
were provided to support the software engineer in preparing for, performing and subsequently monitoring the post-
evolution system. Both of these technologies were provided in Java with the software engineer using this language to
solve their domain-specific problem, within the context of, and by receiving support from, the evolution technology.
An evolution model and its implementation bridge the semantic gap between the evolution facilities provided
at the programming language and component framework levels and any higher-level facilities that other approaches
have made available, e.g., architectural-evolution systems. When viewed in this way, this kind of run-time
evolution support can be seen as evolution middleware. The evolution platform provides a standard set of evolution
facilities for higher-level code by abstracting over the particular support provided at the lower level. A middleware-
oriented approach is useful even if there is no higher-level evolution functionality as the resulting standardisation helps
the programmer to design and build an application system that may be more easily understood and changed by others
in the future.
In the same way that there is no one universal programming language and no single globally-applicable middleware
solution, there is also no single evolution model. The DRASTIC and GRUMPS work shows that the evolution model
and its implementation can be tailored to integrate with the computing environment within which evolution needs to
be performed. The DRASTIC and GRUMPS evolution models both suit their environments and their implementations
can make appropriate tradeoffs based on this environment. For example, within DRASTIC it was appropriate to design
and implement the system to interpose evolution technology on inter-zone, inter-object references. However, such a
solution was not appropriate within the GRUMPS system, because of the differences in the kinds of application that
the two systems were aimed at. In GRUMPS it was more appropriate to apply evolution by replacing the functionality
of an object, rather than abstract over a change made in one part of the system.
7.1.1 Creating an Evolution Model
The evolution model and its implementation must support two different aspects of run-time evolution. The evolution
model and any tools should support the design of an evolvable system, and the programmable component must support
the software engineer in performing the change to the running system. As the software engineer wants to produce a
system that may be changed at run-time, the evolution model should support them in designing a system to be changed
in such a way. The advantage of the evolution model in this area is that it focuses the attention of the software engineer
on the need for run-time change at the very start of the development of their code, making change a central part of the
architecture of their system.
When implementing an evolution model it is important both that a rich set of evolution facilities is made available
to the application programmer and that the facilities provided by the implementation language are exploited. Both
DRASTIC and GRUMPS make available a richer set of evolution primitives than those directly provided by the Java
language: DRASTIC supports subtractive change to an evolved Java type, and GRUMPS supports the programmer
in providing default, but extensible, support for simultaneously updating a number of objects within a single
GUContainer. GRUMPS-based run-time evolution support is made much easier by exploiting the underlying language’s
ability to easily replace one class with another, via object reference update. Exploiting the language makes it easier to
explain to the software engineer how evolution may be performed in the system, as it is just a particular application of
the programming model they are already familiar with.
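Replacement via object reference update can be sketched with plain Java. The example below is an illustration only: Cleaner, its two implementations and EvolvableSlot are hypothetical names, and the GRUMPS mechanism for updating objects inside a GUContainer is not shown.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical behaviour that may be evolved while the system runs.
interface Cleaner {
    String clean(String raw);
}

class TrimCleaner implements Cleaner {
    public String clean(String raw) {
        return raw.trim();
    }
}

class LowerCaseCleaner implements Cleaner {
    public String clean(String raw) {
        return raw.trim().toLowerCase();
    }
}

// Callers go through the slot, so swapping the reference evolves behaviour.
class EvolvableSlot {
    private final AtomicReference<Cleaner> current;

    EvolvableSlot(Cleaner initial) {
        current = new AtomicReference<>(initial);
    }

    String clean(String raw) {
        return current.get().clean(raw);
    }

    // Installs the new implementation and returns the one it replaced.
    Cleaner replace(Cleaner next) {
        return current.getAndSet(next);
    }
}
```

Because the update is an ordinary reference assignment, it can be explained to a Java programmer without introducing any new language concept, which is the point made in the paragraph above.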
Once the design of the evolution model has been completed, the implementation of the framework is relatively
straightforward. The implementation of the two run-time evolution frameworks described here represents approximately
five man-years of work. Both implementations are small by modern standards: the core of the DRASTIC
implementation is just over 9,500 lines of code (LoC) and GRUMPS is just under 12,500 LoC (see footnote 8). These
figures are not intended to suggest that constructing a run-time evolution framework is trivial or quick, but that it
is possible to provide such a framework. Implementing a framework for the run-time evolution of a distributed system
is similar in scope to the construction of other kinds of framework. For example, Sun Microsystems’ J2EE enterprise
programming framework is a large and complex piece of software. However, it benefits those that use it by providing
standard programming solutions, reducing the amount of work they have to do and increasing the likelihood that their
software can be successfully integrated with other J2EE-based software solutions.
Footnote 8: These figures do not take into account any tools or other management software, e.g., DRASTIC’s PASManager.
7.2 Programming for Evolution
The second outcome of this work is that systems must be programmed in a way that is sympathetic to the possibility
that they may be changed at run-time. This is made easier if an evolution model is provided.
The maintenance of software systems typically consumes between 40 and 80 percent of project costs [13]. If a
system cannot be taken down in its entirety to perform a change, it is inevitable that some form of run-time evolution
will be required. In this context, the evolution of a running system is necessary and inevitable. By acknowledging that
evolution will be necessary, provision for it can be made at the start of the software lifecycle, addressing the needs of
evolution as an integral part of the design and implementation of the application system. The work reported in this
paper has shown that it is not always desirable to abstract over all forms of change, and that trying to hide all change
from other parts of the system can be counter-productive (see section 3.3).
Changing the implementation of an executing program changes its semantics. As a result, run-time evolution
cannot be fully automated; however, this work shows it can be effectively supported. The software engineer can
increase the effectiveness of this support by programming for evolution and by acknowledging that not all evolutionary
change can be abstracted over. Such an approach to programming requires the software engineer to ask additional
questions during the design and implementation of their application code. For example, in DRASTIC it is possible for
an inter-zone method invocation to fail after an evolution has been performed. The programmer of the invoking code
has to be aware that an invocation can fail for evolutionary reasons, as well as for other well-established reasons,
such as distribution.
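The extra question that the invoking programmer must ask can be made concrete with a small Java sketch. The exception type EvolutionFailure and the RemoteLedger interface are hypothetical, introduced only for illustration; the paper does not name the exception that DRASTIC raises, and IOException is used here merely to represent a distribution failure.

```java
import java.io.IOException;

// Hypothetical: raised when an evolution means the call can no longer be honoured.
class EvolutionFailure extends Exception {
    EvolutionFailure(String message) {
        super(message);
    }
}

// Hypothetical inter-zone interface; both failure modes are declared explicitly.
interface RemoteLedger {
    int balance(String account) throws IOException, EvolutionFailure;
}

class Invoker {
    // The invoking code distinguishes evolutionary failure from distribution failure.
    static String query(RemoteLedger ledger, String account) {
        try {
            return "balance=" + ledger.balance(account);
        } catch (EvolutionFailure e) {
            // The target changed incompatibly; retrying the same call will not help.
            return "evolved: " + e.getMessage();
        } catch (IOException e) {
            // A classical distribution failure; a retry may succeed.
            return "unreachable: " + e.getMessage();
        }
    }
}
```

Declaring the evolutionary failure in the method signature forces the invoker to handle it, in the same way that Java forces the handling of remote failures.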
To support run-time change to an executing application it is crucial that those performing the change can make as
many decisions as are appropriate at run-time. The evolution model and its implementation should be flexible enough
that any decisions that have been made at a time before application deployment and execution can be changed once the
application has been started. However, there is a tradeoff to be made here, as supporting the run-time change to
every component within an executing system can leave its implementers with no control over aspects of it that should
not be changed.
Working in a distributed environment can aid as well as hinder the ability of a software engineer to change an
application system at run-time. Distribution can be a hindrance because of its classical features, e.g., partial failure. It
can aid in evolution, however, as processes and machines provide a natural boundary at which to perform evolutionary
change. This was exploited in the DRASTIC system in providing the majority of the evolution support in the ZBP
processes.
At the source code level, the use of the evolution model and its API permeates many different parts of the imple-
mentation of a software system. Therefore, it is currently difficult to see how the code that implements the process of
performing an evolution and the application code that reacts to a change (e.g., the DRASTIC code to handle the failure
of a post-evolution method invocation) can be abstracted into aspects [20]. Evolution support is a fundamental part
of the implementation of an application system, with the evolution API providing a standardised approach to using
that support. Such support is as fundamental to a program as is the level of support for distribution and the handling
of concurrency. It is an open question as to the degree to which aspects could be usefully exploited in the design and
implementation of an evolution model (see section 8.1.3).
7.3 Run-time Decision Making
Section 6.3 described a particular instance of the more general problem of choosing an appropriate tradeoff between
supporting flexibility via run-time decision making and the need for programmers to be able to enforce the decisions
they have made in their code.
In the current implementation of the QOP query mechanism, a software engineer can fix how a matched instance
is handled by QOP. The programmer enforces this by providing the code to process the matched instance as part of
that instance’s Java class. However, another choice is possible, which is to allow the processing decision to be made
by the query that found the instance.
If the decisions could be performed by the query, any decision that the author of the class had made would be
ignored. The challenge is finding the appropriate balance between these two approaches. Allowing the implementer
of the interface to make the decision gives them control over how their instances are treated by the rest of the system.
However, given the current GRUMPS run-time infrastructure, once the software engineer has created the Java class
to do this, the choices that it embodies are fixed. The alternative of placing the decision-making code in the query is
more flexible as the decision can be performed at run-time. This can be useful when updating a system as code fixed
in the Java class may not be suitable for use in an evolving context. The implementer of the code cannot possibly
anticipate its every use. In addition, the implementer cannot write code to handle an evolutionary case that has yet to
be defined. Therefore, it would be useful to be able to specify this new code at run-time, as part of the query. However,
without some form of object access model, the author of the Java class has no control over how their pre-evolution
instances will be treated by the rest of the system.
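The balance discussed above can be sketched as a query-supplied processor that is honoured only when the class author permits it. This is a speculative illustration of one possible object access model, not a description of GRUMPS: the names Matched, OpenMatch, GuardedMatch and MatchProcessor, and the permitsOverride check, are all assumptions.

```java
import java.util.function.Function;

// Hypothetical view of an instance matched by a query.
interface Matched {
    Object defaultProcessing();   // decision fixed by the class author
    boolean permitsOverride();    // hypothetical access check set by the author
}

// The author allows queries to supply their own processing.
class OpenMatch implements Matched {
    public Object defaultProcessing() { return "class-default"; }
    public boolean permitsOverride() { return true; }
}

// The author insists that the class's own processing is always used.
class GuardedMatch implements Matched {
    public Object defaultProcessing() { return "class-default"; }
    public boolean permitsOverride() { return false; }
}

class MatchProcessor {
    // fromQuery may be null when the query supplies no processing code.
    static Object process(Matched match, Function<Matched, Object> fromQuery) {
        if (fromQuery != null && match.permitsOverride()) {
            return fromQuery.apply(match);   // decision made at run-time by the query
        }
        return match.defaultProcessing();    // decision made when the class was written
    }
}
```

Under these assumptions the query gains flexibility for instances whose author opted in, while the author of a guarded class retains control over how its instances are treated.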
8 Open Issues and Future Work
This section discusses the open issues for the support of run-time evolution in general (section 8.1) and then goes on
to discuss particular improvements to the GRUMPS work in section 8.2.
8.1 Open Issues
There are four main open issues facing those working in the field of run-time evolution: where in the computing
hierarchy should particular types of run-time evolution be supported (section 8.1.1); what is the exact set of run-time
evolution primitives (section 8.1.2); whether or not aspects may be a benefit to using a run-time evolution framework
(section 8.1.3); and how object security is reconciled with a desire for flexible run-time change (section 8.1.4).
8.1.1 Placement of Run-time Evolution Support
The DRASTIC and GRUMPS work has benefited from providing evolution support above the level of component
frameworks as programmers can make use of the evolutionary facilities as if they were a part of a more comprehen-
sive component framework. However, within the field of run-time evolution, it is not clear where in the software
architecture hierarchy certain facilities could be most effectively placed. This is possibly because a comprehensive
survey of these issues has not been conducted and systems that are implemented (including DRASTIC and GRUMPS)
tend to remain within one layer of the hierarchy. As distributed application systems become more complex, the need
for run-time evolution support at particular points within the hierarchy will become more pressing because developers
of systems at higher levels will not want to reimplement facilities that could be more reliably and more powerfully
implemented below.
8.1.2 Defining a Useful Set of Run-time Evolution Primitives
The work on DRASTIC and GRUMPS addresses some of the issues of supporting run-time evolution within two specific
computing environments. Defining a generally useful set of run-time evolution primitives within a particular evolution
model is an open issue. As stated in section 7.1, one universal evolution model probably cannot be defined. In
addition, the primitives that are provided by the DRASTIC and GRUMPS systems are targeted towards those domains;
no attempt has been made to generalise the support to other application domains. It would be a useful exercise
to consider run-time evolution support within the context of a commercially available component framework. An
empirical evaluation of the approaches described in this paper could now be usefully performed, given that the initial
research into the evolution models and their architectures has been completed.
8.1.3 The Use of Aspects
It is an open issue if the use of aspects would benefit the design of the evolutionary parts of a large, long-lived,
distributed application. This issue is open because the work on evolution models as defined in this paper and the work
on AOP have been kept separate. The work of Kienzle and Guerraoui [21] suggests that some uses of aspects may be
difficult when that aspect is heavily associated with the phenomenon that the underlying object is modelling, such as
concurrency or distribution. The objects that are to be evolved are intertwined with other objects (that are possibly not
being evolved) as well as other characteristics of the application, such as the execution of threads of control. Due to
this intertwining, it is not clear at this stage if aspects could be used to fruitfully abstract over the evolutionary parts
of an application.
8.1.4 Run-time Evolution and Object Security
Section 7.3 describes a particular instance of a wider issue: the desire to provide a flexible
evolution mechanism while also allowing those programming the system to specify what may not be changed,
e.g., for reasons of security. Those making an object available within a system may want to specify that access to it is
restricted in some way and this desire may impede someone else’s ability to change the system, e.g., the inability to
access part of an object’s state when moving data between different versions of an object.
To address this problem some form of object access protocol or underlying security model would be required
so that the flexibility possible with the ability to perform a run-time decision can be controlled by the implementer
should they wish. This is an area that the designers and implementers of evolution support systems should consider
as a fundamental part of the evolution model as it will have a large impact on the programming model (cf. the Java
security model and its impact on the Java language and its core classes).
8.2 Future Work
The DRASTIC and GRUMPS systems represent an initial attempt at defining, designing and implementing two evo-
lution models and their architectures for distributed systems.
Future work for the GRUMPS project (see footnote 9) can be divided into two main areas. One fruitful area of future
work would be to assess the impact of these systems on the production and subsequent evolution of a distributed system
that others relied on every day. The initial evolution design and implementation work has been performed and now would
be an appropriate time to take the proof of concept systems forward, reconsidering them within the context of a more
constraining real-world, commercial setting.
Footnote 9: The DRASTIC project finished in 1998.
The DRASTIC and GRUMPS work has not addressed the area of checking for update safety because investigating
the basic evolution model support was the central focus of both of these projects. However, a security model would
make the application of an evolution model safer, especially within the context of a distributed system. Such work
has been considered by Hicks in his application of verifiably safe native code [16], and such an approach within the
context of DRASTIC or GRUMPS may be of value.
9 Summary
This paper has described the outcome of designing and implementing two different run-time evolution models. The
work on the DRASTIC and GRUMPS systems has demonstrated that it is possible to support the run-time evolution
of distributed systems by making available an evolution model and its architecture, and expressing it to a software
engineer via a combination of library code, tools and methodology support. This work has also shown it is possible to
use an application system even though part of it may be being changed at run-time.
The lessons learned have been distilled into four generally applicable core ideas and the ten principles that may
be used when considering the design and implementation of an evolution model. It has been shown that an evolution
model can be easily designed and implemented, although it must be tailored to the environment in which evolution is
to be performed.
It has been argued that there is a lack of run-time evolution support within the software architecture hierarchy,
in between component models and any higher-level evolution support or application-level systems. The DRASTIC
and GRUMPS work is a first step towards providing the necessary run-time evolution model and implementation to
address the open issues within this gap. More work needs to be conducted in this area and the provision of an evolution
model that could be supported within the context of J2EE or .NET would be a fruitful area of research.
10 Acknowledgements
The author gratefully acknowledges the support provided by the GRUMPS team and the UK’s EPSRC research council
for providing the funding for both the DRASTIC (grant GR/J99285) and GRUMPS (grant GR/N38114) projects. The
author also thanks Ray Welland, Peter Dickman, Michael Dales and Gareth P. McSorley for reading earlier drafts of
this paper.
References
[1] Mehmet Aksit, Ken Wakita, Jan Bosch, Lodewijk Bergmans, and Akinori Yonezawa. Abstracting Object
Interactions Using Composition Filters. In Rachid Guerraoui, Oscar Nierstrasz, and Michel Riveill, editors,
Proceedings of the ECOOP’93 Workshop on Object-Based Distributed Programming, volume 791, pages 152–184.
Springer-Verlag, 1994.
[2] J. Armstrong, M. Williams, and R. Virding. Concurrent Programming in Erlang. Prentice-Hall, Englewood
Cliffs, NJ, 1993.
[3] Malcolm Atkinson, Margaret Brown, Julie Cargill, Murray Crease, Steve Draper, Huw Evans, Philip Gray,
Christopher Mitchell, Martin Ritchie, and Richard Thomas. GRUMPS Summer Anthology, 2001. Technical
Report TR-2001-96, Department of Computing Science, Glasgow University, September 2001.
[4] Toby Bloom and Mark Day. Reconfiguration and Module Replacement in Argus: Theory and Practice. Software
Engineering Journal, 8(2):102–108, March 1993.
[5] Craig Chambers, David Ungar, and Elgin Lee. An Efficient Implementation of Self, a Dynamically-Typed
Object-Oriented Language Based on Prototypes. ACM SIGPLAN Notices, 24(10):49–70, October 1989.
OOPSLA ’89 Conference Proceedings, Norman Meyerowitz (ed), New Orleans, Louisiana.
[6] Huw Evans. Query-Oriented Programming. Unpublished paper.
Available from http://www.dcs.gla.ac.uk/~huw/, 2003.
[7] Huw Evans. Run-Time Evolution in Distributed Systems. PhD thesis, University of Glasgow, November 2003
(in preparation).
[8] Huw Evans, Malcolm Atkinson, Margaret Brown, Julie Cargill, Murray Crease, Steve Draper, Phil Gray, and
Richard Thomas. The Pervasiveness of Evolution in GRUMPS Software. Software: Practice and Experience,
33(2), February 2003.
[9] Huw Evans and Peter Dickman. DRASTIC: A Run-Time Architecture for Evolving, Distributed, Persistent
Systems. In Mehmet Aksit and Satoshi Matsuoka, editors, Proceedings of the European Conference on
Object-Oriented Programming (ECOOP ’97), volume 1241 of LNCS, pages 243–275, Jyväskylä, Finland, June 1997.
Springer.
[10] Huw Evans and Peter Dickman. Supporting Software Evolution in a Distributed Persistent System. In
André Schiper and Marc Shapiro, editors, Proceedings of the 2nd European Research Seminar on Advances in
Distributed Systems (ERSADS ’97), volume 2, pages 147–152, Zinal, Switzerland, March 1997. EPFL.
[11] Huw Evans and Peter Dickman. Zones, Contracts and Absorbing Change: An Approach to Software
Evolution. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages and Applications
(OOPSLA ’99), volume 34 of SIGPLAN Notices, pages 415–434, Denver, Colorado, USA, October 1999. ACM.
[12] Huw Evans and Peter Dickman. Peer-to-peer Programming with Teaq. In Enrico Gregori, Ludmila Cherkasova,
Gianpaolo Cugola, Fabio Panzieri, and Gian P. Picco, editors, Workshop on Web Engineering and Peer-to-Peer
Computing, pages 289–294. Networking 2002, Springer Verlag, LNCS 2376, May 2002.
[13] Robert L Glass. Facts and Fallacies of Software Engineering. Addison-Wesley, 2003.
[14] Adele Goldberg and David Robson. Smalltalk-80: The Language and its Implementation. Addison-Wesley,
1983.
[15] The Grumps Project Website, 2001. http://grumps.dcs.gla.ac.uk/.
[16] Michael Hicks. Dynamic Software Updating. PhD thesis, University of Pennsylvania, 2001.
[17] Robert Hirschfeld, Matthias Wagner, and Kris Gybels. Assisting System Evolution: A Smalltalk Retrospective.
In Günter Kniesel, Joost Noppen, Tom Mens, and Jim Buckley, editors, The First Workshop on Unanticipated
Software Evolution (USE2002) in ECOOP2002 Workshop Reader, LNCS 2548, Málaga, Spain, 2002. Springer.
[18] J2EE 1.4 Architecture Specification. http://java.sun.com/j2ee/1.4/docs/.
[19] Gregor Kiczales, Jim des Rivières, and Daniel G. Bobrow. The Art of the Metaobject Protocol. MIT Press, 1991.
[20] Gregor Kiczales, John Lamping, Anurag Mendhekar, Chris Maeda, Cristina Lopes, Jean-Marc Loingtier, and
John Irwin. Aspect-Oriented Programming. In Mehmet Aksit and Satoshi Matsuoka, editors, Proceedings of the
European Conference on Object-Oriented Programming (ECOOP ’97), volume 1241 of LNCS, pages 220–242,
Jyväskylä, Finland, June 1997. Springer.
[21] Jörg Kienzle and Rachid Guerraoui. AOP - Does It Make Sense? The Case of Concurrency and Failures. In
Boris Magnusson, editor, 16th European Conference on Object-Oriented Programming (ECOOP 2002), LNCS
(Lecture Notes in Computer Science), Málaga, Spain, 2002. Springer Verlag. Also available as Technical Report
IC No 2002/016.
[22] Ole Lehrmann Madsen, Birger Møller-Pedersen, and Kristen Nygaard. Object-Oriented Programming in the
BETA Programming Language. Addison-Wesley, Reading, 1993.
[23] Jeff Magee, Naranker Dulay, and Jeff Kramer. Regis: A Constructive Development Environment for Distributed
Programs. Distributed Systems Engineering Journal, 1(5):304–312, 1994.
[24] Microsoft Corporation. Microsoft C# Language Specifications. Microsoft Press, 2001.
[25] Microsoft .NET Home Page, 2002. http://www.microsoft.com/net/.
[26] Misha Dmitriev, 2003. http://www.experimentalstuff.com/Technologies/HotSwapTool/.
[27] Erik Odberg. A Framework for Managing Schema Versioning in Object-Oriented Databases. In Database and
Expert Systems Applications - DEXA ’92, Valencia, Spain, September 1992.
[28] Peyman Oreizy, Nenad Medvidovic, and Richard N. Taylor. Architecture-Based Runtime Software Evolution.
Technical Report ICS-TR-97-38, University of California, Irvine, Department of Information and Computer
Science, September 1997.
[29] Alan Pope. The CORBA Reference Guide. Addison Wesley, 1998. ISBN 0-201-63386-8.
[30] Steve Waterhouse, 2003. http://search.jxta.org/.
[31] Clemens Szyperski. Component Software: Beyond Object-Oriented Programming. ACM Press and
Addison-Wesley, New York, N.Y., 1998.
[32] Richard Thomas, Gregor Kennedy, Steve Draper, Rebecca Mancy, Murray Crease, Huw Evans, and Phil Gray.
Generic Usage Monitoring of Programming Students. In 20th Annual Conference of the Australasian Society
for Computers in Learning in Tertiary Education (ASCILITE 2003), Dec 2003.
[33] W3C Web Services Activity. http://www.w3.org/2002/ws/.
[34] Gio Wiederhold, Peter Wegner, and Stefano Ceri. Toward megaprogramming. Communications of the ACM,
35(11):89–99, November 1992.
List of Figures
1 An Inter-zone method Invocation via the Zone Contract
2 An Example GRUMPSNet
3 The System-Wide DRASTIC Run-Time Architecture
4 An Example GRUMPS-based Distributed Investigation
5 An Inter-zone method Invocation via the Zone Contract
6 An Example GRUMPSNet
7 The System-Wide DRASTIC Run-Time Architecture
8 An Example GRUMPS-based Distributed Investigation
[Diagram labels: Zone Purchasing, Zone Accounts, User Process 1, User Process 2, m:M, n:N, M, N, Contract (one per zone), change absorber]
Figure 5: An Inter-zone method Invocation via the Zone Contract
[Diagram labels: User 1, User 2, cleaned user data. Key: Inter-GU reference, Teaq parent reference, Thread, Event queue, Control Channel, GE (Grumps Event), EPO, GUContainer, GU, Teaq Root Process]
Figure 6: An Example GRUMPSNet
[Diagram labels: Personnel Zone, Purchasing Zone, Accounts Zone, User Process 1, User Process 2, m:M, n:N, ZBP processes, EvolverMgr, Registry, ZSPMDaemon, PASManager]
Figure 7: The System-Wide DRASTIC Run-Time Architecture
[Panels: (a) Application-Level View, (b) Teaq Process-Level View. Diagram labels: User 1 to User 4, user data, cleaned user data, collection GU, data cleaning GU, GEs. Key: Teaq parent reference, Inter-GU reference, GU, Thread, GE (Grumps Event), EPO, Event queue, Teaq Root Process, Teaq process]
Figure 8: An Example GRUMPS-based Distributed Investigation