
INTEGRATING CFENGINE, ITIL AND ENTERPRISE PROCESSES

MARK BURGESS AND THOMAS SCHAAF

December 5, 2008

Contents

1 Introduction
  1.1 Business alignment
  1.2 ITIL
  1.3 Business processes and goals
    1.3.1 Teams, control structures and collaboration
  1.4 Promises
    1.4.1 A theory
    1.4.2 Basic definitions
  1.5 Is automation worthwhile?

2 Cfengine past and present
  2.1 Fundamental Concepts
    2.1.1 Promises, Actions and Operations
    2.1.2 Convergence
    2.1.3 Classes and Declarations: From One to Many Hosts
    2.1.4 Voluntary Cooperation
    2.1.5 Scalability
  2.2 Cfengine Components

3 ITIL past and present
  3.1 ITIL and its versions
    3.1.1 ITIL: Important Foundations
    3.1.2 ITILv2: Service Support and Service Delivery
    3.1.3 ITILv3: Management from the Service Life Cycle Perspective
  3.2 Foundations
    3.2.1 Cfengine in ITIL clothes?
    3.2.2 ITIL’s idea of processes
    3.2.3 Service Strategy
    3.2.4 Service Design
    3.2.5 Service Operation
    3.2.6 Continual Service Improvement
  3.3 Tool Support

4 A meeting of mind-sets
  4.1 Which ITIL processes apply to cfengine?
    4.1.1 Configuration Management (CM)
    4.1.2 Asset Management, what is it used for?
    4.1.3 Change management
    4.1.4 Release management
    4.1.5 Incident and problem management
    4.1.6 Service Level Management (SLM)
  4.2 ITIL terminology

5 Using cfengine to implement ITIL objectives
  5.1 Infrastructure or management?
  5.2 How can cfengine or promises help?
    5.2.1 Traditions
    5.2.2 Modelling of policy
    5.2.3 Uniformity
  5.3 What is maintenance?
  5.4 Incident Management vs Maintenance
  5.5 Rollout and installation
    5.5.1 Customize by constant/fixed “gold” overlay
    5.5.2 Overlay an expandable template
    5.5.3 Direct customization
  5.6 Change Management
    5.6.1 Software packaging
    5.6.2 Rollback or “remediation”
    5.6.3 Monitoring change: files
  5.7 Release Management
  5.8 Configuration version control and rollback
    5.8.1 Delegating responsibility to multiple groups
  5.9 Availability and Capacity Management

6 Summary
  6.1 How we wrote this document
    6.1.1 ITIL concepts for authoring
    6.1.2 Promise concepts - voluntary cooperation
    6.1.3 Best of both worlds
  6.2 Road-map for adoption

A ITIL terminology
  A.1 Active Monitoring
  A.2 Availability
  A.3 Alert
  A.4 Audit
  A.5 Baseline
  A.6 Benchmark
  A.7 Capability
  A.8 Change record
  A.9 Chronological Analysis
  A.10 Configuration
  A.11 Configuration Item (CI)
  A.12 Configuration Management Database (CMDB)
  A.13 Document
  A.14 Emergency Change
  A.15 Error
  A.16 Event
  A.17 Exception
  A.18 Failure
  A.19 Incident
  A.20 Monitoring
  A.21 Passive Monitoring
  A.22 Policy
  A.23 Proactive Monitoring
  A.24 Problem
  A.25 Promise
  A.26 Reactive Monitoring
  A.27 Record
  A.28 Recovery
  A.29 Remediation
  A.30 Repair
  A.31 Release
  A.32 Request for Change
  A.33 Resilience
  A.34 Restoration
  A.35 Role
  A.36 Service desk
  A.37 Service Level Agreement
  A.38 Service Management
  A.39 Warning

Chapter 1

Introduction

1.1 Business alignment

The goal of most IT installations is to work as a support infrastructure for some other primary activity, such as the running of a business or other organization. Even if the primary activity is the design of computer systems, or the writing of software, the supporting infrastructure is a tool whose management is in principle separate from the main business goals. As organizations become larger, the management of the IT system and other ancillary activities frequently become isolated from “front line” activities.

IT infrastructure is an enabler, so it is important to ensure that it succeeds in this task. How do we do this? This document is about how to make cfengine management best support primary business or organizational processes.

We write this document in the light of two trends: the demotion of system administration as a job description, and the rise of service oriented thinking to replace both it and the monolithic design philosophy of systems. Service orientation is not so much a technological innovation as it is a different kind of social structure. It is a move away from hierarchy as the main model of organization, toward generalized network structure. In computer parlance, service orientation is essentially a peer to peer structure. There are no automatic kings or commanders in chief, only peers who need help from other peers. If such key positions arise, they emerge naturally by necessity, not by presumption.

For example, in the 1960s factory work in the United Kingdom was organized hierarchically, with powerful unions attending to a dutiful “separation of concerns”, much like an idealized object oriented system. To build a ship, one would have to ask the management to ask panel producers for panels; when the panels were finished, the message would be sent back up the hierarchy so that management could schedule the welders to arrive, then the painters, and so on. Much delay and inefficiency was caused by this bureaucratic organizational structure.

Although this behaviour persists to a lesser extent, today we use more direct communication between the parts that need to connect and so save much time and overhead. This service oriented thinking can be applied to computing services, their organization and even the support of those computing services. The service model can be applied at all levels.

In the late 1980s it was realized that a service oriented view of management could profitably be formalized so as to be of benefit to all organizations. This began the Information Technology Infrastructure Library, building on the experience of leaders in government and industry, including organizations such as the British Broadcasting Corporation, the Office of Government Commerce and others.

1.2 ITIL

The IT Infrastructure Library (ITIL) has emerged as a de facto set of ideas about service delivery. It is not based on any theoretical model or design criteria. It is rather a set of self-proclaimed best practices compiled by representatives from government and industry. As such its claims can be discussed, but we shall not do so here. We shall refer to ITIL because it has become a popular set of guidelines for all manner of IT organizations, and because it promotes the idea of IT-business alignment.

ITIL was an important source of concepts and processes documented in the following British and ISO standards:

• BS 15000

• ISO/IEC 20000 (successor of BS 15000)

ITIL now encompasses various books and courses and has its own qualification scheme allowing for a certification of Service Managers or IT staff.

The key concepts of ITIL include service and process orientation. Service orientation is an important model for system organization because it can encompass everything from the monolithic hierarchical systems of yesteryear to modern day peer to peer architectures, which better mirror a free-market economic business interaction. It can be applied to computer-provided services (e.g. web services, or even configuration operations like cfengine) or it can be applied to human services and operations such as help desks and support. This makes it an important centre-piece in the discussion.

ITIL has its own particular terminology for discussing service related matters. To relate these to the use of a technology such as cfengine we need to understand the words and how they are used. ITIL uses many terms and phrases in a different way to system administrators.¹

The verb “to manage” originally meant “to cope”. Only more recently has strategic thinking changed it into a transitive verb: something that we do to systems, like driving a car, or flying a plane.

Today the term “management” signifies the introduction of a bureaucratic level of governance, to control and verify the workings of a system. This terminology has come about mainly because the people who wrote ITIL live in that kind of world and understand things through these eyes. Ironically, computer engineers now speak of “self-management” and “autonomics” to recover the original idea of systems that can cope.

In this document we have two principal aims:

• To explain a number of patterns for using cfengine to allow systems to cope with business needs.

• To demystify ITIL for technicians and engineers who do not naturally respond to business-speak, relating cfengine’s capabilities (both the technical aspects that are well known and the non-technical aspects of instrumentation and reporting that are less well known) to the goals of ITIL.

1.3 Business processes and goals

What do we need to make a business? Do we need a demand for a “product”, a workflow to implement it, a supply (chain) mechanism for selling it to a market? It turns out that the service abstraction is a paradigm that fits all enterprises without too much shoe-horning.

Businesses probably have many goals in their grand designs: they have high level visions, notions of secure and best practices, sometimes even ethical policies. All of these can be couched in the language of promises to behave in some way.

¹ It sometimes seems that ITIL is about beatific buzzwords and antagonizing acronyms. Just when you thought you knew what an RFC is, think again!


Now we can ask: what does it mean to align an IT infrastructure with the business goal of providing a service S? First, for IT systems to have any impact on the business goal at all, the business must rely on the IT system in some way. This could either be directly, in the manner of an e-commerce web-site, or it might be indirectly, for instance by providing drawing and modelling software in an architect’s office. In either case there is a workflow in which an IT system plays an intermediary role.

In fact, it does not matter whether this is an IT system, a human being or a steam-powered engine. What is key is that there is a technology playing an intermediate role in the performance of a service. We can display this as the workflow diagram shown by the dotted lines in Fig. 1.1. The business B would like to provide service S to its customer C; in actuality this requires the help of intermediary I.

Figure 1.1: Inserting an intermediate agent into a business process. The dotted lines show a work flow path. The arc shows a promise the business would like to make to the end customer – but promise theory says that it cannot if it does not have direct contact.

Promise theory has several implications, and one of them is that an agent cannot promise something with confidence to an agent it is not directly in contact with. This is because agents can only vouch for their own behaviour. They cannot promise what an intermediate agent would do. This has implications for the business.

Suppose a business wants to make a promise to its customer, but knows that it must rely on intermediaries (the IT department for example) to do so. Promise theory tells us that the business representative making the promise requires promises from every intermediate agent in the chain, and each of the agents in that chain requires promises from further down the chain too.
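The chain of reliance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the agent names, the `Promise` tuple and the function `missing_promises` are hypothetical and are not part of cfengine or of any formal promise theory notation. The sketch checks that every agent in a reliance chain has received a promise from the agent immediately below it.

```python
from typing import List, Set, Tuple

# A promise is recorded as (promiser, promisee): an agent can only
# vouch for its own behaviour towards an agent it deals with directly.
Promise = Tuple[str, str]


def missing_promises(reliance_chain: List[str],
                     promises: Set[Promise]) -> List[Promise]:
    """Return the promises that are still needed along a reliance chain.

    `reliance_chain` lists agents from the one making the outward promise
    down to the last agent it indirectly relies on. Each agent needs a
    promise from the next agent down the chain.
    """
    required = [(lower, upper)
                for upper, lower in zip(reliance_chain, reliance_chain[1:])]
    return [p for p in required if p not in promises]


# Hypothetical example: the business relies on its IT department,
# which in turn relies on a hosting provider.
chain = ["business", "it_department", "hosting"]
made = {("it_department", "business")}
print(missing_promises(chain, made))
```

If the returned list is empty, every link in the chain is backed by a promise; otherwise the list names the promises that the business still lacks before it can promise anything to its customer with confidence.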

It is beyond the scope of this document to explain all of those promises. What cfengine allows a business to do is to automate many of those promises – or make them autonomic (self-managing).



1.3.1 Teams, control structures and collaboration

Humans are poor at reliable, repetitive work but they are infinitely superior at creative work and decision-making. Modern theory on success in business rejects the classic views of management with militarized or bureaucratized chains of command and control[1, 2] in favour of more human-creative structures. Creative and adaptive workflow requires a high level of decentralization and autonomy, while at the same time protecting the core values of the organization.

Team work is a key element in decentralized organization – both for humans and computers. IT departments are often organized in this way, for instance. Teams do not exist because they maximize production of every individual, nor do they make an organization more predictable or controllable. They exist because humans need continual motivation and emotional support – and indirectly this sustains workflow and adds creativity to a business. One often overlooks the team-aspect of coping when considering computer management, in favour of hierarchical design. Cfengine does not force us into hierarchical systems however, so we should not discard the smaller team idea too soon.

Figure 1.2: Hierarchy or team? Hierarchy has long traditions, but modern thinking favours teams.

Cfengine is complex enough for it to make sense to delegate responsibility for different issues. An organization will generally consist of many groups and teams already, each with their own special needs and each craving its own autonomy. Cfengine and promise theory were designed for precisely this kind of environment. Cfengine allows cooperation and sharing without allowing central managers to ride roughshod over local needs.

Teams thrive by discussion and interaction within the framework of a policy or vision, allowing variation and arriving at a consensus when necessary. Success in a team depends on a combination of abilities working together, not undermining one another. Conflicts in the promises made by team members reveal design problems in the group. An analysis of promises (cfengine’s model of collaboration) is a significant tool for understanding and enabling businesses.

M. Belbin, a researcher in teamwork, has identified nine abilities or roles (kinds of promise) to be played in a team collaboration:

1. Plant – a creative “ideas” person who solves problems.

2. Shaper – a dynamic member of the team who thrives on pressure and has the drive and courage to overcome obstacles.

3. Specialist – someone who brings specialist knowledge to the group.

4. Implementer – a practical thinker who is rooted in reality and can turn ideas into practice (who sometimes frustrates more imaginative high flying visionaries).

5. Resource Investigator – an enabler, or someone who knows where to find the help the team needs regardless of whether the help is physical, financial or human. This person is good at networking.

6. Chairman/Co-ordinator – an arbitrator who makes sure that everyone gets their say and can contribute.

7. Monitor-Evaluator – a dispassionate, discerning member who can judge progress and achievement accurately during the process.

8. Team Worker – someone concerned with the team’s inter-personal relationships and who is sensitive to the atmosphere of the group.

9. Completer/Finisher – someone critical and analytical who looks after the details of presentation and spots potential flaws and gaps. The completer is a quality control person.

His model has little room for technical workflow arguments. It is entirely concerned with the creative process. This is probably significant. We should ask ourselves: how can we use the freedom to organize into specialized teams to maximize human creativity, while passing hard work over to machines? Solving this problem is what cfengine is about.


1.4 Promises

1.4.1 A theory

ITIL has no theory to back it up, so we have to look elsewhere for a motivation of its practices. Promise theory is an attempt to do just this for a service oriented model in which peers make promises to one another, so it ought to work for ITIL also. The advantage of promise theory is that it helps us to see how cfengine can be used, because promises provide a simple picture of how cfengine works.

THINK OF CFENGINE AS A GENERAL TOOL FOR AUTOMATICALLY MAKING SURE THAT PROMISES ARE KEPT.

The popular service concept fails to capture one thing very clearly, namely the distinction between making a promise and keeping a promise. A service implies that something will be provided but it does not specify when.

Suppose we ask a security company to protect our assets. The company might promise to deploy guards, or alarm technology, or it could simply promise that we will be safe without explaining how the promise will be kept. The promise does not necessarily imply any action required to maintain this state of safety, but we still pay the company for the service to keep this promise anyway. Trust plays an important role, of course.

Promise theory helps us to understand services in all forms by forcing us to think carefully about the concept of autonomy. Autonomy implies several things: for instance, privacy of information, independence of decision and responsibility for one’s own behaviour. The concept of autonomy is like a filter that makes us think carefully about things that we often take for granted. It is a good discipline, forcing us to confront what we think we know about systems.

The agents of promises are humans, computers or any entity that can be associated with a promise, even if by association with its owner or designer. They are said to be autonomous if they cannot be forced to make any promises about their behaviour by an outside agent. A useful principle for understanding systems is the maximal separation of concerns, and promises help us to separate independent issues.

Separation of concerns is only half the story, however. Promises are also about describing how the parts of a system work together, just as in teamwork. Promises provide the glue that allows completely autonomous parts to form an organization. We are not allowed to think about “control” or “command”, only about voluntary cooperation. Keep these ideas in mind when reading this document.

1.4.2 Basic definitions

We can use the language of promises to make clearer definitions.

• Service: a promise to act or provide a resource. The promise is made from a ‘server’ agent S to one or more external agents which we call the clients {C}.

• Agreement: a mutual acceptance of knowledge by two agents (“the agents agree”). The knowledge that is agreed to is called the body of the agreement. Note that the term “agreement” is sometimes used incorrectly to mean “contract”. Agreement is often signified by signing the body, or some equivalent declaration. In promise theory an agreement is a pair of (−) use-promises between two parties to acknowledge acceptance of the agreement body.

• Contract: a bilateral bundle of proposed promises between two agents, intended to serve as the body of an agreement.

• Service Level Agreement: an agreement between two parties whose body describes a contract for service delivery and consumption.

Service Level Agreements (SLAs) are now a well-known part of the customer-business scenario. How are promises different from SLAs? Promises are more primitive than agreements. Agreements bind two parties to a collection of bilateral decisions that have been made in advance. An agreement implies an existing infrastructure on which to agree. A promise on the other hand is an entirely autonomous statement about agents’ behaviour (ad hoc). Showing only the promises in a system does not imply any agreement between the parties, only indications about their likely behaviours.

In other words, seeing the promises that have been made, an external observer could calculate effective service levels that have been promised without any agreement taking place. Promises are therefore more fundamental than agreements to the predictability of the system.
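To make the difference between a promise and an agreement concrete, here is a minimal Python sketch of the definitions above. It is an illustration only: the class `Promise`, its field names and the function `is_agreement` are assumptions made for this example, and the "+" and "-" labels simply mirror the text's distinction between promises to provide and (−) use-promises.

```python
from dataclasses import dataclass
from typing import Set


@dataclass(frozen=True)
class Promise:
    promiser: str    # the agent making the promise about its own behaviour
    promisee: str    # the agent the promise is made to
    body: str        # what is promised
    kind: str = "+"  # "+" = promise to provide, "-" = promise to use/accept


def is_agreement(promises: Set[Promise], a: str, b: str, body: str) -> bool:
    """An agreement is a pair of use-promises: both parties accept the body."""
    return (Promise(a, b, body, "-") in promises
            and Promise(b, a, body, "-") in promises)


# Hypothetical agents and contract body, for illustration only.
body = "web hosting contract"
promises = {
    Promise("provider", "customer", "web hosting"),   # a service promise
    Promise("provider", "customer", body, "-"),       # provider accepts the body
}
print(is_agreement(promises, "provider", "customer", body))
```

With only one party having accepted the body, the promises are still visible to an observer but no agreement exists; adding the customer's matching use-promise would complete the pair.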


1.5 Is automation worthwhile?

Process automation is an investment which has its own cost. The benefits are not merely saved manpower but improved consistency or certainty of process. Automation provides an automatic quality assurance.

A simple argument against automation goes like this: if I can fix it in five minutes then it is not worth automating, unless the automation takes less time than that.

The argument is simplistic. Before dismissing automation, one should ask questions like this:

• How many of these five minute periods occur in the long run?

• How much time was needed to diagnose each of them?

• Could the problems have been avoided altogether by proactive maintenance?

One of the benefits of automation is in prevention, another is in documenting institutional learning by codifying the processes required for the avoidance of incidents. A tool like cfengine which separates intention (promises) from action makes this kind of documentation highly readable and allows the learning to penetrate the workflow processes directly.
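The questions above amount to a simple piece of arithmetic, sketched here in Python. The function names and all of the numbers are illustrative assumptions, not measurements; the point is only that the five-minute fix must be multiplied by how often it recurs and extended by the time spent diagnosing it.

```python
def manual_cost_minutes(incidents_per_year: float,
                        fix_minutes: float,
                        diagnosis_minutes: float,
                        years: float) -> float:
    """Total time spent handling a recurring problem by hand."""
    return incidents_per_year * (fix_minutes + diagnosis_minutes) * years


def automation_pays_off(automation_minutes: float,
                        incidents_per_year: float,
                        fix_minutes: float,
                        diagnosis_minutes: float,
                        years: float) -> bool:
    """True if the one-off automation effort is less than the manual total."""
    return automation_minutes < manual_cost_minutes(
        incidents_per_year, fix_minutes, diagnosis_minutes, years)


# Illustrative figures: a "five minute fix" that recurs weekly and needs
# ten minutes of diagnosis each time, compared over two years with a
# one-day (480 minute) automation effort.
print(manual_cost_minutes(50, 5, 10, 2))
print(automation_pays_off(480, 50, 5, 10, 2))
```

This comparison deliberately ignores the benefits that are harder to quantify, such as consistency and the documentation value of a written policy, so it understates the case for automation rather than overstating it.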


Chapter 2

Cfengine past and present

As technology becomes more sophisticated, the cost of introducing variations declines.

—Alvin Toffler, Future Shock, 1970

Cfengine is a free software package for automating the installation and maintenance of networked computers. The project began in 1993 and it has been in widespread use since 1995. Cfengine is available for all major Unix and Unix-like operating systems, and it will also run under NT-derived Windows operating systems via the Cygwin Unix-compatibility environment/libraries.

Cfengine scales easily from a single host to tens of thousands of hosts. As of this writing, the largest installations we know of regulate around 20,000 machines under a common administration. Cfengine can manage many aspects of system configuration and maintenance, including the following:

• Performing post-installation tasks such as configuring the network interface.

• Editing system configuration files and other files.

• Creating symbolic links.

• Checking and correcting file permissions and ownership.

• Deleting unwanted files.

• Compressing selected files.

• Distributing files within a network.

• Automatically mounting NFS file systems.

• Verifying the presence and integrity of important files and file systems.

• Executing commands and scripts.

• Applying security-related patches and similar system corrections.

• Managing system server processes.

Cfengine’s purpose is to implement policy-based configuration management. In practical terms, this means that cfengine greatly simplifies the tasks of system configuration and maintenance. For example, to customize a particular system, it is no longer necessary to write a program which performs each required action in a procedural language like Perl or your favorite shell. Instead, you write a much simpler policy description that documents how you want your hosts to be configured. The cfengine software determines what needs to be done in terms of implementation and/or remediation from this specification. Such policy descriptions are also used to ensure that the system remains configured as the system administrator wishes over time.

Here is a brief example of such a policy description which we’ve annotated:

Policy Example 1 (A first example)

    control:                   # General directives: here, we define a list variable.

       tmpdirs = ( tmp:scratch:scratch2 )

    files:                     # File ownership and protection specifications.

       /usr/local/bin owner=root group=bin mode=755 action=fixall

    copy:                      # Copy files on/to the local system.

       solaris::               # Applies only to Solaris systems.

          /config/pam/solaris server=pammaster dest=/etc/pam.d

       linux::                 # Applies only to Linux systems.

          /config/pam/common-auth server=pammaster
                                  dest=/etc/pam.d/common-auth

    tidy:                      # Manage temporary scratch directories.

       /${tmpdirs} include=* age=7 recurse=inf

This simple configuration is divided into four stanzas, each introduced by a colon-terminated keyword, specifically control:, files:, copy: and tidy:. The control stanza defines a list of directories, named tmpdirs, which we’ll use later (in the tidy stanza).

The files stanza specifies that all of the files in the directory /usr/local/bin should be owned by user root and group bin and have the file mode 755. When cfengine runs with this configuration description it will correct any ownership and/or permissions which deviate from these specifications. Thus, this stanza serves to implement a policy about the proper ownerships and permissions for the executables in the local binaries directory.

The copy stanza prescribes different configurations for Linux and Solaris systems. On Solaris systems, files in /etc/pam.d will be updated with those in the directory /config/pam/solaris on a master server when the latter are newer. On Linux systems, only the file /etc/pam.d/common-auth is updated from the PAM master configuration because the Linux systems in question use the PAM include file mechanism to propagate this file’s stacks to all of the PAM-enabled services. Note, however, that both of these specifications implement the same underlying system configuration maintenance policy: update the relevant PAM configuration files from the master server if necessary.

The final, tidy stanza illustrates the use of implicit looping. The single directive in the example applies to each of the directories in the tmpdirs list. For each directory, cfengine will delete all items in the directory or any of its subdirectories which have not been accessed in seven days (including ones where the filename begins with a period). Like the other directives in this sample configuration file, this stanza implements a policy: items in temporary directories which have not been used within a week will be deleted.
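To show how much procedural work the single tidy directive stands for, here is a rough Python equivalent of the policy as described in the paragraph above. It is a sketch under stated assumptions, not a description of how cfengine implements tidy: the function name `tidy` and its parameters are hypothetical, and for simplicity it removes only files (not emptied directories) based on their last access time.

```python
import os
import time


def tidy(directories, max_age_days=7, now=None):
    """Delete files not accessed within `max_age_days`, recursing fully.

    Loops explicitly over the directory list, which the cfengine
    directive does implicitly. Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:  # includes names beginning with a period
                path = os.path.join(dirpath, name)
                if os.lstat(path).st_atime < cutoff:
                    os.remove(path)
                    removed.append(path)
    return removed
```

A call such as `tidy(["/tmp", "/scratch", "/scratch2"])` would correspond to the list in the example. Running it a second time immediately afterwards removes nothing further, which is the behaviour one wants from a maintenance policy that is applied repeatedly.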

All cfengine configuration descriptions are variations on these and similar themes, albeit more elaborate ones. Before turning to more details about the technical aspects of using cfengine, a brief consideration of the most important underlying and guiding theoretical concepts is in order.

2.1 Fundamental Concepts

As we’ve stated, cfengine operates on hosts in order to bring their configurations in line with the specified policies. We need to define some terms.

Key Concept 1 (Host) A host is a single computer that runs an operating system like Unix, Linux or Windows. We will sometimes talk about machines too, and a host can also be a virtual machine supported by an environment such as VMware or Xen/Linux.



Key Concept 2 (Policy) This is a specification of what we want a host to be like, or how we want it to behave. A policy is essentially a piece of documentation that describes technical details and characteristics. Cfengine implements policies that are specified via directives of the sort we just considered.

Key Concept 3 (Configuration) The configuration of a host is the actual state of its resources, e.g. the permissions and contents of files, the inventory of software installed, etc. It is the ‘state of affairs’ on a particular host at a given time.

What are we aiming for with cfengine? The answer is: policy conformant configuration. We want to formulate a specification of not just one host, but usually many, including how they all interact, perhaps to solve a business problem; then we want to leave the details, implementation and maintenance to a robot agent: cfagent.

Humans are good at understanding input and thinking up solutions, but they are not very reliable at implementation: carrying things out consistently. Machines and software agents are good at carrying out tasks reliably, but are not good at understanding or finding actual solutions. With cfengine, you let the distinct parts of your human-computer organization concentrate on what they are each good at.

Cfengine can also produce reports about systems for monitoring the performance and compliance with policies. This is an important aspect of business integration as service providers want to know whether they are delivering what they have promised, and whether their money has been spent wisely.

2.1.1 Promises, Actions and Operations

Cfengine’s philosophy fits quite well with the service oriented approach to computing.

A cfengine policy can be thought of as a list of promises which the system makes to some auditor about its configuration. Most of these promises involve the possibility of change to make a host fulfil its policy promises. We call such changes actions or operations. As you probably already guessed, the auditor in this scenario is part of cfengine itself. Cfagent is also the mechanic or surgeon that performs the operations on the system, if it does not meet its promises. By describing its operation in this manner, we can think of configuration management as a service that is provided, a service that is intimately connected with monitoring and maintenance, and which can be “bought” on demand without necessarily subordinating a system to a central authority.



Key Concept 4 (Operation) A unit of change is called an operation. Cfengine deals with changes to a system, and operations are embedded into the basic sentences of a cfengine policy. They tell us how policy constrains a host, in other words, how we will prevent a host from running away.

For example, here is a promise about the attributes of a file:

files:

/etc/passwd mode=a+r,go-w owner=root group=root action=fixall

There are implicit operations (actions) in this declaration: specifically, the operations that will change the attributes if/when they do not conform to this specification.
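To make the idea of an implicit operation concrete, here is a minimal Python sketch of what an agent might do for the mode part of the promise above. It is only an illustration under our own assumptions, not cfagent's actual implementation: the function name converge_mode is hypothetical, the masks are our reading of mode=a+r,go-w, and the owner and group parts are left out because changing them needs root privileges. A POSIX file system is assumed.

```python
import os
import stat
import tempfile

def converge_mode(path, must_set=0o444, must_clear=0o022):
    """Repair the permission bits of path only if they deviate.

    must_set   -- bits that have to be present (our reading of a+r)
    must_clear -- bits that have to be absent  (our reading of go-w)
    Returns True if a repair was carried out, False if the promise was already kept.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    desired = (current | must_set) & ~must_clear
    if current == desired:
        return False             # promise kept: no operation is needed
    os.chmod(path, desired)      # the implicit operation
    return True

if __name__ == "__main__":
    # Hypothetical scratch file standing in for /etc/passwd.
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    os.chmod(scratch, 0o620)
    print(converge_mode(scratch))   # a repair is needed the first time
    print(converge_mode(scratch))   # nothing left to do the second time
    os.remove(scratch)
```

The point of the sketch is that the policy only states the desired attributes; whether anything is actually changed is decided at run time by comparing the promise with the current state.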

2.1.2 Convergence

A key property of cfengine is convergence. This is an important characteristic that distinguishes it from general computer languages. It is a property that helps to prevent systems from diverging: running away in an uncontrollable fashion.

Key Concept 5 (Convergence) An operation is convergent if it always brings the configuration of a host closer to its ideal, policy-conformant state and has no effect if the host is already in that state. We can summarize this in functional terms by the following meta-rules:

cfengine(incorrect state) → correct state

cfengine(correct state) → correct state

We shall sometimes call a “correct state” a “healthy state,” using the metaphor that a badly configured host is suffering from a kind of sickness.

Here is an example used during the editing of an ASCII file:

editfiles:

...

AppendIfNoSuchLine " Important configuration line"

This operation tells cfengine to append the given text to the end of a file, only if it is not already there. The policy-conformant configuration is therefore that the line is present, and once that is achieved nothing more will be done. We say that the operation AppendIfNoSuchLine is convergent.
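The behaviour just described can be sketched in a few lines of Python. This is an illustration of the convergent idea only, written under our own assumptions; the function name append_if_no_such_line is hypothetical and the sketch does not claim to reproduce how cfengine's editfiles engine works internally.

```python
import os
import tempfile

def append_if_no_such_line(path, line):
    """Append line to the file at path unless an identical line already exists.

    Returns True if the file was changed, False if it already conformed.
    """
    with open(path, "r", encoding="utf-8") as handle:
        content = handle.read()
    if line in content.splitlines():
        return False                      # correct state -> correct state
    with open(path, "a", encoding="utf-8") as handle:
        if content and not content.endswith("\n"):
            handle.write("\n")            # keep the new text on a line of its own
        handle.write(line + "\n")         # incorrect state -> correct state
    return True

if __name__ == "__main__":
    # Hypothetical scratch file used only for the demonstration.
    fd, scratch = tempfile.mkstemp()
    os.close(fd)
    print(append_if_no_such_line(scratch, "Important configuration line"))
    print(append_if_no_such_line(scratch, "Important configuration line"))
    os.remove(scratch)
```

However many times the function is applied, the file ends up containing the line exactly once, which is the property the two meta-rules express.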

Don’t underestimate the value of convergence. It provides you with stability. Because cfengine’s language interface strongly discourages you from doing anything non-convergent, it also helps to prevent mistakes. The price is that you will have to learn to think in a convergent way, and that is new for most people who come to cfengine for the first time.

2.1.3 Classes and Declarations: From One to Many Hosts

One of the features that makes cfengine policies readable is the ability to hide away all of the complex decision-making that needs to be performed by the agent. To realize this ambition, cfengine uses a declarative language to express policy.

A declarative language is simply a structured list of sentences (in the case of cfengine, it is a list of policy promises). It is stated in no particular order; it describes a final goal that is to be achieved. The details of how one gets there are left implicit: to be evaluated and implemented by the engine that interprets the specification. This is in contrast to procedural or imperative languages, such as shell or Perl, which micro-manage every step along the way.

In an imperative language, one focuses on the procedure. In a declarative language, one focuses on the intention, or the presumed result.

One example of this is the use of classes in cfengine. Classes are a way of making decisions, without writing many “if-then-else” clauses. A class is an identifier which has the value “true” when a particular test is true. It is a Boolean variable; if you like, it caches the result of an “if” test. The benefit of classes is that all of the testing can be hidden away in the bowels of cfengine, and only the results need be visible if or when they are needed.

Key Concept 6 (Classes) A class is a way of slicing up and mapping out the complex environment of one or more hosts into regions that can then be referred to by a symbol or name. They describe scope: where something is to be constrained.

For example, the class debian is true if and only if cfagent is running on a host that has Debian Linux as its operating system.
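The following Python sketch illustrates the idea of classes as cached test results. It is our own simplified model, not a description of cfagent's internals: the function names, the POLICY list and its promise texts are hypothetical, and the check for /etc/debian_version is merely one plausible way to detect a Debian host.

```python
import os
import platform

def discover_classes():
    """Run each test once and cache the outcome as a set of true class names."""
    classes = {"any", platform.system().lower()}
    if os.path.exists("/etc/debian_version"):   # assumed marker of a Debian host
        classes.add("debian")
    return classes

def select_promises(policy, classes):
    """Return the promises whose class guard is true on this host."""
    return [promise for guard, promise in policy if guard in classes]

# Hypothetical policy: each promise is guarded by a class name, not by an if-test.
POLICY = [
    ("any", "keep /etc/passwd readable"),
    ("debian", "keep the package index fresh"),
    ("solaris", "keep /etc/vfstab tidy"),
]

if __name__ == "__main__":
    for promise in select_promises(POLICY, discover_classes()):
        print(promise)
```

All of the testing happens once, in discover_classes; the policy itself contains only names, so the same list can be given unchanged to every host.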

2.1.4 Voluntary Cooperation

It is a fundamental property of cfengine components that every host retains its individual autonomy. A host can always opt out of cfengine-based governance if its administrator wants to. This principle leads to a fundamental design and implementation decision:

Key Concept 7 (Autonomy) No cfengine component is capable of receiving information that it has not explicitly asked for itself, nor can it be advised or commanded by an outside agent without requesting such advice.

It is important to understand what this means. It does not mean that centralized control of hosts cannot be achieved. Centralized control is the way that most users choose to use cfengine. Indeed, all you have to do to achieve centralized control is to make a policy decision for all your hosts to fetch policy specifications from a central authority.

Autonomy does mean that if your environment has some small groups or sub-cultures with special needs, it is possible for them to retain their special identity. No one claiming to be their self-appointed authority can ride roughshod over their local decisions.

Where does policy come from then? Each host works from a policy specification that cfengine expects to find in a local directory (usually /var/cfengine/inputs on a Unix-like host). If you want your host to be controlled from some central manager or authority, then your policy must contain bootstrapping specifications that say: “it is my decision that I should download and follow the policy specification located at the central manager.”

Each host can turn this policy decision off at any time. This is a key part of the cfengine security model.

2.1.5 Scalability

Cfengine’s scalability is at least as good as any other system, because it allows for maximal distribution of workload.

Key Concept 8 (Scalable distributed action) Each host is responsible for carrying out checks and maintenance on/for itself, based on its local copy of policy.

This does not mean that you are immune from making bad decisions. For example, network services can always be a bottleneck if you ask 10,000 hosts to fetch something from one place at the same time.

The fact that each cfengine agent keeps a local copy of policy (regardless of whether it was written locally or inherited from a central authority) means that cfengine will continue to function even if network communications are down.
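The pull model and the role of the local copy can be sketched as follows. This is a deliberately small Python illustration under our own assumptions: refresh_policy, the fetch callable and the dictionary standing in for the local policy directory are all hypothetical names, and no real cfengine protocol is modelled.

```python
def refresh_policy(fetch, local_copy):
    """Try to pull fresh policy; fall back to the local copy if the pull fails.

    fetch      -- callable chosen by the host itself; returns policy text,
                  may raise OSError when the network is down
    local_copy -- dict with key "text", standing in for the local policy directory
    Returns a short status string.
    """
    try:
        local_copy["text"] = fetch()     # the host asks; nobody pushes
        return "updated"
    except OSError:
        return "kept local copy"         # the agent keeps working from what it has

def unreachable_server():
    """Hypothetical fetch function simulating a network outage."""
    raise ConnectionError("central policy server unreachable")

if __name__ == "__main__":
    policy = {"text": "files: ..."}
    print(refresh_policy(lambda: "files: ... (new)", policy), policy["text"])
    print(refresh_policy(unreachable_server, policy), policy["text"])
```

Because the host initiates every transfer and always holds a usable copy of policy, a failed download costs freshness but not function.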


2.2 Cfengine Components

The cfengine software consists of a number of components: separate programs that work together (see Figure 2.1).1

The components of cfengine are:

• cfagent: Interprets policy promises and implements them in a convergent manner. The agent can use data generated by the statistical monitoring engine cfenvd and it can fetch data from cfservd running on local or remote hosts.

• cfexecd: Is a scheduler and wrapper which executes cfagent and logs its output (optionally sending a summary via email). It can be run in daemon (standalone) mode, or it can be run from cron on a Unix-like system.

• cfservd: A server daemon that serves file data. It can also be configured to start cfagent immediately on receipt of a connection from cfrun. No actual data can be passed to this daemon.

• cfrun: A helper application that polls hosts and asks them to run cfagent if they agree.

• cfenvd: A statistical state monitor that collects statistics about resource usage on each host for anomaly detection purposes. The information is made available to the agent in the form of cfengine classes so that the agent can check for and respond to anomalies dynamically.

• cfkey: Generates public-private key pairs on a host. You normally run this program only once, as part of the cfengine software installation process.

• cfshow: Displays the cfagent database contents in ASCII format, should you ever become interested in its internal memory.

• cfenvgraph: Dumps cfenvd’s statistical database contents in a form that can be used to plot graphs showing the normal behavior of a host in its environment.

1 The components differ between version 1 and version 2. We shall only discuss cfengine 2 here, as cfengine version 1 is no longer supported, and you are strongly advised to use version 2. In addition, Cfengine version 3 is being developed at the time of writing, but this will take a number of years before it can fully replace version 2. It will incorporate the state of the art in Network and System Administration research, building on all the lessons learned from versions 1 and 2.


Figure 2.1: Cfengine Components and the Connections Between Them

Figure 2.1 illustrates the relationships among cfengine components on different hosts. On a given system, cfagent may be started by the cfexecd daemon; the latter also handles logging during cfagent runs. In addition, operations such as file copying between hosts are initiated by cfagent on the local system, and they rely on the cfservd daemon on the remote system to obtain remote data.


Chapter 3

ITIL past and present

Experience, without theory, teaches management nothing about what to do to improve quality and competitive position.

—William Edwards Deming

The IT Infrastructure Library (ITIL) is a collection of books, in which “best practices” for IT Service Management (ITSM) are described. Today, ITIL can be seen as a de-facto standard in the discipline of ITSM, for which it provides guidelines by its current core titles Service Strategy, Service Design, Service Transition, Service Operation and Continual Service Improvement. ITIL follows the principle of process-oriented management of IT services.

In effect, the responsibilities for specific IT management decisions can be shared between different organizational units as the management processes span the entire IT organization independent from its organizational partition. Whether this means a centralization or decentralization of IT management in the end, depends on the concrete instances of ITIL processes in the respective scenario.

3.1 ITIL and its versions

ITIL has its roots in the early 1990s, and has since been subject to numerous improvements and enhancements. Today, the most popular release of ITIL is given by the books of ITIL version 2 (often referred to as ITILv2), while the British OGC (Office of Government Commerce), owner and publisher of ITIL, is currently promoting ITIL version 3 (ITILv3) under the motto “ITIL Reloaded”.

It is important to understand that ITILv3 is not just an improved version of the ITILv2 books, but rather comes with a completely renewed structure, new sets of processes and a different scope with respect to the issue of IT strategies, IT-business-alignment and continual improvement. That is why, in the following, we run through the basics of both versions, highlighting commonalities and differences.

3.1.1 ITIL: Important Foundations

ITIL is based on the paradigm of process-oriented IT Service Management. In addition, ITIL uses the Deming quality circle as a model for continual quality improvement, where quality relates both to the provided IT services and to the management processes deployed to manage these services. Continual improvement according to ITIL means following the method of Plan-Do-Check-Act:

• Plan: Plan the provision of high-quality IT services, set up the required management processes for the delivery and support of these services, define measurable goals and the course of action in order to fulfill them.

• Do: Put the plans into action.

• Check: Measure all relevant performance indicators, and quantify the achieved quality compared to the quality objectives. Check for potential improvements.

• Act: In response to the measured quality, start activities for future improvements. This step leads into the Plan phase again.

3.1.2 ITILv2: Service Support and Service Delivery

Although ITILv3 was released during the summer of 2007, it is its predecessor that has achieved great acceptance amongst IT service providers all over the world. And because the international ISO/IEC 20000 standard emerged from the basic principles and processes of ITILv2, it is this version that enjoys the widest distribution and popularity.

The core modules of ITILv2 are the books entitled Service Support and Service Delivery. While the Service Support processes (e.g. Incident Management, Change Management) aim at supporting day-to-day IT service operation, the Service Delivery processes (e.g. Service Level Management, Capacity Management, Financial Management) are supposed to cover IT service planning like resource and quality planning, as well as strategies for customer relationships or dealing with unpredictable situations.

3.1.3 ITILv3: Management from the Service Life Cycle Perspective

In 2007, ITILv2 was replaced by its successor ITILv3, aimed at covering the entire service life cycle from a management perspective and striving for a more substantiated idea of IT business alignment. Many of the ITILv2 processes and ideas have been recycled and extended by various additional processes and principles. The five service life cycle stages according to ITILv3 are:

1. Service Strategy: Common strategies and principles for customer-oriented, business-driven service delivery and management

2. Service Design: Principles and processes for the stage of designing new or changed IT services

3. Service Transition: Principles and processes to ensure quality-oriented implementation of new or changed services into the operational environment

4. Service Operation: Principles and processes for supporting service operation

5. Continual Service Improvement: Methods for planning and achieving service improvements at regular intervals

3.2 Foundations

Why service and process orientation? What is ITIL trying to do? As we mentioned in the introduction, the ‘military’ control view of human organization fell from favour in business research in the 1980s and service oriented autonomy was identified as a new paradigm for levelling organizations – getting rid of deep hierarchies that hinder communication, and opening up communication directly.

If one is cynical, one can see signs of CEOs nervously trying to put back some of the military thinking into process management – with definitions of authority and chains of responsibility. But these chains are short, and whenever ITIL says “committee”, promise theory would say that all we need is a single agent (a human or computer) and the internal details of it don’t matter. We should probably not think too literally about ITIL’s choice of words, which after all were born from a particular kind of corporate culture and will not appeal to everyone.

If we look at ITIL through the eyeglass of a hierarchical organization, some of its procedures could be seen as restrictive, throttling scalable freedoms. We do not believe that this is their intention. Rather ITIL’s guidelines try to make a predictable and reliable face for business and IT operations so that customers feel confidence, without choking the creative process that lies behind the design of new services.

3.2.1 Cfengine in ITIL clothes?

Cfengine users are interested in the ability to manage, i.e. cope with system configuration in a way that enables a business or other organization to do its work effectively. They don’t want reams of human management because this is what cfengine is supposed to remove. To be able to use ITIL to help in this task, we have to first think of the process of setting up as a number of services. What services are these? We have to think a little sideways to see the relationship.

• Service - providing a sensible configuration policy, responding to discovered problems or the needs of end-users.

• Change - an edit of the configuration policy, with appropriate quality controls.

• Release - a new configuration policy, consisting of many changes. A new version of cfengine? This could be a major and disruptive change so it should be planned carefully.

• Capacity - having enough resources for cfservd to answer all queries in a network. Having enough people to support the processes of deploying and following cfengine’s progress.

You should keep this kind of thinking in mind, and train yourself to see every part of a task in “ITIL clothes”.

3.2.2 ITIL’s idea of processes

The following management processes are in scope of ITILv3:


• Service Level Management: Management of Service Level Agreements (SLAs), i.e. service level and quality promises.

• Service Catalogue Management: deciding on the services that will be provided and how they are advertised to users.

• Capacity Management: Planning and provision of adequate business, service and resource capacities.

• Availability Management: Resource provision and monitoring of service, from a customer viewpoint.

• Continuity Management: Development of strategies for dealing with potential disasters.

• Information Security Management: Ensuring a minimum level of information security throughout the IT organization.

• Supplier Management: Maintaining supplier relationships.

• Transition Planning and Support: Ensuring that new or changed services are deployed into the operational environment with minimal impact on existing services.

• Asset and Configuration Management: Management of IT assets and Configuration Items.

• Release Management: Planning, building, testing and rolling out hardware and software configurations.

• Change Management: Assessment of current state, authorization and scheduling of improvements.

• Service Validation and Testing: ensuring that services meet their specifications.

• Knowledge Management: organizing and integrating experience and methodology for future reference.

• Incident Management: responding to deviations from acceptable service.

• Event Management: Efficient handling of service requests and complaints.

• Problem Management: Problem identification by trend analysis of incidents.


• Request Fulfillment: Fulfilling customer service requests.

• Access Management: Management of access rights to information, services and resources.

3.2.3 Service Strategy

Service strategy is about deciding what services you want to formalize. In other words, what parts of your system administration tasks can you wrap in procedural formalities to ensure that they are carried out most excellently?

3.2.4 Service Design

Service design is about deciding what will be delivered, when it will be delivered, how quickly the service will respond to the needs of its clients etc. This stage is probably something of a mental barrier to those who are not used to service-oriented thinking.

3.2.5 Service Operation

How shall we support service operation? What resources do we need to provide, both human and computer? Can we be certain of having these resources at all times, or is there resource sharing taking place? If services are chained into “supply chains”, remember that each link of the chain is a possible delay, and a possible misunderstanding. Successfully running services can be a more complex task than we expect, and this is why it is useful to formalize them in an ITIL fashion.

3.2.6 Continual Service Improvement

Continual improvement is quite self-explanatory. We are obviously interested in learning from our mistakes and improving the quality and efficiency with which we respond to service requests. But it is necessary to think carefully about when and where to introduce this aspect of management. How often should we revise our plans and change procedures? If this is too often, the overhead of managing the quality becomes one of the main barriers to quality itself! Continual has to mean regular on a time-scale that is representative for the service being provided, e.g. reviews once per week, once per month? No one can tell you about your needs. You have to decide this from local needs.


3.3 Tool Support

In the field of tool support for IT Service Management according to ITIL, various white papers and studies have been published. In addition, there are papers available from BMC, HP, IBM and other vendors that describe specific (commercial) solutions. Generally, the market for tools is growing rapidly, since ITIL increasingly gains attention especially in large and medium-size enterprises. Today, it is already hard to keep track of the variety of functionalities different tools provide. This makes it even more difficult to approach this topic in a way satisfactory to the entire researchers’, vendors’ and practitioners’ community.

That is why this document follows a different approach: Instead of thinking of ever new tools and computer-aided solutions for ITIL-compliant IT Service Management, this book analyses how the existing and well-established technologies used for traditional systems administration can fit into an ITIL-driven IT management environment, and it guides potential practitioners in integrating a respective tool suite – namely cfengine – with ITIL and its processes.

To avoid any misunderstanding: We do not argue that cfengine – originally invented for configuring distributed hosts – may be deployed as a comprehensive solution for automating ITIL, but we do believe that cfengine and its more recent innovations can bridge the gap between the technology of distributed systems management and business-driven IT Service Management. To make the case we must show:

1. How ITIL terminology relates to the terminology of cfengine and hence to a traditional system administrator’s language, and

2. Which parts (processes and activities) of ITIL can be (partially) supported by cfengine, and how.

These are the main goals of the subsequent chapters.


Chapter 4

A meeting of mind-sets

To summarize the results of the previous chapters, it can be said that the goals of ITIL and the purpose of cfengine are quite different: ITIL gives recommendatory guidance in process- and service-oriented IT Service Management, while cfengine provides a powerful solution framework for a variety of common network and systems administration tasks. In other words:

1. The scope of ITIL is much broader than traditional systems administration, but: Portions of systems administration and configuration management tasks take place in the context of certain ITIL processes.

2. Cfengine was not designed to replace ITSM tools like trouble ticket systems (TTS), workflow management or CMDBs, but: in the more technical areas of IT Service Management, cfengine is able to support ITIL processes in their activities.

The goal of this document is to give an overview on how cfengine can be used to support selected IT Service Management tasks according to ITIL.

4.1 Which ITIL processes apply to cfengine?

In version 2, ITIL divides itself into service support[4] and service delivery[5]. For instance, service support might mean having a number of cfengine experts who can diagnose problems, or who have sufficient knowledge about cfengine to solve problems using the software. It could also mean having appropriate tools and mechanisms in place to carry out the tasks. Service delivery is about how these people make their knowledge available through formal processes: how available are they and how much work can they cope with? Cfengine enables a few persons to perform a lot of work very cheaply, but we should not forget to track our performance and quality for the process of continual improvement.

Figure 4.1: Scope of ITIL vs. scope of cfengine

Figure 4.2: FCAPS mapped to ITIL

Service support is composed of a number of issues:

• Incident management: collecting and dealing with incidents.

• Problem management: root cause analysis and designing long term countermeasures.

• Configuration management: maintaining information about hardware and software and their interrelationships.

• Change management: implementing major sequenced changes in the infrastructure.

• Release management: planning and implementing major “product” changes.

Although the difference between change management and release management is not completely clear in ITIL, we can think of a release as a change in the nature of the service, while change management deals with alterations possibly still within the scope of the same release. Thus a release is a more major change.

Service delivery, on the other hand, is dissected as follows:

• Service Level Management

• Problem management

• Configuration management

• Change management

• Release management

These issues are somewhat clearer once we understand the usage of the terms “problem”, “service” and “configuration”. Once again, it is important that we don’t mix up configuration management in ITIL with configuration management as used in Unix parlance.

The notion of system administration in the sense of Unix does not exist in ITIL. In the world of business, reinvented through the eyes of ITIL’s mentors, system administration and all its functions are wrapped in a model of service provision.

4.1.1 Configuration Management (CM)

Perhaps the most obvious example is the term configuration management.

Definition 1 (Configuration Management) The process (and life-cycle) responsible for maintaining information about configuration items (CI) required to deliver an IT service, including their relationships.

As we see, this is comparable to our intuitive idea of “asset management”, but with “relationships” between the items included. ITIL also defines “Asset Management” as “a process responsible for tracking and reporting the value of financially valuable assets”, which is a component of ITIL Configuration Management.

In the cfengine world, configuration management involves planning, deciding, implementing (“base-lining”) and verifying (“auditing”) the inventory. It also involves maintaining the security and privacy of the data, so that only authorized changes can be made and private assets are not made public.

In this document we shall try not to mix the ITIL concept with the more prosaic system administration notion of a configuration which includes the current state of software configuration on the individual computers and routers in a network.

Since cfengine is a completely distributed system that deals with individual devices on a one-by-one basis, we must interpret this asset management at two levels:

• The local assets of an individual device at the level of virtual structures and containers within it: files, attributes, software packages, virtual machines, processes etc. This is the traditional domain of automation for cfengine’s autonomic agent.

• The collective assets of a network of such devices.

Since a single host can be thought of as a network of assets connected through virtual pathways, it really isn’t such a huge leap to see the whole network in a similar light. This is especially true when many of the basic resources are already shared objects, such as shared storage.

4.1.2 Asset Management, what is it used for?

Why bother to collect an inventory of this kind? Is it bureaucracy gone mad, or do we need it for insurance purposes? Both of these things are of course possibilities.

The data in an ITIL Configuration Management Database (CMDB) can be used for planning the future and for knowing how to respond to incidents, in other words for service level management (SLM) and for capacity planning. An organization needs to know what resources it has to know whether it can deliver on its promises. Moreover, for finance and insurance it is clearly a sound policy to have a database of assets.

For continuity management, risk analysis and redundancy assessment we need to know how much equipment is in use and how much can be brought in at a moment’s notice to solve a business problem. These are a few of the reasons why we need to keep track of assets.

4.1.3 Change management

If we make changes to a technical installation, or even a business process, this can affect the service that customers experience.

Major changes to service delivery are often written into service level agreements since they could result in major disruptions.

Details of changes need to be known by a help-desk and service personnel. The decision to make a change is usually more than a single person should make alone. It requires consultation at different levels of process. An advisory board for changes takes on this role, whether it is an informal board that communicates electronically or a physical committee “with six or more legs and no brain”.

Make a note! 1 (Change management vs convergence) We should be especially careful here to decide what we mean by change. ITIL assumes a traditional model of change management that cfengine does not need. ITIL’s ideas apply to the management of cfengine’s configuration, not the way in which cfengine carries out its work.

In the traditional idea of change management you start by “base-lining” a system, or establishing a known starting configuration. Then you assume that things only change when you actively implement a change, such as “rolling out a new version” or committing a release. This, of course, is very optimistic.

In most cases all kinds of things change beyond our control. Items are stolen, things get broken by accident and external circumstances conspire to confound the order we would like to preserve. The idea that only authorized people make changes is nonsense.

Cfengine takes a different view. It thinks that changes in circumstances are part of the picture, as well as changes in inventory and releases. It deals with the idea of “convergence”. In this way of thinking, the configuration details might be changing at random in a quite unpredictable way, and it is our job to continuously monitor and repair general dilapidation. Rather than assuming a constant state in between changes, cfengine assumes a constant “ideal state” or goal to be achieved between changes. An important thing to realize about including changes of external circumstances is that you cannot “roll back” circumstances to an earlier state – they are beyond our control.
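The contrast between a constant state and a constant ideal state can be illustrated with a small simulation. The Python sketch below is entirely our own construction: the IDEAL dictionary, the attribute names and the functions drift, repair and run are hypothetical, and None is used merely to stand for “some non-conformant value”.

```python
import random

# Hypothetical ideal state: the goal that stays constant between policy changes.
IDEAL = {"mode": 0o644, "owner": "root", "line_present": True}

def drift(state, rng):
    """Simulate an uncontrolled change to one randomly chosen attribute."""
    key = rng.choice(sorted(state))
    state[key] = None          # None stands for some non-conformant value

def repair(state, ideal):
    """Convergent maintenance: fix only what deviates; return the number of fixes."""
    fixes = 0
    for key, wanted in ideal.items():
        if state.get(key) != wanted:
            state[key] = wanted
            fixes += 1
    return fixes

def run(cycles, seed=0):
    """Alternate random drift and repair; return (final state, total fixes)."""
    rng = random.Random(seed)
    state = dict(IDEAL)
    total = 0
    for _ in range(cycles):
        drift(state, rng)                # circumstances change beyond our control
        total += repair(state, IDEAL)    # the agent pulls the host back to the goal
    return state, total

if __name__ == "__main__":
    final_state, fixes = run(10)
    print(final_state == IDEAL, fixes)
```

Nothing in the sketch tries to undo the drift itself; there is no roll-back of circumstances, only a repeated comparison with the ideal state and repair of whatever deviates.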



4.1.4 Release management

A release is a collection of authorized changes to a system. One part of Change Management is therefore Release Management. A release is generally a larger umbrella under which many smaller changes are made. It is a major change. Changes are assembled into releases and then they are rolled out.

In fact release management, as described by ITIL, has nothing to do with change management. It is rather about the management of designing, testing and scheduling the release, i.e. everything to do with the release process except the explicit implementation of it. Deployment or rollout describe the physical movement of configuration items as part of a release process.

4.1.5 Incident and problem management

ITIL distinguishes between incidents and problems. An incident is an event that might be problematic, but in general we would observe incidents over some length of time and then diagnose problems based on this experience.

Definition 2 (Incident) An event or occurrence that demands a response.

One goal of cfengine is to plan pro-actively to handle incidents automatically, thus taking them off the list of things to worry about.

Definition 3 (Problem) A pattern of consequence arising from certain incidents that is detrimental to the system. It is often a negative trend that needs to be addressed.

Changes can introduce new incidents. An integrated way to make the tracking of cause and effect easier is clearly helpful. If we are the cause of our own problems, we are in trouble!

4.1.6 Service Level Management (SLM)

Also loosely referred to as Quality of Service. This is the process of making sure that Service Level Promises are kept, or Service Level Agreements (SLA) are adhered to. We must assess the impact of changes on the ability to deliver on promises.


4.2 ITIL terminology

Like many other areas of wishful standardization, ITIL elevates itself to a state of importance by using a multitude of acronyms and specialized terms. Not all of these are as intuitive as one might hope for and many simply seem beyond necessity. However, to understand the writing, we need to know a few of them and also understand how they differ from similar terms in system administration and the world of cfengine. In the appendix, we list the most important of these terms, with comments. Figure 4.3 shows a scatter-plot of these terms.

Figure 4.3: ITIL terminology


Chapter 5

Using cfengine to implement ITIL objectives

How does cfengine fit into the management of a service organization? There are several ways:

• It offers rapid detection and repair of faults, which helps to avoid formal incidents.

• It simplifies the deployment (release) of services.

• It allows resources to be understood and planned better.

These properties allow for greater predictability of system services and therefore they contribute to customer confidence.

5.1 Infrastructure or management?

Any tool for assisting with change management lies somewhere between ITIL’s notion of change management and the infrastructure itself. It must essentially be part of both (see fig. 5.1). This applies to cfengine too.

Cfengine can manage itself as well as other resources: itself, its software, its policy and the resulting plans for the configuration of the system. In other words, cfengine is itself part of the infrastructure that we might change.


Figure 5.1: Cfengine is both infrastructure and a part responsible for infrastructure.

5.2 How can cfengine or promises help?

5.2.1 Traditions

Traditional methods of managing IT infrastructure involve working from crisis to crisis – waiting for ‘incidents’ to occur and then initiating fire suppression responses or, if there is time, proactive changes. With cfengine, these can be combined and made into a management service, with continuous service quality.

Cfengine can assist with:

1. Maintenance assurance.

2. Reporting for auditing.

3. Change management.

4. Security verification.

Promise theory comes with a couple of principles:

1. Separation of concerns.

2. Fundamental attention to autonomy of parts.

5.2.2 Modelling of policy

Other approaches to discussing organization talk about the separation of concerns, so why is promise theory special? Object Orientation (OO) is an obvious example. Promise theory is in fact quite different to object orientation (which is a misnomer).

Object orientation asks users to model abstract classes (roles) long before actual objects with these properties exist. It does not provide a way to model the instantiated objects that later belong to those classes. It is mainly a form of information structure modelling. Object orientation models only abstract patterns, not concrete organizations.

Promise theory on the other hand considers only actual existing objects (which it calls agents) and makes no presumptions that any two of these will be similar. Any patterns that might emerge can be exploited, but they are not imposed at the outset. Promise theory’s insistence on autonomy of agents is an extreme viewpoint from which any other can be built (just as atoms are a basic building block from which any substance can be built) so there is no loss of generality by making this assumption.

In other words, OO is a design methodology with a philosophy, whereas promises are a model for an arbitrary existing system.

5.2.3 Uniformity

The traditional production-line paradigm for management of IT systems involves reducing the number of variations – often simply making all systems identical for mass-production. However, as quoted at the beginning of chapter 2, the purpose of advanced technology is to enable us to cope with variation. Cfengine makes managing variations simple. Some organizations might simply want to have a uniform configuration on all their hardware, but what does this mean if the basic hardware is different?

In cfengine we understand that “similar” should be based on how systems behave, not what their disk images look like. Two systems that make the same promises ought to behave in the same way, if the promises are at a high enough level. But what if two different operating systems promised to never have a file called /etc/passwd? A Windows machine would not care too much, but a Unix system would be paralyzed.

Promises and system configuration are related: configuration affects behaviour and behaviour is what we promise. Clearly, however, we cannot expect very high level promises to be simply translated into configurations. The fact that we make promises about system configuration says nothing certain about the promise that results from changing it. That depends on many other factors. Thus we must be careful to think about what a promise means.


Make a note! 2 (Fundamental assumption) The basic assumption of configuration management is that a specific configuration determines the resulting behaviour of a system. This assumption is completely unproven, and is sometimes obviously false. At best there is a correlation between configuration and behaviour. This is what makes IT management challenging. The things we can change do not necessarily give us the control we would like.

5.3 What is maintenance?

Maintenance is a process that ITIL does not formally spend any time on explicitly, but it is central to real-world quality control.

Imagine that you decide to paint your house. Release 1 is going to be white and it is going to last for 6 years. Then release 2 is going to be pink. We manage our painting service and produce release 1 with all of the care and quality we expect. Job done? No.

It would be wrong for us to assume that the house will stay this fine colour for 6 years. Wind, rain and sunshine will spoil the paint over time and we shall need to touch up and even repaint certain areas in white to maintain release 1 for the full six years. Then when it is time for release 2, the same kind of maintenance will be required for that too.

Unless we read between the lines, it would seem that ITIL’s answer to this is to wait for a crisis to take place (an incident). We then mobilize some kind of response team. But how serious an incident do we require and what kind of incident response is required? A graffiti artist? A lightning strike? A bird anoints the paint-work? Cfengine is like the gardener who patrols the grounds constantly plucking weeds, before the flower beds are overrun. Call it continual improvement if you like: the important thing is that the process should be pro-active and not too expensive.

Maintenance is necessary because we do not control all of the changes that take place in a system. There is always some kind of “weather” that we have to work against. Cfengine is about this process of Maintenance. We call it “convergence” to the ideal state, where the ideal state is the specified version release. Keep this in mind as you read about ITIL change management.

5.4 Incident Management vs Maintenance

Cfengine employs the idea of continual maintenance (we paint the fence on a regular basis to protect it). ITIL, on the other hand, moves from release to release (this year we paint the fence red, next year green) and does not recognize the effect of gradual entropic decay of state (the fence’s colour fades gradually due to the harsh environment). Instead ITIL deals with events (graffiti and tagging of the fence) which must be corrected. While it is true that these incidents are maintenance, the repairs are more costly to initiate if they occur as exceptional events than if we are used to repainting the fence on a regular basis.

[Flowchart. Steps: Detect, Filter, Log Event, Correlate Events, Trigger, Auto Response, Alert, Human Intervention, Review Actions. Event types: Informational, Warning, Exception. Hand-over to: Incident Management, Problem Management, Change Management. Legend: activity according to ITIL V3; activity supported by cfengine.]

Figure 5.2: An exemplary Event Management process on the basis of ITIL V3

Figures 5.2 and 5.3 show ITIL processes for the handling of events and incidents. They show the aspects of dealing with events that are mainly human oriented, and those events in shaded boxes that can be automated using cfengine.

In fig 5.2 we see that there must be a basic monitor at the top of the process chain which is responsible for observing events. This fits well with the view of promise theory in which a neutral observer is required to measure the state of different component agents in the system. Not all events are necessarily relevant or interesting so we can filter these based on a policy. Cfengine’s event monitors come from two sources: cfagent (for monitoring the state of promises which are being managed - e.g. the proverbial colour of the fence) and cfenvd (for passively monitoring the environment - e.g. the brightness of the sunshine or the amount of rainfall impacting on the fence).

• Cfengine filters events through its class interface. All events observed in cfengine are classified and made available to the environment.

• Cfengine logs events by routing messages to email or to syslog (by asking inform=true or syslog=true or audit=true).

• The daemon cfenvd auto-correlates events. The tool cfbrain will cross-correlate events, further classifying the outcomes as part of the environment.

• Events can be triggered by attaching promises to event-driven classes in the cfagent configuration, e.g.

    processes:

      www_in_high_anomaly::

        "apache" signal=term

    alerts:

      www_in_high_anomaly::

        ShowState(www.in)

For more devastating incidents, we can arrange for more information to be output. An incident is really only an event of some special significance. Diagnosing an incident requires either human intervention or pre-cached insight on the part of the promises we make. If we can make a specific promise then the diagnosis that this promise has not been kept can easily be turned into a specific repair. For example, we might note a sudden burst of smtp traffic, or a sudden decrease in free disk space. These events can be anticipated if one knows a benign cause, such as email was shut down for maintenance, or the host is a new mail-server that has never seen traffic before.
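The chain from detection to triggered response can be sketched as follows. This is a hedged illustration only: the two-standard-deviation threshold, the function names `classify` and `run_policy`, and the action strings are our assumptions, and they do not describe how cfenvd actually computes its anomaly classes.

```python
# Hypothetical sketch of event classification and class-driven response.
from statistics import mean, stdev

def classify(history, current, name="www_in"):
    """Return the set of classes raised by the current measurement.

    Assumption: a value more than two standard deviations above the
    historical mean counts as a "high anomaly".
    """
    classes = set()
    if len(history) >= 2:
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and current > mu + 2 * sigma:
            classes.add(f"{name}_high_anomaly")
    return classes

def run_policy(classes, policy):
    """Return the actions that are attached to any active class."""
    return [action for cls, action in policy if cls in classes]

# Promises attached to an event-driven class, mirroring the policy above.
policy = [
    ("www_in_high_anomaly", "signal apache term"),
    ("www_in_high_anomaly", "show state www.in"),
]

quiet = run_policy(classify([10, 12, 11, 9, 10], 11), policy)
burst = run_policy(classify([10, 12, 11, 9, 10], 50), policy)
```

An ordinary measurement raises no class and so triggers nothing, whereas the burst activates both promises without any human intervention.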

[Flowchart. Steps: Identify, Categorise, Prioritise, Investigate & Diagnose, Solve & Recover. Branches: Major Incident (Major Incident Handling) and Service Request (Request Fulfilment). Legend: activity according to ITIL V3; activity supported by cfengine.]

Figure 5.3: An exemplary Incident Management process on the basis of ITIL V3

5.5 Rollout and installation

When setting up hosts, ITIL actually makes a technical recommendation. This is unusual for ITIL as it generally does not get mixed up in the details of management, only the processes. ITIL recommends “base-lining” systems from a gold server, i.e. a system that is thought to be “perfect” enough to act as a model for all other systems. Once a server has been base-lined from the golden image, various customizations can be made relative to this known state. ITIL sees this as a way of achieving consistency.

We believe that ITIL exceeds its technical competence in making this a recommendation. True enough, this has traditionally been a way of performing a rollout, but the approach has been superseded by better technology. The gold server approach is not the recommended cfengine way. In fact a golden-image approach wastes a fundamental flexibility that cfengine offers, namely the possibility to allow variations (see the quote by Alvin Toffler at the start of chapter 2).

When we baseline a system from a gold-server, we are planning to make all hosts basically the same. However, this is neither necessary nor cost-saving if you use cfengine.

Cfengine places no restrictions on the approach used to roll out hosts. Rather than requiring you to start from a known state, it allows you to specify the final state for any initial state. This means you can migrate hosts gradually to a policy state without having to reinstall them. We can consider the end result of a cfengine policy process to be “the release”. In cfengine this is equivalent to a sufficiently comprehensive configuration policy.

The message here is that cfengine allows you to achieve predictable results without the need for a gold server. Nevertheless, it is helpful to begin a system based on a reliable substrate. It is like making a good sandwich: it helps to have a perfect piece of bread to build on, but it’s what you put on top that is most important. You just need to know what you are starting with, and then most things can be fixed to satisfaction. We recommend:

• Start with some kind of standard image (a predictable substrate). It does not necessarily matter what it is as long as it behaves predictably, e.g. install from a known DVD, or install from net-boot or even from a gold server.

• Customize the basic working system using cfengine. There are two possible approaches to this:

i) Copy constant “gold” overlays or patches into place from a trusted source to customize the system.

– Add more operating system packages.

– Insert special files (config, data etc).

– Run post-processing scripts.

ii) Edit system directly with cfengine

– Documented automatically by cfagent promises.

– Can always customize after that too (phase 3)!



5.5.1 Customize by constant/fixed “gold” overlay

The first alternative is to install a fixed patch to a system from a known gold-server. The basic pattern is this:

Policy Example 2

    copy:

      /source/file
        dest=/dest/file
        server=gold_server

In this example, we simply install a new file into a known location. This is the simplest way of customizing a host, but it lacks flexibility.

5.5.2 Overlay an expandable template

A more sophisticated approach is to download a parameterized template from a repository or gold server. This template contains context dependent variables that can be expanded in situ by cfengine. There are two stages to this: first we copy the template to a temporary location, then we edit the final file location, insert the template and expand its variables. By following this procedure, the result satisfies cfengine’s principles of convergence.

Policy Example 3

    copy:

      /source/file
        dest=/tmp/file
        server=gold_server

    editfiles:

      { /dest/file

        EmptyEntireFilePlease
        InsertFile "/tmp/file"
        ExpandVariables
      }
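The two-stage template procedure can be imitated with Python's standard `string.Template`. This is a sketch under stated assumptions: the function `install_from_template`, the variable names `host` and `port`, and the in-memory treatment of the destination file are ours, chosen only to show why the result is convergent.

```python
# Hypothetical sketch of "empty, insert template, expand variables".
from string import Template

def install_from_template(template_text, variables, current_text):
    """Return (wanted_text, changed) for the destination file.

    The destination is always rebuilt from the template, so the result
    does not depend on what the file contained before.
    """
    wanted = Template(template_text).safe_substitute(variables)
    return wanted, wanted != current_text

template = "ServerName ${host}\nListen ${port}\n"
variables = {"host": "web1.example.org", "port": "8080"}

# First run: the destination is empty, so it is rewritten.
text, changed = install_from_template(template, variables, "")
# Second run: the destination already matches, so nothing changes.
text_again, changed_again = install_from_template(template, variables, text)
```

Because the final content is a function of the template and the variables alone, running the procedure again on an already correct file reports no change.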


5.5.3 Direct customization

A final approach to customization is to apply direct editing operations to implement the required customization.

Policy Example 4

    editfiles:

      { /dest/file

        ReplaceAll "X" With "Y"
        AppendIfNoSuchLine "ABC"
      }

This approach is useful for small corrections that require unsophisticated editing, but it quickly becomes cumbersome for more complex tasks.
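What makes such editing operations safe to repeat is that each one checks before it changes. A minimal Python sketch, assuming plain substring replacement (the helper names are hypothetical, and cfengine's own ReplaceAll works with patterns rather than literal substrings):

```python
# Hypothetical sketch of convergent line-editing operations.
def replace_all(lines, old, new):
    """Replace every occurrence of `old` with `new` in each line."""
    return [line.replace(old, new) for line in lines]

def append_if_no_such_line(lines, wanted):
    """Append `wanted` only if no identical line is already present."""
    return lines if wanted in lines else lines + [wanted]

def edit(lines):
    """Apply the two operations of the policy example in order."""
    return append_if_no_such_line(replace_all(lines, "X", "Y"), "ABC")

once = edit(["aXb", "c"])
twice = edit(once)   # a second pass finds nothing left to do
```

The second pass leaves the file untouched, which is the property that lets such a rule be run on every host on every cycle.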

5.6 Change Management

ITIL proposes that there should be an integrated approach to change and configuration management. Clearly changes to a system result in new configurations. However, changes can also be unplanned involuntary faults (ITIL discusses these as incidents). See fig. 5.4.

ITIL does not want unplanned changes; however, we know that they happen. Cfengine does not normally elevate deviations from policy to the level of an incident, it simply fixes problems immediately. However, we do not always have enough information about changes to allow cfengine to make repairs, so we need a way of monitoring for unexpected change.

Change management in cfengine is a subtle topic, because cfengine does not fully subscribe to the model of change that ITIL does. In cfengine’s view of the world, all changes are changes no matter how or why they occur. In ITIL’s world view, there are planned changes, there are releases and there are “incidents”.

ITIL therefore distinguishes between planned and unplanned changes that affect service delivery. Cfengine on the other hand cares only about what promises have been made about the system and whether or not these have been kept. See figs. 5.5-5.7 to see how cfengine fits into ITIL thinking.

Cfengine can detect changes because it effectively performs a constant audit of the system’s promises. We should understand cfengine’s change detection in two ways: changes that impact the performance or quality of services

• with respect to the quality of the system configuration service (i.e. cfengine’s service)

• with respect to the quality of services supported by the system configurations (e.g. other services like web services)

Figure 5.4: There are two approaches to change. You can try to climb a mountain (like snakes/chutes and ladders), following a special route from the baseline, or you can roll into a fixed-point like the base of a valley. ITIL believes in the first of these approaches. Cfengine takes the latter.

To cfengine, changes only matter if they impact the promises that have been codified as policy. Even events that cfengine calls “anomalies”, detected and classified continuously, are only considered interesting if policy determines them to be restricted, thus every single state change can be considered either “within tolerances” (insignificant) or “out of tolerance” (significant). ITIL is only a heuristic set of guidelines and is not technically sophisticated enough to be able to make this kind of distinction.
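The distinction between significant and insignificant change can be made concrete with a small sketch. The policy format, the measurement names and the function `classify_change` are assumptions made for illustration; they are not cfengine features.

```python
# Hypothetical sketch: a change matters only if policy says something
# about it, and then only if it leaves the promised range.
def classify_change(name, value, policy):
    """Return 'unmanaged', 'within tolerance' or 'out of tolerance'."""
    if name not in policy:
        return "unmanaged"
    low, high = policy[name]
    if low <= value <= high:
        return "within tolerance"
    return "out of tolerance"

# Assumed tolerances: (lowest acceptable, highest acceptable).
policy = {
    "free_disk_percent": (10, 100),
    "smtp_in_per_min": (0, 500),
}

disk = classify_change("free_disk_percent", 4, policy)
mail = classify_change("smtp_in_per_min", 120, policy)
other = classify_change("logged_in_users", 37, policy)
```

Only the first of the three observations would call for any response; the other two are either acceptable or of no concern to policy.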

Let’s make an approximate mapping between ITIL concepts and cfengine change and then comment critically on it.



    ITIL        Cfengine
    Incident    Promise not kept
    Change      Configuration version/content update
    Release     ?

Table 5.1: Comparison of meanings for change or deviation terminology

5.6.1 Software packaging

ITIL considers releases to be entire integrated systems that are versioned. Most operating systems work at a smaller level of granularity than this. Software version control uses package managers to version individual software packages. Although such package managers resolve dependencies, they do not version entire conglomerates of software. Software comes in large packages for two main reasons:

• Operating system installation (all or nothing).

• Functional role adaptation (specialized workstation).

Different organizational roles require different ITIL services to support them, and hence different software to deliver them.

Cfengine deals with versioned data management in two ways:

• File copying from master source (by date-stamp or checksum).

• Package installation and verification (using local package managers).

Package managers handle the installation and update of packages easily, but they do not always add institutional adaptational control in a way that can be tied into a classification of hosts in an organization’s network. Cfengine can use its classification of hosts to customize further. We simply attach relevant clusters of packages to different classes of host to ensure that specific workstations are properly adapted to service their tasks.

Not all software comes in operating system (vendor/provider) approved packages, but cfengine can also handle software that is zipped, tar-ed or bundled in any other manner.

The following example policies illustrate some of the copy rule type’s capabilities, including some of the options we just considered:



Policy Example 5 (File Copying Policies)

    control:

      DefaultCopyType = ( mtime )
      SplayTime = ( 15 )
      sourcehost = ( source.cfengine.org )

    copy:

      # Copy dat/doc files if not too big
      /usr/local/data dest=/archive/data
        include=*.dat include=*.doc exclude=test.*
        recurse=inf backup=false size<500m

      # Retrieve configuration file from master
      /depot/hosts.deny server=${sourcehost}
        dest=/etc/hosts.deny owner=root group=0 mode=644
        backup=off force=on timestamps=keep

      # Transmit shadow password file encrypted
      /depot/shadow server=${sourcehost} dest=/etc/shadow
        owner=0 group=0 mode=600 encrypt=true

The first rule specifies that .dat and .doc files within the /usr/local/data directory tree be copied to /archive/data, provided that the source files have been modified more recently than their counterparts in the target directory and that they are smaller than 500 MB. In addition, files whose names match test.* are excluded. Existing files will be overwritten without being saved.

The second rule unconditionally replaces the local /etc/hosts.deny file with one from the system source.cfengine.org, retaining the timestamps from the source file. This rule also specifies the ownership and mode for the target file.

The third rule is similar to the second one, retrieving another file from the same remote system. In this case, however, the file will be copied only when the remote file is more recent than the local copy. When the file is copied, the previous version will be retained, and the file contents will be encrypted as they are transmitted across the network.
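The decision made by the first rule can be written out as a single predicate. This is a sketch, not cfengine's implementation: the function name `should_copy` is hypothetical, and we assume that the `m` suffix in `size<500m` means mebibytes.

```python
# Hypothetical sketch of the copy decision in the first rule.
from fnmatch import fnmatch

def should_copy(name, size, src_mtime, dst_mtime,
                include=("*.dat", "*.doc"), exclude=("test.*",),
                max_size=500 * 1024 * 1024):
    """Decide whether a source file should be copied to the target.

    dst_mtime is None when the target file does not exist yet.
    """
    if not any(fnmatch(name, pattern) for pattern in include):
        return False          # not a .dat or .doc file
    if any(fnmatch(name, pattern) for pattern in exclude):
        return False          # explicitly excluded
    if size >= max_size:
        return False          # too big
    # mtime comparison: copy only if the source is newer.
    return dst_mtime is None or src_mtime > dst_mtime
```

For instance, `should_copy("results.dat", 1024, 200, 100)` holds because the source is newer, whereas `should_copy("test.dat", 1024, 200, 100)` does not because the name is excluded.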

Cfengine can also automate software package management and installation. Policies for these items are specified in the packages stanza. Here are some examples:



Policy Example 6 (Package Management)

    control: # Define package manager & install command

      linux:: DefaultPkgMgr = ( rpm )
      redhat:: RPMInstallCommand = ( "/usr/sbin/up2date %s" )
      suse:: RPMInstallCommand = ( "/usr/sbin/yast2 -i %s" )

    packages:

      nagios version=2.4 cmp=ge
      pstree action=install

The settings in the control section specify the package management software that is in use as well as the command used to install a software package. These directives illustrate the use of operating system-based classes within policies for defining a different installation command for different Linux distributions.

In the packages stanza, the first rule checks whether Nagios is installed. A warning will be generated if the package is not present at all or if the installed version is earlier than version 2.4. The second rule checks for the pstree package, and installs it if it is not present on the system.
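The version test in the first rule can be sketched as below. The sketch assumes purely numeric, dot-separated version strings, and the function names are hypothetical; real package managers use more elaborate comparison rules.

```python
# Hypothetical sketch of a version promise such as version=2.4 cmp=ge.
def parse_version(text):
    """Turn '2.4' into (2, 4) so that fields compare numerically."""
    return tuple(int(part) for part in text.split("."))

def check_package(installed, wanted, cmp="ge"):
    """Return True if the promise is kept, False if a warning is due.

    `installed` is None when the package is not present at all.
    """
    if installed is None:
        return False
    a, b = parse_version(installed), parse_version(wanted)
    outcomes = {"eq": a == b, "ge": a >= b, "gt": a > b,
                "le": a <= b, "lt": a < b}
    return outcomes[cmp]
```

Comparing field by field matters: version 2.10 is later than 2.4, although a plain string comparison would claim the opposite.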

The following parameterized method-promise installs its first argument in the prefixed location given by the second argument. It collects the tar file, unpacks it, configures and compiles it, then tidies its files.

Policy Example 7 (Build software from source)

    #
    # Build GNU sources and install
    #

    control:

      actionsequence = ( methods )

    methods:

      InstallTar(cfengine-2.1.0b7,/local/gnu)

        action=cf.install
        returnvars=null
        returnclasses=null
        server=localhost



We must install the method in the trusted modules directory (normally /var/cfengine/modules or WORKDIR/modules).

Policy Example 8 (Method cf.install)

    #
    # cf.install
    #

    control:

      MethodName = ( InstallTar )
      MethodParameters = ( filename prefix )
      path = ( /usr/local/gnu/bin )
      TrustedWorkDir = ( /tmp )
      TrustedSources = ( /depot )
      TrustedSourceServer = ( localhost )
      actionsequence = ( copy editfiles shellcommands tidy )

    copy:

      $(TrustedSources)/$(filename).tar.gz
        dest=$(TrustedWorkDir)/$(filename).tar.gz
        server=$(TrustedSourceServer)

    shellcommands:

      "$(path)/tar zxf $(filename).tar.gz" chdir=$(TrustedWorkDir)

      "$(TrustedWorkDir)/$(filename)/configure --prefix=$(prefix)"
        chdir=$(TrustedWorkDir)/$(filename)
        define=okay

      okay::

        "$(path)/make"
          chdir=$(TrustedWorkDir)/$(filename)

    tidy:

      $(TrustedWorkDir) pattern=$(filename) r=inf rmdirs=true age=0



5.6.2 Rollback or “remediation”

The ability to go back to an earlier “release” or state is often referred to as rollback. ITIL calls it remediation. The notion is closely connected with process management, and both ITIL and traditional management techniques value this by default. It is assumed practice.

Cfengine does not encourage rollback however. Why not? Because it requires destructive intervention and cfengine’s model is based on on-the-fly change. To go back to a previous state, a system must be stopped, reinitialized (perhaps from backup) and restarted. This requires service to stop and all run-time state is lost.

Cfengine’s approach to this would be to revert policy to its previous state. The system would then roll into its desired state (as if going forwards). Nothing would be restored from separate backup media (see fig. 5.8).

The difference here is the assumption about how and when changes occur. A sequence of step by step transitions sounds innocent, but it is unstable to unexpected changes. ITIL and many other change management models assume that no unauthorized changes occur between releases. If they do, they are handled as incidents. By separating releases from incidental changes, we get led into thinking that we can in fact revert by destructive intervention.

In fact reversion has inevitable consequences. We must make a choice.

• Revert entire state, except we lose runtime state.

In this case, we essentially revert the entire system from a backup of its saved state. (Some virtual machines can save runtime state for resumption, but this can become stale, e.g. for network connections, as it is meaningless to rollback part of a dialogue.) This operation results in catastrophic change.

• Revert managed state, on the fly.

This is cfengine’s default behaviour. To go back to a previous state, simply change the policy back to a previous version. This will not necessarily revert the entire state of the system, but everything that is covered by policy will be reverted.
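The second choice can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: the dictionaries standing in for policy versions and host state, and the function `converge`.

```python
# Hypothetical sketch: "rollback" as convergence on an earlier policy.
def converge(state, policy):
    """Bring every item covered by policy to its promised value."""
    for key, value in policy.items():
        state[key] = value
    return state

policy_v1 = {"motd": "welcome", "sshd_port": 22}
policy_v2 = {"motd": "maintenance", "sshd_port": 2222, "banner": "v2"}

state = {"motd": "old", "sshd_port": 22, "sessions": 3}
converge(state, policy_v2)   # roll forward to release 2
state["sessions"] = 7        # run-time state keeps changing meanwhile
converge(state, policy_v1)   # go "back" by converging on release 1
```

After the final step the items promised by the earlier policy have their earlier values and the run-time state (`sessions`) has survived, but `banner`, which only the later policy covered, is left as it was. This is the sense in which the entire state is not necessarily reverted.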

Some tools allow you to rollback without reverting from backup. Cfengine disallows this on principle¹, as it requires human judgement to perform correctly. It cannot be automated without uncertain results.

¹ In fact cfengine retains the necessary information to allow managed changes to be reversed to some extent. The point however is that one can only guarantee the content of managed objects, so simply reversing a change will not necessarily take us back to the same state – so we consider this to be fundamentally too risky.

5.6.3 Monitoring change: files

Cfengine can monitor absolute and relative states of a system. A simple way to measure relative change is to use a database of checksums.

Policy Example 9

    control:

      ChecksumUpdates = ( true )
      ChecksumPurge = ( true )

    files:

      /my/important/files

        recurse=inf
        checksum=md5
        owner=root,daemon
        group=0,1,4

Change monitoring is about detecting when stored data, or other measurable aspects of a computer system, change. A change detection system is not normally concerned with the reason for a change, but if you are monitoring for change then we shall take it for granted that you are expecting to find changes that you didn’t plan for yourself.

Cryptographic checksums

The most important bulk of information on a computer is its filesystem data. Change detection for filesystems uses a technique made famous in the program Tripwire, which collects a “snapshot” of the system in the form of a database of file checksums (cryptographic hashes) and permissions and rechecks the system against this database at regular intervals. Tripwire examines files, and looks for change in their contents or their attributes. This is a very simple (even simplistic) view of change. If a legitimate change is made to the system, such a system responds to this as a potential threat. Databases must then be altered, or rebuilt.



Hashes or Digests

A cryptographic hash (also called a digest) is an algorithm that reads (digests) a file and computes a single number (the hash value) that is based on the contents. If so much as a single bit in the file changes then the value of the hash will change. You can compute hash values manually, for example:

host$ openssl md5 cfengine-2.2.4a.tar.gz

MD5(cfengine-2.2.4a.tar.gz)= 6d2b31c4814354c65cbf780522ba6661
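The same computation can be made with Python's standard `hashlib` module, which also lets us check the claim about single-bit changes. The helper name `digest` and the sample data are our own choices for this sketch.

```python
# Sketch: computing digests and observing the effect of one flipped bit.
import hashlib

def digest(data, algorithm="md5"):
    """Return the hexadecimal digest of a byte string."""
    return hashlib.new(algorithm, data).hexdigest()

original = b"root:x:0:0:root:/root:/bin/sh\n"
# Flip the lowest bit of the first byte and leave the rest untouched.
flipped = bytes([original[0] ^ 0x01]) + original[1:]

same_length = len(original) == len(flipped)
differs = digest(original) != digest(flipped)
```

An MD5 digest is 32 hexadecimal digits long and a SHA1 digest is 40, whatever the size of the input.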

There are several kinds of hash function. The most common ones are MD5 and SHA1. Recently both of the algorithms that create these hashes have been superseded by the newer SHA2. Cfengine supports MD5 and SHA1 and it will support SHA2 as soon as the OpenSSL library supports an interface to the new algorithm.

Computing hashes

Cfengine has adopted something like the Tripwire model, but with a few provisos. Tripwire assumes that all change is unauthorized (it makes an incident out of any observed change). Cfengine cannot reasonably take this viewpoint. Cfengine expects systems to change dynamically, so it allows users to define a policy for what changes are considered to be okay.

Integrity checks on files whose contents are supposed to be static are a good way to detect tampering with the system, from whatever source. Running MD5 or SHA1 checksums of files regularly provides us with a way of determining even the smallest changes to file contents.

To use the checksum based change detection we first ask cfengine to collect MD5 hash data for specified files. Here is an excerpt from a cfengine configuration program that would check the /usr/local filesystem for file changes. Note that it excludes files such as log files that we therefore allow to change (log files are supposed to change):



Policy Example 10

    files:

      /usr/local owner=root,bin,man

        mode=o-w          # check permissions separately
        r=inf
        checksum=best     # switch on change detection
        action=warnall
        ignore=logs
        exclude=*.log

      # repeat for other files or directories

The first time we run this, cfengine collects data and treats all files as “unchanged”. It builds a database of the checksums. The next time the rule is checked, cfagent recomputes the checksums and compares the new values to the ‘reference’ values stored in the database. If no change has occurred, the two should match. If they differ, then the file has changed and a warning is issued.

cf:nexus: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

cf:nexus: SECURITY ALERT: Checksum (md5) for /etc/passwd changed!

cf:nexus: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

This message is designed to be visible. If you do not want the embracing rows of ‘!’ characters, then this control directive turns them off:

control:

Exclamation = ( off )

The next question to ask is: what happens if the change that was detected is actually okay (which is almost always the case in practice)? If you activate this option:

control:

ChecksumUpdates = ( on )

then, as soon as a change has been detected, the database is updated and the message will not be repeated. If this is set to off, which is the default, then warning messages will be printed each time the rule is checked.



New files are automatically detected, as they are not in the database. If you want to be notified when files are deleted, then set the option

control:

ChecksumPurge = ( on )
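The behaviour of the checksum database and of the two options can be summarized in a short sketch. It models files as a dictionary of byte strings; the function `scan`, its parameters `update` and `purge`, and the wording of the warnings are assumptions that only mirror the description above.

```python
# Hypothetical sketch of checksum-based change detection.
import hashlib

def scan(files, database, update=False, purge=False):
    """Compare files against the checksum database and return warnings.

    files:    dict mapping path to current content (bytes)
    database: dict mapping path to stored hex digest (modified in place)
    update:   mirrors ChecksumUpdates, accept a change once it is seen
    purge:    mirrors ChecksumPurge, report and forget deleted files
    """
    warnings = []
    for path, content in files.items():
        new = hashlib.md5(content).hexdigest()
        old = database.get(path)
        if old is None:
            database[path] = new      # first sighting: just record it
        elif old != new:
            warnings.append(f"checksum for {path} changed")
            if update:
                database[path] = new  # do not warn again next time
    if purge:
        for path in [p for p in database if p not in files]:
            warnings.append(f"{path} was deleted")
            del database[path]
    return warnings

db = {}
w1 = scan({"/a": b"1"}, db)                # first run: nothing to report
w2 = scan({"/a": b"2"}, db)                # change detected
w3 = scan({"/a": b"2"}, db)                # still reported (no update)
w4 = scan({"/a": b"2"}, db, update=True)   # reported once more, accepted
w5 = scan({"/a": b"2"}, db, update=True)   # silent from now on
w6 = scan({}, db, purge=True)              # deletion reported
```

Without updates the same warning recurs on every run, which is what makes the message hard to ignore; with updates enabled each change is reported exactly once.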

Tamperproof data and distributed monitoring

Message digests are supposed to be unbreakable, tamperproof technologies, but of course everything can be broken by a sufficiently determined attacker. Suppose someone wanted to edit a file and alter the cfengine checksum database to cover their tracks. If they had broken into your system, this is potentially easy to do. How can we detect whether this has happened or not?

A simple solution to this problem is to use another checksum-based operation to copy the database to a completely different host. By using a copy operation based on a checksum value, we can also remotely detect a change in the checksum database itself.

Consider the following code:



Policy Example 11

    # Neighbourhood watch

    control:

      allpeers = (
                 SelectPartitionNeighbours(/path/hostlist,#,random,4)
                 )

    copy:

      /var/cfengine/checksum_digests.db

        dest=/safekeep/chkdb_$(this)
        type=checksum
        server=$(allpeers)
        inform=true       # warn of copy
        backup=timestamp
        define=tampering

    alerts:

      tampering::

        'Digest tampering detected on a peer'

It works by building a list of neighbours for each host. The function SelectPartitionNeighbours can be used for this. Using a file which contains a list of all hosts running cfengine (e.g. the cfrun.hosts file), we create a list of hosts to copy databases from. Each host in the network therefore takes on the responsibility to watch over its neighbours.

The copy rule attempts to copy the database to some file in a safekeeping directory. We label the destination file with $(this), which becomes the name of the server from which the file was collected. Finally, we back up any successful copies using a timestamp to retain a complete record of all changes on the remote host. Each time a change is detected, a copy of the old version is kept. The rule also contains triggers to issue alerts and warnings, just to make sure the message will be heard.

In theory, all four neighbours should signal this change. If an attacker had detailed knowledge of the system, he or she might be able to subvert one or two of these before the change was detected, but it is unlikely that all four could be covered up. At any rate, this approach maximizes the chances of change detection.

Finally, in order to make this copy, you must, of course, grant access to the database in cfservd.conf.

Policy Example 12

    # cfservd.conf

    admit:
       any::
          /var/cfengine/checksum_digests.db mydomain.tld

Let us now consider what happens if an attacker changes a file and edits the checksum database. Each of the four hosts that has been designated a neighbour will attempt to update its own copy of the database. If the database has been tampered with, they will detect a change in the md5 checksums of the remote copy versus the original. The file will therefore be copied.

It is not a big problem that others have a copy of your checksum database. They cannot see the contents of your files from this. A possibly greater problem is that this configuration will unleash an avalanche of messages if a change is detected. At least this makes the messages visible.
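The neighbourhood-watch logic can be sketched from the point of view of a single watching host. The following Python is an illustration under stated assumptions: `watch_peer` is a hypothetical name, the remote database is represented by a local file path rather than a network copy, and the timestamped backup name is our own choice, not cfengine's format.

```python
import hashlib
import shutil
import time
from pathlib import Path


def digest(path):
    """MD5 hex digest of a file, used to compare remote and stored copies."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def watch_peer(peer, remote_db, safekeep):
    """Copy a peer's checksum database if it differs from our stored copy.

    Returns True when a copy was made, which corresponds to the
    'tampering' class being defined in the policy example.
    """
    safekeep = Path(safekeep)
    safekeep.mkdir(parents=True, exist_ok=True)
    local = safekeep / f"chkdb_{peer}"
    if local.exists():
        if digest(local) == digest(remote_db):
            return False  # nothing changed, nothing to report
        # Keep the old copy under a timestamped name (hypothetical format).
        backup = safekeep / f"{local.name}.{time.time_ns()}"
        shutil.move(str(local), str(backup))
    shutil.copyfile(remote_db, local)
    return True
```

Because every changed database is copied and the previous copy is retained, each watching host accumulates a history of its neighbours' databases that an intruder on the watched host cannot rewrite.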

5.7 Release Management

Release management, as defined by ITIL (section 9 of BS15000-2), is a management function rather than a machine implementation operation. It includes all aspects of designing, planning and scheduling changes, but does not include the implementation.

Cfengine can help with the final stages of software release management, namely deployment of software components and configuration. However, the bulk of this item concerns the human process of decision-making.

• Creating a schedule and policy for releases.

• Acquiring or completing the components for release.

• Assigning roles for responsibility.

• Labelling release items uniquely for tracking.

• Documentation updates.

• Testing prior to release.

Cfengine is not a tool for assisting in this kind of process. Some kind of process planning tool and revision control system could work for this.

Cfengine has features that can be considered in the context of this work, however.

• packages

• files

• copy

ITIL frequently works with the idea of a baseline state. While cfengine has no problem working with the idea of a baseline configuration, it is designed to exceed this assumption of maintenance from release to release. ITIL does not adequately address the need for on-the-fly maintenance; it only models large-jump changes, not error corrections. Cfengine, on the other hand, makes no distinction between a large and a small change, thus users of cfengine must make a value judgement about the nature of such changes.

5.8 Configuration version control and rollback

Cfengine does not provide specific tools for versioning configuration specifications. Instead, we recommend using a tool such as Subversion for this.

Subversion maintains its own revision numbers, which are not visible to cfengine, however. It is useful to be able to refer to version numbers in cfengine as well. From software release 2.2.2, a version string can be added to files as follows:

    control:
       cfinputs_version = ( 1.2.3 )
       Auditing = ( on )

This defines the version number of a set of configuration files, which is referred to in auditing and error messages.

When cfengine saves the current version of a file that it is modifying or replacing, by default such files are given a new extension and remain within the same directory in which they were encountered. Alternatively, one can specify a repository directory to which such files can be moved instead. The repository location is specified in the control section:

    control:
       Repository = ( /var/spool/cfengine )

Files moved to the repository are given names reflecting their full path, with slashes replaced by underscore characters. For some, this creates a clearer overview of the changes that have occurred.
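The naming rule is simple enough to state as code. The sketch below only illustrates the slash-to-underscore mapping described in the text; the function name `repository_path` is hypothetical, and any extra suffixes that cfengine itself may append to saved files are deliberately left out.

```python
import posixpath


def repository_path(repository, original_path):
    """Return where a saved copy of `original_path` would live in the repository.

    The full path is flattened into one file name by replacing every
    slash with an underscore, as described in the text.
    """
    flat_name = original_path.replace("/", "_")
    return posixpath.join(repository, flat_name)
```

Under this rule, a saved copy of /etc/passwd in the repository /var/spool/cfengine would be stored as /var/spool/cfengine/_etc_passwd.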

The repository is used by disable, editfiles, links, and copy rule types; copy and disable allow you to override repository use or to specify an alternate repository directory via their repository option.

You should never edit the production version of a policy directly, but rather edit a separate development area and publish the changes once tested. The ITIL change management process is applicable to this human change (and is much more relevant here than to the machine changes made by cfengine itself).

5.8.1 Delegating responsibility to multiple groups

Cfengine has no meta-access control mechanism which can decide who may write policy rules. To create such a mechanism, there would have to be a monitor which could identify users, and an authority mechanism that would disallow certain users from writing rules of certain types about certain objects on certain hosts. Clearly it is possible to create such a system, but it would be technically difficult and very cumbersome to use, and it would add a whole new level of complexity to policy and potential error to the configuration process.

To keep matters as simple as possible, cfengine avoids this and proposes a different approach. Promise theory allows us to model the security implications of this (see fig. 5.9 and its bow-tie structure). A simple method of delegating is the following.

1. Delegate responsibility for different issues to admin teams 1,2,3, etc.

2. Make each of these teams responsible for version control of their own configuration rules.

3. Make an intermediate agent responsible for collating and vetting the rules, checking for irregularities and conflicts. This agent must promise to disallow rules by one team that are the responsibility of another team. The agent could be a layer of software, but a cheaper and more manageable solution is to make this another group of one or more humans.

4. Make the resulting collated configuration version controlled. Publish approved promises for all hosts to download from a trusted source.

A review procedure for policy promises is a good solution if you want to delegate responsibility for different parts of a policy to different sources. Human judgement is irreplaceable, and tools can be added to make conflicts easier to detect.

Promise theory underlines that, if a host or computing device accepts policy from any source, then it is alone and entirely responsible for this decision. The ultimate responsibility for the published version of the policy lies with the vetting agent. This creates a shallow hierarchy, but there is no reason why this formal body could not be comprised of representatives from the multiple teams.
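The text notes that tools can make conflicts easier to detect for the vetting agent. The following Python sketch shows one such check: it collates rules from several teams and sets aside any rule that falls outside the submitting team's area of responsibility. The data layout (topics mapped to teams) and the function name `vet` are assumptions made for illustration; a human reviewer would still decide what to do with the rejected rules.

```python
def vet(submissions, responsibilities):
    """Collate rules from several teams, rejecting out-of-scope rules.

    submissions:      {team: [(topic, rule_text), ...]}
    responsibilities: {topic: team}, i.e. who may make promises about what
    Returns (approved, rejected) as lists of (team, topic, rule_text).
    """
    approved, rejected = [], []
    for team, rules in submissions.items():
        for topic, rule in rules:
            if responsibilities.get(topic) == team:
                approved.append((team, topic, rule))
            else:
                # The rule concerns another team's area (or an unassigned
                # topic), so it is held back for review.
                rejected.append((team, topic, rule))
    return approved, rejected
```

Only the approved list would then be placed under version control and published for hosts to download.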

5.9 Availability and Capacity Management

Cfengine records all manner of information about the behaviour of computers during its efforts to keep promises. These data offer the potential of mining for building up a picture of the behaviour of an entire datacentre or organization, perhaps even multiple domains.

Cfengine’s environment daemon further collects patterns of environmental influence on hosts in a resource non-intensive manner. These data contain much information to enable capacity planning.

We should add a warning however. Capacity planning requires a considerable amount of data and analysis, as well as a sound and critical judgement of the data. Resource and performance management are such complex issues that no simple recipe or checklist can replace the judgement of an experienced engineer. However, cfengine can supply data to such an engineer.

1. Performance measurements (cfshow -p) allow the average throughput of a server to be estimated in terms of time to completion of service. If service times are too long, this is an indication (but not proof) that hardware should be upgraded.

2. Activity levels are graphed per service. These indicate the level of traffic coming into the different servers. Evidence of a ceiling limit on the throughput (clipping in the time-series) can show insufficient throughput.

3. Distribution graphs of fluctuations about the mean can also show evidence of ceiling limits. Asymmetric distributions show when the majority of service requests tend to bunch at a high level (probable stress on server) or at a low level (over-dimensioned server).
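The asymmetry mentioned in the last item can be quantified. The Python sketch below uses sample skewness as the measure; this choice is ours, since the text does not name a particular statistic, and the function name and any thresholds an engineer might apply to the result are likewise assumptions. A negative value means the samples bunch near the top of their range with a tail towards low values; a positive value means they bunch near the bottom.

```python
from statistics import mean, pstdev


def skewness(samples):
    """Return the (population) skewness of a list of measurements.

    Returns 0.0 for a sample with no spread, where skewness is undefined.
    """
    centre = mean(samples)
    spread = pstdev(samples)
    if spread == 0:
        return 0.0
    return sum(((x - centre) / spread) ** 3 for x in samples) / len(samples)
```

Such a number is only a hint. As the text stresses, it cannot replace the judgement of an experienced engineer looking at the full distribution.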

The level of technical understanding to make sound judgements based on these data goes somewhat beyond the scope of this document. This motivates us to create better tools for cfengine that can make these analyses more accessible to users. However, this must be deferred for another occasion.

Figure 5.5: Change in an ITIL world begins with a request for change (RFC). ITIL then says we should weigh the consequences by looking at the urgency and the potential impact of the change. Urgent changes can be dealt with by a special process; all others follow a standard model.

Figure 5.6: Once a change has been authorized, we can consider how to use a tool like cfengine to implement it.

Figure 5.7: To code a change in cfengine, thinking of the fixed-point (black hole) model of change, we have to decide what promises our system will try to keep once a change has been made.

Figure 5.8: Roll forward, not rollback. Move from one fixed point to another and back, if you like.

Figure 5.9: Delegation of responsibility requires an agent to vet access. This agent could be automatic or manual. Manual verification of changes is perhaps the simplest approach to manage.

Chapter 6

Summary

We have described the basics of cfengine and ITIL and shown a number of areas where the two can be integrated.

• Cfengine users can benefit from the disciplines that ITIL brings.

• ITIL can benefit from the predictability that cfengine brings.

6.1 How we wrote this document

So, if ITIL is so great, did we use it to manage the process of writing this document? Authoring a document and authoring a policy have much in common, so let us spend a moment to examine the process of checks and balances that we have used to produce this text.

The answer to our question is both yes and no, and while this might sound rather unhelpful, we suggest that it is in fact a significant answer; indeed it is the right answer in response to any question about best practices because such recipes must always be applied to a specific context.

There are sensible and ridiculous ways to implement a set of recommendations. ITIL users should expect to adapt its generalized ideas to each set of special circumstances. To do this here, we have used the parts of ITIL that make particular sense for authoring, and we have also used cfengine’s model of promises or voluntary cooperation to understand how to implement them.

For example, ITIL suggests forming committees for discussing and deciding change. A committee is a cumbersome device when the total number of people involved in the entire process is two. Nevertheless, the role of the committee is relevant (i.e. the promises it makes to bring the process to completion), and this is where promise theory helps us to make sense of the “dumb rules”. We have multiple opinions and multiple pairs of eyes for quality control as well as for inspiration.

6.1.1 ITIL concepts for authoring

Several parts of ITIL are quite relevant to authoring.

• Service management. A document provides an information service to its clients (the readers). It promises to be accurate to within reasonable limits.

• Release management. Each version of our document can be considered a release which undergoes a continual improvement cycle, constantly being evaluated and changed in accordance with events and incidents that occur.

• Incidents. An incident is something that impacts on the service. An incident could be the discovery of an error in the text. It could be a disagreement between the authors or a misunderstanding on the part of the reader. There have been many incidental changes based on discussions in our teams.

• Impact. The impact of the incident is the potential damage caused by the incident, or the usefulness of the discovery. Incidents are not necessarily negative events. They can be events which point out improvements.

• Request for change. One of the authors asks to make changes to the text.

• Change management. Each identified change can be evaluated for its potential impact (benefit or confusion). If there are many changes to be made, priority can be assigned to them. When should the changes be implemented?

6.1.2 Promise concepts - voluntary cooperation

What does promise theory say about collaborative authoring?

First of all, it begins by saying that each individual in the process of authoring has independent knowledge and should be represented as a separate agent. It tells us that promises to cooperate will be needed to integrate the information.

However, more than that, promises tell us that each section of the text is an “agent” which can change or behave independently. In other words, we can manage the parts independently, but again we need to promise to coordinate those parts. So promise theory asks us first to identify the agents (the topics in the document) that will be interacting and then find out what promises they need to make to carry out their function.

Because of the individual nature of the parts, we can associate an individual author to each. To bring them together we need a further agent or individual to collate independent ideas and policy sources into a single coherent whole. Thus promise theory shows us a basic “bow-tie” structure for integrating and correlating independent sources and then making the results available to independent users (see fig. 5.9). This is not the only solution to the problem of vetting that promise theory predicts, but it is the simplest one. It is also the approach that ITIL approves: making someone responsible for the job.

We emphasize that promise theory does not tell us the specifics of how to implement solutions; it only tells us what elements are needed and how they should interact. So we might implement agents as people, as different computers, or as different user accounts within the same computer. As long as the elements can keep the necessary promises, it does not matter.

6.1.3 Best of both worlds

So how did we write our document? In fact we did not use a very strict ITIL-like change management process when writing the first versions of our document. Such a process could have strangled our work in the creative stage and doubled the time it took to write. Rather, we worked in an ad hoc way by voluntary cooperation. Each of us promised to write about certain topics and work on the text independently. We worked as autonomous agents, and we used Subversion (a version control and sharing system) to keep the working document. Subversion is itself a third agent which promises to accept changes one at a time from either of the two authors and then make these changes available again to both authors. This agent performs no vetting or control other than ordering the changes.

The authors have to promise to one another to resolve any conflicts or disagreements, but promises do not suggest how this might take place. (ITIL, on the other hand, does offer suggestions for this resolution process.)

ITIL seems to work best once a service is up and running, or once a basic version of a document exists. It does not say so much about the creative act, except to think of it as a release.

What ITIL is weak at is parallelization of effort. ITIL’s processes are serialized processing models. In our first creative versions, we converged in parallel onto an approximate result, each working separately. This is very efficient but it can lead to duplication of work or inconsistency. Serialization is needed to resolve consistency issues precisely, but it leads to unnecessary waiting in some cases.

6.2 Road-map for adoption

Below we indicate a checklist of ITIL compliant steps for using cfengine in a machine life-cycle.

1. Set up cfagent running at scheduled interval X. This is the Service Level Agreement.

2. Set up versioning of policy.

3. Set up delegation of authorship.

4. Run cfenvd for passive monitoring. Run cfagent for active monitoring.

Release:

1. Select installation medium e.g. DVD, net-boot with hooks to cfengine.

2. Start with essential promises, and formulate the configuration policy.

3. Use ITIL processes for deciding and refining configuration promises.

4. Evaluation and monitoring of promises using cfagent and cfenvd.

5. Use cfagent to monitor changes using cryptographic checksums.

6. Develop recovery plans. Use cfengine to automate backup of data and automate the duplication of servers for load balancing and redundancy.

Appendix A

ITIL terminology

This section lists some of the many terms from ITIL, especially the ISO/IEC 20000 version of the text, and offers some comments and translations into common cfengine terminology.

A.1 Active Monitoring

Monitoring of a configuration item or IT service that uses automated regular checks to discover the current status.

CFENGINE COMMENTS

Cfengine performs programmed checks of all of its promises each time cfagent is started. Cfagent is, in a sense, an active monitor for a set of promises that are described in its configuration file.

A.2 Availability

The ability of a component or service to perform its required function.

\[ \text{Availability} = \frac{\text{Hours operational}}{\text{Agreed service hours}} \]

CFENGINE COMMENTS

Availability or intermittency in cfengine refers to the responsiveness of hosts in a network when remotely connecting to cfservd.

\[ \text{Intermittency} = \frac{\text{Successful attempts}}{\text{Total attempts}} \]

This is a measurement that cfagent automatically makes.
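The two ratios defined above are straightforward to compute from recorded counts. The Python below is a minimal sketch; the function names are illustrative assumptions and do not correspond to any cfengine interface.

```python
def availability(hours_operational, agreed_service_hours):
    """ITIL availability: fraction of the agreed service hours actually delivered."""
    if agreed_service_hours <= 0:
        raise ValueError("agreed service hours must be positive")
    return hours_operational / agreed_service_hours


def intermittency(successful_attempts, total_attempts):
    """Ratio of successful connection attempts to all attempts, as defined above."""
    if total_attempts <= 0:
        raise ValueError("at least one attempt is required")
    return successful_attempts / total_attempts
```

For instance, a service that was operational for 84 of 168 agreed hours has an availability of one half, and a host that answered 3 of 4 connection attempts has an intermittency ratio of three quarters.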

A.3 Alert

A warning that a threshold has been reached, something has changed or a failure has occurred.

CFENGINE COMMENTS

A cfengine alert fits this description quite well. Most alerts are user-defined, but a few are side effects of certain configuration rules.

A.4 Audit

A formal inspection and verification to check whether a standard or set of guidelines is being followed.

CFENGINE COMMENTS

Cfengine’s notion of an audit is more like the notion from system accounting. However, the data generated by this extra logging information could be collected and used in a more detailed examination of cfengine’s operations, suitable for use in a formal inspection (e.g. for compliance).

A.5 Baseline

A snapshot of the state of a service or an individual configuration item at a point in time.

CFENGINE COMMENTS

In cfengine parlance, we refer to this as an initial state or configuration. In principle a cfengine initial state does not have to be a known baseline, since the changes we make will not generally be relative to an existing configuration. Cfengine encourages users to define the final state (regardless of initial state).

A.6 Benchmark

The recorded state of something at a specific point in time.

CFENGINE COMMENTS

Cfengine does not use this term in any of its documentation, though our general understanding of a “benchmark” is that of a standardized performance measurement under special conditions. Cfengine regularly records state and performance data in a variety of ways, for example when making file copies.

A.7 Capability

The ability of someone or something to carry out an activity.

CFENGINE COMMENTS

Cfengine does not use this concept specifically. The notion of a capability is terminology used in role-based access control.

A.8 Change record

A record containing details of which configuration items are affected and how they are affected by an authorized change.

CFENGINE COMMENTS

Cfengine’s default modus operandi is to not record changes made to a system unless requested by the user. Changes can be written as log entries or audit entries by switching on reporting.

Consider a typical cfengine promise (to ensure that a destination file is a copy of a source). Three levels of change recording can be added in cfengine 2:

    copy:
       /source/file dest=/destination/file
          inform=true
          syslog=true
          audit=true

An “inform” promise means that cfagent promises to report the changes on its standard output (which is usually sent by email or printed on a console). A “syslog” promise implies that cfagent will log the message to the system log daemon. Both of the foregoing give only a simple message about actual changes. An “audit” promise is a promise to record extensive details about the process that cfagent undergoes in its checking of other promises.

A.9 Chronological Analysis

An analysis based on the timeline of recorded events (used to help identify possible causes of problems).

CFENGINE COMMENTS

A timeline analysis could easily be carried out based on audit information, system logs and cfenvd behavioural records.

A.10 Configuration

A group of configuration items (CI) that work together to deliver an IT service.

CFENGINE COMMENTS

A configuration is the current state of resources on a system. This is, in principle, different from the state we would like to achieve, or what has been promised.

A.11 Configuration Item (CI)

A component of an infrastructure which is or will be under the control of configuration management.

CFENGINE COMMENTS

A configuration item is any object making a promise in cfengine. We often speak of the promise object, or “promiser”.

A.12 Configuration Management Database (CMDB)

Database containing all the relevant details of each configuration item and details of the important relationships between them.

CFENGINE COMMENTS

Cfengine has no asset database except for its own list of promises. The only relationships it cares about are those which are explicitly coded as promises. In the future, cfengine 3 is likely to extend the notion of promises to allow more general records of the CMDB kind, but only to the extent that they can be verified autonomically.

A.13 Document

Information and its supporting medium.

CFENGINE COMMENTS

ITIL originally considered a document to be only a container for information. In version 3 it considers also the medium on which the data are recorded, i.e. both the file and the filesystem on which it resides.

A.14 Emergency Change

A change that must be introduced as soon as possible – for example to solve a major incident or to implement a critical security patch.

CFENGINE COMMENTS

Cfengine has no specific concept for this.

A.15 Error

A design flaw or malfunction that causes a failure.

CFENGINE COMMENTS

Cfengine often uses the term configuration error to mean a deviation of a configuration from its promised state. The ITIL meaning of the term would translate into “bug in the cfengine software” or “bug in the promised configuration”.

A.16 Event

A change of state that has significance for the management of a configuration item or IT service.

CFENGINE COMMENTS

The same basic definition applies to cfengine also, but cfengine makes all such events into classes, since its approach to observing the environment is to measure and then classify it into approximate expected states. Cfengine class attributes (usually from cfenvd) may be considered as event notifications as they change.

A.17 Exception

An event that is generated when a service or device is currently operating abnormally.

CFENGINE COMMENTS

A state in which configuration policy is violated (could lead to a warning or an automated correction).

A.18 Failure

Loss of ability to operate to specification or to deliver the required output.

CFENGINE COMMENTS

ITIL’s idea of a failure is something that prevents a promise from being kept. Cfengine’s autonomy model means that it is unlikely for such a failure to occur, since promises are only allowed to be made about resources for which we have all privileges. Occasionally, environmental issues might interfere and lead to failure.

A.19 Incident

Any event that is not expected in normal operations and which might cause a degradation of service quality.

CFENGINE COMMENTS

Cfengine’s philosophy of convergence gives us only one option for interpreting this term, namely as a temporary deviation from promised behaviour. A deviation must be temporary if cfengine is operating continually, since it will repair any problem on its next invocation round. Events which do not impact promises made by cfengine are of no interest to cfengine, since autonomy means it cannot be responsible for anything beyond its own promises.

A.20 Monitoring

Repeated observation of a configuration item, IT service or process in order to detect events and ensure that the current status is known.

CFENGINE COMMENTS

Cfengine incorporates a number of different kinds of monitoring, including monitoring of kept configuration-promises and passive monitoring of behaviour.

A.21 Passive Monitoring

Monitoring of a configuration item or IT service that relies on an alert or notification to discover the current status.

CFENGINE COMMENTS

Cfenvd is cfengine’s passive monitoring component. It observes system-related behaviour and learns about it. It assumes that there is likely to be a weekly periodicity in the data in order to best handle its statistical inference.

A.22 Policy

Formally documented management expectations and intentions. Policies are used to direct decisions, and to ensure consistent and appropriate development and implementation of processes, standards, roles, activities, IT infrastructures, etc.

CFENGINE COMMENTS

Cfengine’s configuration policy is an automatable set of promises about the static and runtime state of a computer. Roles are identified by the kinds of behaviour exhibited by resources in a network. We say that a number of resources (hosts or smaller configuration objects) play a specific promised role if they make identical promises. Any resource can play a number of roles. Decisions in cfengine are made entirely on the basis of the result of monitoring a host environment.

A.23 Proactive Monitoring

Monitoring that looks for patterns of events to predict possible future failures.

CFENGINE COMMENTS

All cfengine monitoring is pro-active in the sense that it can lead to automated follow-up actions.

A.24 Problem

Unknown underlying cause of one or more incidents.

CFENGINE COMMENTS

A repeated deviation from policy that suggests a change of policy or specific counter-measures. A promise needs to be reconsidered or new promises are required.

A.25 Promise

ITIL does not define this term, although promises are deployed in various ways – for instance in terms of cooperation, communication interfaces within or between processes or contractual relationships as defined by Service Level Agreements, Operational Level Agreements and Underpinning Contracts.

CFENGINE COMMENTS

A promise in cfengine is a single rule in the cfengine language. The promiser is the resource whose properties are described, and the promisee is implicitly the cfengine monitor.

A.26 Reactive Monitoring

Monitoring that takes action in response to an event – for example submitting a batch job when the previous job completes, or logging an incident when an error occurs.

CFENGINE COMMENTS

The concept of reactive monitoring is unclear because the duration of an event and the speed of a response are undefined. In a sense, all cfengine monitoring is potentially reactive. It is possible to attach actions which keep promises to any observable condition discernible by cfengine’s monitor. Cfengine is not usually considered event driven however, since it does not react “as soon as possible” but at programmed intervals.

A.27 Record

Information in readable form that is maintained by the service provider about operations.

CFENGINE COMMENTS

A log entry or database item.

A.28 Recovery

Returning a Configuration Item or an IT service to a working state. Recovery of an IT service often includes recovering data to a known consistent state.

CFENGINE COMMENTS

All cfengine promises refer to the state of a system that is desired. The promises are automatically enforced, hence cfengine recovers a system (in principle) on every invocation. Cfengine always returns to a known state, due to the property of “convergence”. There is no distinction between the concepts of repair, recovery or remediation.
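The convergence property described above can be illustrated in a few lines of Python. This is a sketch under stated assumptions: the dictionary-based system state, the promise table and the function name `converge` are hypothetical and are not part of cfengine. The point is only that each run compares the actual state with the promised state and repairs the difference, so repeated runs end in the same known state whatever the starting point.

```python
def converge(state, promises):
    """Bring `state` into line with `promises`; return the repairs made.

    state:    {setting: current value}, modified in place
    promises: {setting: desired value}
    """
    repairs = []
    for setting, desired in promises.items():
        current = state.get(setting)
        if current != desired:
            repairs.append((setting, current, desired))
            state[setting] = desired  # repair, recovery and remediation alike
    return repairs
```

A second run immediately after the first finds nothing to repair, which is why the text draws no distinction between repair, recovery and remediation.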

A.29 Remediation

Recovery to a known state after a failed change or release.

CFENGINE COMMENTS

All cfengine promises refer to the state of a system that is desired. The promises are automatically enforced, hence cfengine recovers a system (in principle) on every invocation. Cfengine always returns to a known state, due to the property of “convergence”. There is no distinction between the concepts of repair, recovery or remediation.

However, this concept is like the notion of “rollback” which often involves a more significant restoration of a system from backup. This is discussed later.

A.30 Repair

The replacement or correction of a failed configuration item.

CFENGINE COMMENTS

All cfengine promises refer to the state of a system that is desired. The promises are automatically enforced, hence cfengine recovers a system (in principle) on every invocation. Cfengine always returns to a known state, due to the property of “convergence”. There is no distinction between the concepts of repair, recovery or remediation.

A.31 Release

A collection of new or changed configuration items that are introduced together.

CFENGINE COMMENTS

An instantiation of the entire cfengine system under a specificversion of a policy, i.e. a specific set of promises.

A.32 Request for Change

A form to be completed requesting the need for change. This is to be followedup.

CFENGINE COMMENTS

This has no counterpart in cfengine. It is part of human commu-nication which coordinates autonomous machines. Clearly au-tonomous computers do not listen to change requests from othercomputers, but when machines cooperate in clusters or groupsthey take suggestions from the collaborative process. An RFC inan ITIL sense is part of an organizational process that goes be-yond cfengine’s level of jurisdiction. This is an example of whatITIL adds to the autonomous cfengine model.

Make a note! 3 (Abandon autonomy?) Why not simply abandon autonomy ofmachines if this seems to interfere with the need for organizational change? Thereare good reasons why autonomy is the correct model for resources. Autonomy reducesthe risk to a resource of attack, mistake and error propagation.

ITIL’s processes exist precisely to minimize the risk of negative impact of change, sothe goals are entirely compatible. When an organization discusses a change it examinesinformation from possible several autonomous systems and discusses how they willchange their pattern of collaboration. There is no point in this process at which it isnecessary for one of the systems to give up its autonomy.

A.33 Resilience

The ability of a configuration item or IT service to resist failure or to recover quickly following a failure.

CFENGINE COMMENTS

Cfengine’s purpose is to make a system resilient to unpredictable change.

A.34 Restoration

Actions taken to return an IT service to the users after repair and recovery from an incident.

CFENGINE COMMENTS

All cfengine promises refer to the desired state of a system. The promises are enforced automatically, hence cfengine recovers a system (in principle) on every invocation. Cfengine always returns to a known state, due to the property of “convergence”. There is no distinction between the concepts of repair, recovery or remediation.

However, this concept seems to suggest a more catastrophic failure which often involves a more significant restoration of a system from backup. This is discussed later.

A.35 Role

A set of responsibilities, activities and authorities granted to a person or a team. Roles are defined in processes.

CFENGINE COMMENTS

A role in cfengine is a class of agents that make the same kind of promise. The type of role played by the class is determined by the nature of the promise they make, e.g. a promise to run a web server would naturally lead to the role “web server”.
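
As a rough sketch of this idea, roles can be derived by grouping agents according to the kind of promise they make. The function `roles_from_promises`, the host names and the promise labels are hypothetical and are not part of cfengine; the point is only that the role is not assigned separately but follows from the promises.

```python
from collections import defaultdict


def roles_from_promises(promises):
    """Group agents by promise type.

    promises: iterable of (agent, promise_type) pairs.
    Returns a dict mapping each promise type (the role) to the set of
    agents that make that kind of promise.
    """
    roles = defaultdict(set)
    for agent, promise_type in promises:
        roles[promise_type].add(agent)
    return dict(roles)


# Hypothetical hosts and promises, for illustration only.
roles = roles_from_promises([
    ("host1", "web server"),
    ("host2", "web server"),
    ("host3", "mail server"),
])
```

Here `host1` and `host2` play the role “web server” simply because both promise to run one.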

A.36 Service desk

Interface between users and service provider.

CFENGINE COMMENTS

A help desk. This is not formally part of cfengine’s tool set.

A.37 Service Level Agreement

A written agreement between the service provider and the customer that documents agreed services, levels and penalties for non-compliance.

CFENGINE COMMENTS

An agreement assumes a set of promises that propose behaviour and an acceptance of those promises by the client. If we assume that the users are satisfied with our policies, then an SLA can be interpreted as a combination of a configuration policy (configuration service promises), and the cfengine execution schedule.

A.38 Service Management

The management of services.

CFENGINE COMMENTS

Same.

A.39 Warning

An event that is generated when a service or device is approaching its threshold.

CFENGINE COMMENTS

A message generated in place of a correction to system state when a deviation from policy is detected. Note that cfengine is not based on fixed thresholds. All “thresholds” for action or warning are defined as a matter of policy.
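
The distinction between correcting and warning can be sketched as follows. The function `keep_promise` and its `action` parameter are hypothetical names chosen for illustration and do not reproduce cfengine syntax. What the sketch shows is that the same deviation leads either to a correction or to a message, depending only on what the policy asks for.

```python
def keep_promise(name, actual, desired, action="fix"):
    """Return (resulting_state, message) for a single promise.

    action is a policy choice (hypothetical values "fix" and "warn").
    """
    if actual == desired:
        # Promise already kept: nothing to correct, nothing to report.
        return actual, None
    if action == "warn":
        # Policy asks for a report only, so the state is left untouched.
        message = f"warning: {name} is {actual!r}, policy promises {desired!r}"
        return actual, message
    # Default: correct the deviation silently.
    return desired, None


warned_state, warning = keep_promise("mode of /etc/motd", "0666", "0644", action="warn")
fixed_state, no_message = keep_promise("mode of /etc/motd", "0666", "0644", action="fix")
```

No threshold is built into the function itself: whether a deviation is repaired or merely reported is decided by the policy that calls it.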
