
THE MAGAZINE OF USENIX & SAGE
February 2002 • Volume 27 • Number 1

The Advanced Computing Systems Association & The System Administrators Guild

inside: CONFERENCE REPORTS
LISA ’01: The 15th Systems Administration Conference



CONFERENCE REPORTS

This issue’s reports focus on the 15th Systems Administration Conference (LISA 2001), held in San Diego, California, December 2-7, 2001.

OUR THANKS TO THE SUMMARIZERS:

Mark Burgess

Marguerite Curtis

Yolanda Flores-Salgado

Liliana Hernandez

Doug Hughes

Steven Levine

Mark Logan

Armando Rojas-Morin

Joel Sadler

Michael Sconzo

Josh Simon, Coordinator

Tim Smith

Crystal Stockton

Jeff Tyler

Jin-ping Wan

Jason Wertz

Garry Zecheiss


15TH SYSTEMS ADMINISTRATION CONFERENCE (LISA 2001)
SAN DIEGO, CALIFORNIA
DECEMBER 2-7, 2001

KEYNOTE ADDRESS

SLIME VERSUS SILICON

Greg Bear

Summarized by Steven Levine

The keynote address at LISA 2001 was given by much-awarded science fiction author Greg Bear. Mr. Bear spoke about new paradigms in our understanding of biological systems that encourage us to view cells, particularly bacterial cells, as nodes in a network, independent supercomputers that cooperate, communicate, and use transfer media to alter themselves and each other. Who is particularly suited to understand and talk about this paradigm? System administrators.

Well, system administrators and science fiction readers, who are also characteristically open to listening to visionary worldviews underscored with conspiracy theories. Mr. Bear noted right up front that the LISA crowd was indistinguishable from the crowd at the largest science fiction conventions. He later extended the comparison by noting that science fiction fans are like children in that they are “eternally curious and not interested in fashion.” Sci-fi fans, like sysadmins, are below the radar level of most of society. Mr. Bear’s understanding of system administrators, and particularly the self-image of system administrators, was stunning.

The real LISA, says Mr. Bear, exists in the acreage surrounding the Town and Country Resort Hotel. LISA is the Laterally Integrated Stochastic Anticipator, the bacterial computer network in the soil system. A bacterial network is a slime machine, a bacterial supercomputer: a cell has three billion base pairs, each a computational unit. Bacterial cells communicate and cooperate.

If a cell is a supercomputer and a node in a network, who administers the cells and the network? Who is nature’s sysadmin? Not DNA, administering from the top down, as we have been taught. Cells can absorb information and change proteins by means of viruses and retroviruses. The viruses are the methods of communication between the nodes in a network. They seem to be necessary, which is why they are so tough to get rid of.

In a biological system, everything is a kludge. Change, however, is a necessity. Mr. Bear spoke of Barbara McClintock’s work with jumping genes, DNA that changes systematically. A genome is like an ecosystem, and a gene must cooperate with hundreds and thousands of other genes; we know this from the evidence of embryological cells not cooperating, and the resulting problems.

What we are looking at is evidence of “social biology” (social biology – not sociobiology). Biology is social from the genome on up. We are now finding the very language to describe how systems work. And who speaks this language already? System administrators.

The old paradigms of biology – randomness, DNA writing to RNA in a process that never reverses – are “dead wrong.” Yet the paradigms are still being taught. We need, instead, to look at the way we put systems together to gain some understanding of the problems of biology.

So, Mr. Bear says, go forth and study biology. You will learn how to administer systems; slime has been doing it for billions of years. All biological systems are networks of users, and users all have different priorities. We must learn to be open, to think like children, in order to deal with networks of users.

REFEREED PAPERS

STIRRING THE MATRIX: ORGANIZATIONAL SYSTEM ADMINISTRATION

Summarized by Tim Smith

DEFINING THE ROLE OF SERVICE MANAGER: SANITY THROUGH ORGANIZATIONAL EVOLUTION

Mark Roth, University of Illinois at Urbana-Champaign

The presentation began with Mr. Roth defining a service as a collection of tools that allow users to do their jobs. He then reviewed the evolution of services over the past decade. Services in the early nineties were homegrown tools where no distinction was really made between the system and the services provided. In the late nineties, client-server applications, where there was some distinction between the system and the service provided, took the place of in-house software. Services today are taking the form of black boxes where the service software is distinct from the system it runs on, and users are not aware of the type of system used to run the service. The thesis of the paper is that in this new environment the role of service manager should be entirely separate from that of system administrator.

As presented, the role of service manager includes several components: initial planning, production deployment, and ongoing maintenance of a service. There’s too much work required by the


components to be handled by either system administrators or developers. System administrators are too busy to begin with, and their core competency is system management, not maintaining services. In addition, system administrators need to achieve economies of scale in their work, and this is not possible in service management. Service management should not be handled by developers due to their incompatible time requirements and core competency in programming.

In Mr. Roth’s approach the service manager focuses on the users of the service and is responsible for ensuring that the services needed by users are available. This means giving requirements to developers for in-house software and delivering system requirements to system administrators so the hardware service will be available when needed. The advantages of this approach as seen at UIUC include improved communication, with the service manager as the only communication channel between system administrators, developers, and users; increased staff retention; easier budgeting; and achievement of some economies of scale. Mr. Roth pointed out that the approach is not fully in place at UIUC but that its advantages were already being seen.

Additional information can be found on Mr. Roth’s Web page at http://www.uiuc.edu/ph/www/roth/.

NEW TECHNOLOGIES FOR SMALL AND MEDIUM BUSINESSES (SMB)

Degan Diklic, Venkatesh Velayutham, Steve Welch, and Roger Williams, IBM Almaden Research Center

Mr. Diklic’s presentation addressed the remote outsourcing of services for multiple branch offices and small businesses. In this presentation a small business is defined as having fewer than 500 machines, and a medium-sized business is a business with fewer than 5,000

machines. Servicing branch offices and small businesses is not an attractive venture for service companies trying to make a profit, because the clients are on the other side of a firewall, dedicated lines to bypass the firewall are expensive, a full-time administrator for a small site is also expensive, and there is no generic service infrastructure in place across sites. One idea is to remotely manage small sites using VPN and remote management tools. Current solutions that implement this idea, such as OpenView or Cobalt Blue, are expensive.

Mr. Diklic’s solution avoids the expense of the available solutions while still allowing sites to be serviced effectively. The solution places a communication server outside the firewalls of the remote site and the service provider. These communication servers are owned by a trusted company, IBM in this case, and are used to make a connection between usher servers on the local LANs. The usher server is used to resolve network locations of the remote machines so they can be administered as if they were part of the service company’s network.

This architecture has been used in several projects in Mr. Diklic’s research group. The first project was a disk expansion project that allowed additional disks to be used as an extension of an existing disk in the remote site. This allows remote sites to share disk storage and makes remote data storage possible. The second project is a backup utility for remote sites. The backup of the remote machines is performed over the network at night when the bandwidth is not being utilized. Data restoration of user data is performed by an application CD that connects to the remote backup facilities and allows the user to access the data. Authentication is also handled by the CD since it contains the customer’s username and password in a secure form.


TECHNOLOGIES INDISTINGUISHABLE FROM MAGIC: ANALYTICAL SYSTEM ADMINISTRATION

Summarized by Marguerite Curtis

A PROBABILISTIC APPROACH TO ESTIMATING COMPUTER SYSTEM RELIABILITY

Robert Apthorpe, Excite@Home, Inc.

Apthorpe began by expounding on how his tutorial on probabilistic risk assessment presents a technique for finding vulnerabilities. He then addressed the reasons why he wrote it, talking about the problems that system administrators face. For example, they generally have little background in systems engineering and therefore have little context for understanding formal risk assessment, or they don’t know how to detect problems, or they simply make a bad decision, like putting a primary and secondary server on the same switch. After acknowledging the problems, he spoke on why risk assessment is so relevant. His list contained about five points. For example, analysis is cheaper than firefighting, and a good design defends against known problems.

In the second half of his talk Robert gave us an overview of his method, which consists of eight steps: define your problems; define your system; build a logic model of system failure; decompose system information into its most basic actions and events; find the minimal sequences of events that lead to failure; estimate the probability of each event sequence from observed or estimated data; generate measures of component importance; and sanity-check the model and the results. He then showed us an event tree analysis and a sample event tree. The next topic was how to use the results and what the weaknesses of the system are. He addressed these points and why they exist. Concluding, he spoke of his future hopes and plans, other possible research topics, and other applications, such as security, capacity analysis, or insurance and risk management. This paper won the Best Theory Paper award this year.
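To make the method concrete, here is a minimal sketch of the logic-model step, assuming a toy system rather than anything from Apthorpe’s paper: two redundant servers sit behind one shared switch (the bad-decision example above), the failure probabilities are invented, and the components are assumed to fail independently.

```python
# A minimal sketch in the spirit of probabilistic risk assessment; not
# Apthorpe's code or data. All component names and probabilities are invented,
# and the components are assumed to fail independently.

p_fail = {
    "primary_server": 0.02,     # assumed annual failure probabilities
    "secondary_server": 0.02,
    "shared_switch": 0.01,      # both servers sit behind this one switch
}

def p_and(*ps):
    """Probability that all independent events occur (an AND gate)."""
    result = 1.0
    for p in ps:
        result *= p
    return result

def p_or(*ps):
    """Probability that at least one independent event occurs (an OR gate)."""
    none_occur = 1.0
    for p in ps:
        none_occur *= (1.0 - p)
    return 1.0 - none_occur

# The service is lost if both servers fail, or if the shared switch fails.
p_outage = p_or(
    p_and(p_fail["primary_server"], p_fail["secondary_server"]),
    p_fail["shared_switch"],
)
print(f"Estimated annual probability of an outage: {p_outage:.4f}")
```

Even this toy model shows why the shared switch dominates the estimate: the redundant servers contribute far less to the outage probability than the single component they both depend on.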


SCHEDULING PARTIALLY ORDERED EVENTS IN A RANDOMIZED FRAMEWORK: EMPIRICAL RESULTS AND IMPLICATIONS FOR AUTOMATIC CONFIGURATION MANAGEMENT

Frode Sandnes, Oslo University College

Sandnes began by explaining his idea: the schedule would automatically maintain a system state to benefit all users and would be achieved by tools such as cfengine. It can be viewed as a mix of dynamic and static schedules where the dynamic tasks are triggered by actions and the static tasks have precedence. One of his main points is that the user can allocate any task to a particular schedule. He moved on to discuss randomized strategies and algorithms, as compared to the scheduled algorithm. His objectives consisted of finding out how the randomized schedule affects efficiency, the ability to intervene, and the ability to identify the configuration model. He compared a deterministic management framework with a random one to determine efficiency. In the second half of his talk, he addressed malicious intervention and how it relates to his work and randomization. If the abuser wants to uncover the model, assuming the abuser can observe, randomized scheduling can make it more difficult. His idea is that the abuser will be unable to observe a sequence of events if there is only a random sequence to look at. Sandnes concluded that randomized scheduling leads to consistent performance, reduces the predictability of management activities, and hides strategies; he also noted that the framework is easy to implement.
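As a concrete illustration of the scheduling idea, the following sketch, with invented task names and dependencies rather than anything from Sandnes’ framework, draws one random linear extension of a partial order: static precedence constraints are always respected, while the order among currently runnable tasks changes from run to run, which is what makes the observed sequence hard to predict.

```python
# A minimal sketch (not Sandnes' framework) of randomized scheduling of
# partially ordered configuration tasks: precedence constraints are always
# honored, but the choice among currently runnable tasks is random, so the
# exact sequence differs from run to run. Task names are invented.

import random

# task -> set of tasks that must complete before it may run (assumed example)
precedence = {
    "check_disks": set(),
    "rotate_logs": set(),
    "update_packages": {"check_disks"},
    "restart_services": {"update_packages", "rotate_logs"},
}

def random_schedule(deps):
    """Return one random linear extension of the partial order."""
    remaining = {task: set(before) for task, before in deps.items()}
    order = []
    while remaining:
        ready = [t for t, before in remaining.items() if not before]
        if not ready:
            raise ValueError("cycle in precedence constraints")
        task = random.choice(ready)        # the randomization happens here
        order.append(task)
        del remaining[task]
        for before in remaining.values():
            before.discard(task)
    return order

print(random_schedule(precedence))   # a different valid order on each run
```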

THE MAELSTROM: NETWORK SERVICE DEBUGGING VIA “INEFFECTIVE PROCEDURES”

Alva Couch and Noah Daniels, Tufts University

Couch began by stating his target problem, which is to automate network troubleshooting; his dream is a quicker response to network problems. He then began showing how it all is formed, starting with pre-declaring precedences, which must be done every time a script is added. This is a pain, so he moved on to discovering order between scripts without declaring it, claiming that the scripts will fail robustly when called at the wrong time, will tell you when they fail, and won’t undo each other’s actions. Moving further into discovering order in ineffective procedures, he talked about how it is easier to check whether a condition is present than to execute it. Trading extra executions for the lack of precedence tables is cheaper and less work. The efficiency depends on the initial ordering. In the second half of his talk, Alva explained what is necessary for the commands and what he learned from it all. Each command requires awareness as to whether or not it failed (which is the easy part), must be homogeneous (which is the hard part), and must be convergent. There are no preconditions to engineering maelstroms, and they are safe to run in any sequence. He learned that causality is not a myth but cannot determine what will happen: you can determine what repaired a specific problem, not what caused it. It is not causal, but operational. He concluded with tasks he is working on and will be working on, such as a troubleshooting script.
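A rough sketch of that loop, under assumptions of my own rather than from the paper (invented script names, exit status signaling failure): convergent, failure-aware repair scripts are run in an arbitrary order with no precedence table, and whatever fails is simply retried until a pass either succeeds or makes no progress.

```python
# A rough sketch of the "ineffective procedures" idea; not the Maelstrom
# implementation. Convergent, failure-aware repair scripts are run in an
# arbitrary order with no precedence table; whatever fails is retried until a
# pass makes no progress. Script names and the use of exit status to signal
# failure are illustrative assumptions.

import random
import subprocess

scripts = ["fix_dns.sh", "fix_nfs_mounts.sh", "fix_web_service.sh"]

def succeeded(script):
    """Run one convergent repair script; treat a nonzero exit as failure."""
    return subprocess.run(["sh", script]).returncode == 0

pending = list(scripts)
while pending:
    random.shuffle(pending)                       # order does not matter
    still_failing = [s for s in pending if not succeeded(s)]
    if len(still_failing) == len(pending):        # no progress this pass
        raise RuntimeError(f"could not converge: {still_failing}")
    pending = still_failing
```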

MONTE LISA OVERDRIVE: EMPIRICAL SYSTEM ADMINISTRATION

Summarized by Joel Sadler

PERFORMANCE EVALUATION OF LINUX VIRTUAL SERVER

Patrick O’Rourke and Mike Keefe, Mission Critical Linux, Inc.

O’Rourke presented a performance comparison showing the relative merits of Linux Virtual Server (LVS) over hardware-based load balancing alternatives. Patrick began by explaining what LVS was and how it could be used to improve a Web site’s performance. Their testing showed that not only is LVS quite capable of competing with hardware LB devices, it can be dramatically less expensive per request/second.


MEASURING REAL-WORLD DATA AVAILABILITY

Larry Lancaster and Alan Rowe, Network Appliance, Inc.

If there is a holy grail in sysadmin today, it’s the much-coveted five 9s (99.999%) of reliability. This somewhat eye-opening presentation showed a view of reality specifically with regard to NetApp filers. Using data gathered from customers via ONTAP’s Autosupport feature, Lancaster showed how they had categorized the failure data and then laid out the conclusions the data had shown. Quite surprisingly, their data showed that fewer “Operator” type errors occurred than power failures, even with clustered systems.

SIMULATION OF USER-DRIVEN COMPUTER BEHAVIOR

Harek Haugerud and Sigmund Straumsnes, Oslo University College

Haugerud talked about the challenges inherent in building a model to simulate user behavior on a given multi-user computer system. His presentation showed their design principles and explained some of the initial goals they had in setting out. He then presented a fairly technical breakdown of their testing methodology. In doing so, Harek showed how they had tested the simulation with a known user-action data set obtained from a third party. Their results were impressive; while they admit that this particular simulation is in its infancy, its usefulness is easily visible.

SEEING HOW THE LAN LIES: NETWORK MONITORING

Summarized by Liliana Hernandez

SPECIFIC SIMPLE NETWORK MANAGEMENT TOOLS

Jürgen Schönwälder, Technical University of Braunschweig

Schönwälder described the design and implementation of an SNMP management tool called scli, which provides an


efficient-to-use command-line interface to display, modify, and monitor data retrieved from SNMP agents. The SNMP management tools available today fall into one of the following five categories:

■ Generic low-level SNMP tools
■ Generic low-level SNMP APIs
■ Generic MIB browsers
■ Generic monitoring tools
■ Generic management platforms

But the author still often feels uncomfortable when trying to use them; for example, the generic tools often do not understand the relationships between MIB objects. The software design addresses five key requirements: extensibility, robustness, maintainability, efficiency, and portability. The package uses the glib library to achieve portability and to reuse generic data structures such as lists and dynamic strings. The SNMP engine gsnmp has been derived from the gxsnmp package and was subsequently modified to fix bugs and to improve stability. The SNMP engine itself uses glib. The interpreter core and some command implementations also use the libxml2 library to create and manipulate XML documents. The SNMP engine does not yet support SNMPv3 security. The code generator can be improved in many ways. The biggest limitation right now is the restriction that stubs can only operate on table rows or groups of scalars.

GOSSIPS – SYSTEM AND SERVICE MONITOR

Victor Götsch, Albert Wuersch, and Tobias Oetiker, Swiss Federal Institute of Technology

Gossips is a modular client-server-based system monitor. Gossips not only reports problems but suggests solutions to the problems by consulting a knowledge base. The monitor software is written in object-oriented Perl. The goal in this project was to address some of the problems found with existing solutions like SNMP, Big Brother, Swatch, Spong, and PIKT.

■ Big Brother is good in design, scalability, and messaging. It is okay in configuration and is extensible.
■ Swatch is very extensible. It is okay in configuration, design, scalability, and messaging, but it is not modular.
■ Spong is good in design, scalability, and messaging. It is okay in configuration, extensibility, and modularity.
■ PIKT is good in configuration, design, scalability, extensibility, and messaging, but it is missing modularity.
■ gossips is good in configuration, design, scalability, extensibility, modularity, and messaging.

The distributed architecture of gossips builds a scalable monitoring system. Through its flexible and central configuration environment, together with its command-line module, gossips is easily maintainable. The object-oriented design of gossips builds a flexible and well-defined framework for developing new monitoring tasks. The concept of separating data acquisition and data analysis makes defined monitoring tasks reusable and provides the possibility to build combined tests. The knowledge base allows one to archive solutions to known problems in one place and to integrate the knowledge of the system manager. By including cfengine, gossips could be extended into an automated repair tool.

THE CORALREEF SOFTWARE SUITE AS A TOOL FOR SYSTEM AND NETWORK ADMINISTRATORS

David Moore, Ken Keys, Ryan Koga, Edouard Lagache, and kc claffy, CAIDA

CoralReef is a package of device drivers, libraries, classes, and applications and provides a suite of tools to aid network administrators in monitoring and diagnosing changes in network behavior. CoralReef offers a unified platform to a wide range of capture devices and a collection of tools that can be applied at


multiple network levels. Its components provide measurements on a wide range of real-world network traffic flow applications, including validation and monitoring of hardware performance for saturation and diagnosis of network-flow constraints. CoralReef can be used to produce stand-alone results or data for analysis by other programs. CoralReef reporting applications can output in text formats that can be easily manipulated with common UNIX data-reduction utilities, providing enormous flexibility for customization in an operational setting. CoralReef provides a balanced collection of features for network administrators seeking to monitor their network and diagnose trouble spots. By covering the range from raw packet capture to real-time HTML report generation, CoralReef provides a viable toolkit for a wide variety of network administration needs.

LEVEL 1 DIAGNOSTICS: SHORT TOPICS ON HOST MANAGEMENT

Summarized by Jeff Tyler

GLOBAL IMPACT ANALYSIS OF DYNAMIC LIBRARY DEPENDENCIES

Alva Couch and Yizhan Sun, Tufts University

SoWhat is a tool for analyzing and tracing library dependencies in a large distributed environment. ldd can tell you what libraries any given program will load, but how do you determine the total set of programs in a large environment that might require a given library? The simple answer is that you don’t, and thus one can never delete a library in a complex environment without a significant chance that some program somewhere in the environment will then break. Therein lies the path to library rot. SoWhat attempts to address this problem by analyzing and cataloging all library dependencies in just such a large environment. SoWhat currently runs on Solaris 7/8, with a Linux version promised. It’s written in Perl and requires MySQL. It is freely available at http://www.eecs.tufts.edu/~couch/sowhat.
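The core idea is easy to sketch. The following is not SoWhat (which is written in Perl and keeps its catalog in MySQL) but a hedged illustration with invented directory names: run ldd over every executable under a few directories and invert the result, so you can ask which programs would break if a given library disappeared.

```python
# Not SoWhat itself; just a sketch of the underlying idea. Run ldd over every
# executable under a few directories and invert the result so you can ask
# "which programs would break if this library went away?" The directories are
# arbitrary examples; SoWhat stores its catalog in MySQL instead of a dict.

import os
import subprocess
from collections import defaultdict

search_dirs = ["/usr/local/bin", "/opt/app/bin"]   # assumed example paths
library_users = defaultdict(set)                   # library name -> programs

for directory in search_dirs:
    if not os.path.isdir(directory):
        continue
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isdir(path) or not os.access(path, os.X_OK):
            continue
        result = subprocess.run(["ldd", path], capture_output=True, text=True)
        for line in result.stdout.splitlines():
            fields = line.split()
            if fields and fields[0].startswith("lib"):
                library_users[fields[0]].add(path)

# Before deleting a library, see who still depends on it.
for program in sorted(library_users.get("libcrypto.so.1", [])):
    print(program)
```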

DERIVING TOOLS TO ADMINISTER DOMAIN AND TYPE ENFORCEMENT

Phil Kearns and Serge Hallyn, College of William and Mary

In this context, Domain and Type Enforcement (DTE) means a mechanism to provide fine-grained mandatory access control beyond the level provided by a conventional UNIX kernel. DTE systems normally use a rather densely populated text file as a control and policy establishment tool, and simple typos in these files can have a catastrophic effect on the surety of the system. Phil and Serge have addressed this issue by producing two tools to aid in administration of DTE configuration files and to provide a graphic view of system objects that any controlled program might interact with. The tools are called DTEedit and DTEview and are available at http://www.cs.wm.edu/~hallyn/dte.

SOLARIS BARE-METAL RECOVERY FROM A SPECIALIZED CD AND YOUR ENTERPRISE BACKUP SYSTEM

Lee Amatangelo, Collective Technologies, and Curtis Preston, The Storage Group

Building on the success of their popular CART tool (first presented at LISA 2000), Lee and Curtis have constructed BART, the Solaris Bare-Metal Recovery Tool. CART was a system-specific tool, but BART is a networked version that can deal with multiple machines using an enterprise backup system and a single CD. It currently works on Solaris only, due to Jumpstart dependencies, and will operate with both Legato and Veritas NetBackup, although there are some Veritas issues.

ACCESSING FILES ON UNMOUNTED FILESYSTEMS

Willem A. (Vlakkies) Schreuder, University of Colorado

Now this is a very useful utility. It is used for recovering files from unmounted disks and general bunged-up disk spelunking. If you’ve ever spent any time in fsdb you’ll appreciate what went into the construction of this tool. It works like cat and has both stand-alone (ruf) and callable library (libruf) versions; it can automatically determine the location of alternate superblocks and perform other useful disk tricks. It currently works on *BSD, Linux, SunOS/Solaris, and HP-UX. It is available under the BSD license at http://www.netperls.com/ruf.

TO YOUR SCATTERED PCS GO! DISTRIBUTED CONFIGURATION MANAGEMENT

Summarized by Tim Smith

AUTOMATING INFRASTRUCTURE COMPOSITION FOR INTERNET SERVICES

Todd Poynor, HP Labs

The automatically configured data center is an environment where the efficient redeployment of resources in the data center is required in order to meet changing demand. In this environment federations of resources from autonomous compute systems work together to provide a service. Such an environment does not exist today, but the framework presented by Poynor is a result of research and industry activity.

Poynor’s talk presented a framework for composing Internet services from component services. In this framework, system administrators issue instructions to the computing resources on what services to deploy. The services that can be deployed are grouped into contexts that allow services to cooperate to achieve a larger goal and automatically discover new members of the context. Information is also provided to the framework


about deployment changes in the context that allow services to adjust and reconfigure relationships so the proper services are still provided.

Changes to the service deployment must be specified by an administrator or automated process. The affected resources are notified of the change, which allows them to start and stop component services and possibly reboot machines into new environments. The services add and drop relationships based on the current environment. Once the instructions have been provided, Internet services are stopped and started without administrator intervention. Mr. Poynor gave an example of what would happen when adding a new machine into a Web server farm.

The framework will require a protocol suitable for all hardware and software. The current prototype used by Mr. Poynor’s group is an extension of the IETF Service Location Protocol. The framework, implemented with the protocol, extends UNIX system startup scripts or the Windows Services applet to allow the machine to be dynamically configured.

The PowerPoint presentation of the talk can be found at http://www.hpl.hp.com/personal/Todd_Poynor/.

TEMPLATETREE II: THE POST-INSTALLATION SETUP TOOL

Tobias Oetiker, Swiss Federal Institute of Technology

TemplateTree II addresses the problem of adding new machines into an environment. Each new machine needs an operating system and software packages installed and any site-specific configuration changes. All of the modifications can be made to a base operating system install using cfengine.

Oetiker’s presentation covered how TemplateTree II can be used to generate the cfengine.conf files necessary to apply the modifications to a base system and


POD-style documentation of the components that make up the modifications. TemplateTree II uses tools to set up subsystems including network configuration, the AFS client, and SSH configuration. Metadata are added to each subsystem description, which also includes the component configuration files, so TemplateTree II knows what each subsystem does.

Configuration of a base system using TemplateTree II involves specifying which subsystems to apply to the system. The metadata of each subsystem are used to create the cfengine.conf file. Once cfengine and the generated configuration file are installed on the base system, cfengine will finish installation and configuration of the subsystems, and the system will be ready for use in the system administrator’s environment.

More information on TemplateTree II can be found at http://isg.ee.ethz.ch/tools/.
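A very loose sketch of the assembly step, assuming invented subsystem names and fragment files (TemplateTree II itself works from richer metadata and also emits documentation): the configuration fragments chosen for a host are stitched into a single cfengine.conf for cfengine to apply.

```python
# A loose sketch of the idea described above, not TemplateTree II itself:
# per-subsystem metadata points at a cfengine configuration fragment, and the
# fragments selected for a host are concatenated into one cfengine.conf.
# Subsystem names and fragment paths are invented for illustration.

subsystem_fragments = {
    "network": "fragments/network.cf",
    "afs_client": "fragments/afs_client.cf",
    "ssh": "fragments/ssh.cf",
}

selected = ["network", "ssh"]   # what the administrator picked for this host

with open("cfengine.conf", "w") as out:
    for name in selected:
        out.write(f"# --- subsystem: {name} ---\n")
        with open(subsystem_fragments[name]) as fragment:
            out.write(fragment.read())
        out.write("\n")
```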

THE ARUSHA PROJECT: A FRAMEWORK FOR COLLABORATIVE UNIX SYSTEM ADMINISTRATION

Matt Holgate, Glasgow University, and Will Partain, Arusha Project

The Arusha Project allows system administrators at modest-sized sites to collaborate with one another on a large scale using the Internet. The presentation focused on ARK, an XML-based configuration language that can be used to describe system administration objects. An object is anything an administrator interacts with, including software packages, systems, and teams of administrators.

Partain’s presentation showed how a software package can be described by different administrators using the configuration language. Each administrator began with different description fields, which include the package name, any administrator comments, the options used to build the package, and many other fields. Partain’s presentation showed how isolated system administrators can collaborate with a few other system administrators via the Internet to exchange their package descriptions. Each administrator can take the descriptions from others and plug useful fields into his or her own description. As the updated descriptions of the package are shared, the package description at each site is improved.

Partain’s examples showed how packages can be parameterized and inherited by other packages. He also showed how macros are created in the configuration language and clean up the parameterization of an object.

The Arusha Project home page can be found at http://ark.sourceforge.net/.

HUMAN INTERFACE: TIMELY SOLUTIONS

Summarized by Jeff Tyler

LEXIS: AN EXAM INVIGILATION SYSTEM

Mike Wyer and Susan Eisenbach, Imperial College

This paper won the Best Applied Paper award.

Wyer and Eisenbach faced the problem of converting (temporarily) a large number of Linux workstations to a highly secured configuration to allow students to take programming examinations while still maintaining network connectivity to a central server to collect test data. After the exams are over, the workstations have to be reverted to a more normal state. This transition has to be done repeatedly over the course of a semester.

They solved this problem with a combination of local and remote lockdown tools and a secured client-server configuration built around SSH and ipchains. They took advantage of tricks like mounting the root file system without suid bits active and heavy use of custom run levels. To activate a Lexis client one


simply changes to run-level 4 and the rest is automatic. The Lexis server, on the other hand, is a dedicated box with more traditional system security and maintains “Lexis state” at all times.

Mike provided a lot of detail about development and testing of the system, building confidence with students and staff, and discussed how they overcame problems such as scaling and system reliability. The Lexis system is in use at Imperial College and may be obtained under GPL at http://www.doc.ic.ac.uk/~mw/lexis/.

JAVAMLM, A CUSTOMIZABLE MAILING-LIST MANAGER

Ellen Spertus, Mills College; Robin Jeffries, Sun Microsystems

The authors attempted to tackle a problem with which all of us are familiar: the fact that a successful mailing list soon generates volume levels that overwhelm some users, who then drift away. They studied and rejected some traditional approaches such as static sublists and user filtering and implemented a dynamic sublist approach (threads). This approach presumes nothing on the part of the mail client (e.g., no filtering) and allows the user access via a Web interface to adjust subscriptions and preferences. Javamlm works with qmail to do the heavy lifting (e.g., thread distribution) behind the scenes.

This effort was strictly a prototype, and the authors intend to fold their work into Mailman, the GNU mailing list manager.

GEORDI: A HANDHELD TOOL FOR REMOTE SYSTEM ADMINISTRATION

Stephen J. Okay, Road Knight Labs, and Gale E. Pedowitz, Protura, Inc.

This was an interesting talk and a fascinating paper. The authors detail their efforts to build a useful (and relatively secure) sysadmin remote access tool on a Palm Pilot. Quoting directly from the


paper, “At base, GEORDI is a forms-based UI wrapper for an RSA/DSA ssh connection to a remote host running sudo.” GEORDI also understands about 60 UNIX commands and can recover state from previous sessions (scripts and commands built with its command-builder tool).

I suffer from the same bias that I suspect most of us do – if I can’t get to a shell, then I tend to suspect the utility of the tool. The authors, sysadmins themselves, have gone a long way toward addressing this concern and producing a useful tool, given the inherent limitations of the platform. If you need to perform highly mobile systems administration, then GEORDI may very well be useful to you.

GEORDI is available under GPL at http://www.GEORDI.org.

ADAPTING THE COLLECTIVE: SHORT TOPICS ON CONFIGURATION MANAGEMENT

Summarized by Tim Smith

PELICAN DHCP AUTO-REGISTRATION SYSTEM: DISTRIBUTED REGISTRATION AND CENTRALIZED MANAGEMENT

Robin Garner, Tufts University

Garner presented the DHCP registration system implementation used at Tufts University. The university has a class-B address space and about 9,500 systems. Six DHCP servers are used to service these machines. After experimenting with DHCP registration systems from 1997 to 1999, Garner was involved in the development of Pelican. Pelican was developed to address scaling issues not handled by the other registration systems.

Before registration the host is given an IP address from an untrusted portion of the address space. Pelican works by deriving a host’s MAC address when it registers with the Web utility. The MAC is

put into the dhcp.conf file, and the DHCP service is restarted every fifteen minutes to pull in the new MACs. When the new machine is restarted after DHCP has its MAC address, it is able to obtain an address in the trusted address space and see the entire network and Internet. Pelican also has functions to add and purge leases from the database, and to purge old machine registrations from the database. Garner concluded with performance results of Pelican.
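A much-simplified sketch of the registration step, with an assumed file location, entry format, and helper name rather than Pelican’s actual code: the Web utility’s job boils down to appending a host entry for the newly seen MAC address, which the periodic DHCP restart then picks up.

```python
# A simplified sketch of the registration step described above; not Pelican's
# code. A newly registered MAC address is appended as a host entry to the DHCP
# configuration, and a periodic restart of the DHCP service (every fifteen
# minutes at Tufts) picks the entry up. The file path, entry format, and
# function name are assumptions made for illustration.

import re

DHCPD_CONF = "/etc/dhcpd.conf"   # hypothetical location

def register_host(hostname, mac):
    """Append a known-host declaration for a registered MAC address."""
    if not re.fullmatch(r"([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}", mac):
        raise ValueError(f"bad MAC address: {mac}")
    entry = f"host {hostname} {{ hardware ethernet {mac}; }}\n"
    with open(DHCPD_CONF, "a") as conf:
        conf.write(entry)

register_host("lab-pc-17", "00:11:22:33:44:55")
```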

A MANAGEMENT SYSTEM FOR NETWORK-SHAREABLE LOCALLY INSTALLED SOFTWARE: MERGING RPM AND THE DEPOT SCHEME UNDER SOLARIS

R. P. Channing Rodgers and Ziying Sherwin, US National Library of Medicine

Network-shareable software has several problems. Packages are not independent of one another, and there is a need to allow host-specific “custom” packages. While locally installed software allows for host-specific packages, it is a large burden to maintain.

After covering previous work in the area, Rodgers noted that depot was selected for the project because it is a simple format, RPM was selected because its format is open, and each format contains information that complements the other.

Rodgers’ presentation concluded with future work for the project. The RPM and depot functionalities should be coupled. In order for the combined format to work well, the RPM database needs to be modified to allow for dependency checks across the network. The documentation in the two formats should also be merged. Other RPM enhancements that would require modifying the RPM code would also be useful.


FILE DISTRIBUTION EFFICIENCIES: CFENGINE VERSUS RSYNC

Andrew Mayhew, Logictier, Inc.

The native file transfer protocol in early versions of cfengine did not work. To overcome this problem, Mayhew put rsync in place to transfer files between systems. When the cfengine file transfer was fixed, Mayhew was motivated to compare the two approaches.

The experimental setup used two machines on the same segment and compared the performance of unencrypted cfengine file transfers, encrypted cfengine file transfers, and rsync. Files transferred in the experiments varied in size from 128 kilobytes to 2 megabytes.

In the experiments rsync performed better on large file transfers, while cfengine was better at transferring smaller files.
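For readers who want to reproduce this kind of measurement, here is a hedged sketch of a timing harness, not Mayhew’s setup: it only times rsync over the same range of file sizes, since timing cfengine’s copy the same way would need a cfengine configuration that is omitted here; host and path names are invented.

```python
# A sketch of a timing harness in the spirit of the comparison above, not
# Mayhew's actual experiment: it times rsync transfers of files from 128KB to
# 2MB to a second machine. Host and path names are invented.

import subprocess
import time

sizes_kb = [128, 512, 1024, 2048]
remote = "testhost:/tmp/xfer/"          # hypothetical target on the same segment

for kb in sizes_kb:
    src = f"/tmp/testfile_{kb}k"
    subprocess.run(                      # create a random test file of this size
        ["dd", "if=/dev/urandom", f"of={src}", "bs=1024", f"count={kb}"],
        check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    start = time.time()
    subprocess.run(["rsync", src, remote], check=True)
    print(f"{kb:5d} KB transferred in {time.time() - start:.2f} s")
```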

CFADMIN: A USER INTERFACE FOR CFENGINE

Charles Beadnall, W. R. Hambrecht, and Andrew Mayhew, Logictier, Inc.

Mayhew presented CfAdmin, a user interface for cfengine designed to allow facilities staff, release engineers, system administrators, and network operators to preview and edit information regarding systems and the software for them. Each of the groups needed to use cfengine for their work, so a common interface was created.

CfAdmin uses CVS for version control, cfengine for host management, and Netcool for network monitoring. Apache with SecurID is used for the interface. The interface allows the different groups to perform host entry and software location entry (e.g., binary paths). The system administrator then uses the interface to configure a system before deployment, using software information from the release engineer. Facilities staff are able to use the interface to install the machine in the proper location.

A cfengine.conf file is generated by CfAdmin and then pushed out to all


hosts based on the information entered. The automatic generation and distribution of the configuration file eliminates human error while updating the configuration file and makes cfengine easier to use.

WORK-IN-PROGRESS REPORTS

Summarized by Jeff Tyler

EMAIL REDO LOGS FOR USER-INITIATED RESTORES

Rich Graves, Brandeis University

Email redo is more concept than code at the moment but in use nonetheless. In a nutshell, create two mail spools and declare one read-only (at the user level). Allow the users to recover individual pieces of mail from the r/o spool after they zap them in error in the traditional r/w spool. Supports UW-IMAP, currently running on Linux. Roll the spools on a systematic basis and expire the r/o spool at a reasonable point. A very clever idea that only requires simple changes to the local mailer to write both spools on incoming mail, some pointers to allow users to recover their own files, an expire mechanism for the r/o spool, and a LOT of disk space.
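The “simple changes to the local mailer” are easy to picture. The sketch below is not Graves’ code, just an illustration under assumed spool paths, of a delivery hook that appends each incoming message to both the normal read/write spool and the parallel read-only redo spool; mbox quoting and locking are ignored for brevity.

```python
# Sketch only, not Graves' implementation: a toy local-delivery hook that
# appends each incoming message to the normal read/write spool and to a
# parallel read-only "redo" spool, so a user can later copy an accidentally
# deleted message back out. Spool paths are invented; mbox From-quoting and
# file locking are omitted for brevity.

import sys
import time

RW_SPOOL = "/var/mail/alice"           # hypothetical user spool
RO_SPOOL = "/var/mail-redo/alice"      # parallel redo spool, read-only to users

def deliver(message_text):
    separator = f"From redo-hook {time.asctime()}\n"
    for spool in (RW_SPOOL, RO_SPOOL):
        with open(spool, "a") as f:
            f.write(separator + message_text.rstrip("\n") + "\n\n")

deliver(sys.stdin.read())
```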

BRINGING UNDO TO SYSTEM ADMINISTRATION: A NEW PARADIGM FOR RECOVERY

Aaron Brown, University of California, Berkeley

Very much at the concept level at the moment, undo would provide the ability to “recover” at will to almost any point in time. It requires a LOT of prior planning. The concept is based on the system-recovery three Rs: rewind, repair, replay. It sounded very transactional but aimed at the base OS, not databases. It will be interesting to see where this one goes.

OPERATIONAL FAILURES IN LARGE-SCALE INTERNET SERVICES

David L. Oppenheimer, University of California at Berkeley

This case study – the first in what is hoped will be a series of case studies involving large-scale Internet-based enterprises – involved an Internet-based network storage company. Among the findings of interest was the fact that most failures were due to human error, which accounted for 33% of the events studied. The next largest cause of failure was listed as software failure, and this was charged against the fact that the software used was mostly home-grown and being operated without rigorous change control. Oppenheimer is actively seeking companies or organizations willing to participate in his study and guarantees that no one will ever learn your name if you sign up.

VERDAD

Jeff Kellem and Jeff Allen, tellme.com

Verdad is a central configuration store. It understands inheritance and versioning and is based upon MySQL and Perl. It controls software, DNS and DHCP data, and user ACLs. It has a r/w Web interface. Sounds useful.

ESTABLISHING AN ASSOCIATE OF APPLIED SCIENCE IN COMPUTER SECURITY DEGREE

Will Morse, North Harris Montgomery Community College

Morse is working on establishing this degree program in the Texas community college system. He has a core curriculum, two OS-based courses, and a “lore” segment. He’s looking for ideas and wants to draw upon the experiences of others in this area. His goal is to produce an entry-level security person who can meet 80% to 90% of the security requirements that a small business might have.

INSTALLING LINUX IN UNDER THREE MINUTES

Paul Boven

The keywords for this talk are PXE, BIOS redirection, serial console, and remote power control. Then stir in DHCP, TFTP, and NFS. Ever dig into the SunOS or DEC remote workstation build procedure? Then you understand 90% of this WIP. The other 10% is mostly in getting PC hardware smart


enough to boot off the wire. Paul seems to have a very nice scalable version of this ever-popular hack. It used to take us longer than three minutes to do this in Athena, but disks and the wire were both slower then. Boven can be contacted at [email protected].

THE CONDOR CLUSTER TOOL

Erik Paulson

It’s a bird, it’s a plane, it’s a big cycle stealer! Currently at 1,000 CPUs and growing daily. It’s been adapted to interactive use and taught some manners (from the standpoint of the people whose cycles you are stealing). Currently uses only advisory locks and has some issues with reservation timeouts (they don’t). In the words of its author, “It’s not too secure . . . .” V2.0 is under development and will address security with Kerberos, have resource reservation timeouts, and stronger-than-advisory locking. Don’t let those spare cycles go to waste, folks.

A PORTABLE LINUX CLUSTER

Mitch Williams, Sandia National Labs

This was an extremely neat hardware hack. Visualize this: a 4-banger Linux cluster in a tiny (5.3” x 5.3” x 13”) custom rack. Weighing 15 pounds, it has its own little packing case that fits in the overhead rack on a plane. It is built around the PC104 card bus system. Each CPU has 128 megs of memory, and the whole thing is driven by a 50W power supply. They built it in a month from scratch for about $5,000. It seems Sandia needed a PORTABLE teaching and demo system that could do some serious parallel processing. You have to see this thing to appreciate it – it’s a jewel-like creation. I asked Mitch what he’d change if he had to do it again, and he said, “I might make the power supply a tad larger, like 75W or maybe even 100.” Mitch asked that, in addition to his coworkers at Sandia, I give credit to the Parus and Advanced Digital Logic corporations for all their


help. This was the winning WIP and a very well-deserved win in my opinion.

INVITED TALKS

CNN.COM: FACING A WORLD CRISIS

William LeFebvre, CNN Internet Technologies

Summarized by Joel Sadler

It was a presentation that few will forget. LeFebvre showed in great detail how CNN.com dealt with the traffic load created by the 9/11 tragedy. He opened the talk by presenting some introductory information about how the CNN.com operations actually run. The group that handles the hardware for CNN.com also performs the same function for quite a few other Turner Web sites, including WCW.com, SI.com, TBS.com, and CartoonNetwork.com. All of the Web serving hardware for the various sites is identical, which allows for very simple “swings” of hardware among the various sites when required for special events or other heavy traffic times.

Moving on, Bill presented a timeline with a stunning array of data about their traffic load. He showed that their inbound HTTP requests doubled every 7 minutes, with a starting metric of 84,719 hits/minute at 08:45 (all times EDT). By 09:00, they were already up to 229,006 hits/minute! At this point, the traffic monitoring software was shut down for several hours to remove any and all unnecessary load from the network and servers.

Meanwhile, the staff was scrambling to borrow servers from other sites so that CNN.com could continue to serve the exponentially increasing load. Starting with 10 servers at 08:45, they were able to increase that number to 52 by 13:00, including an amazing swing of 20 servers in a half-hour period. In addition to the server moves, they were minimizing the contents of the home page in an attempt to meet the overwhelming demand. At their lightest point, there were only 1,247 bytes of HTML, with one small logo and a small picture.

Other interesting statistics of note: CNN.com’s previous high-traffic record was on 11/8/2000, the day after the US presidential elections. It reached a peak of about 1.2 million hits/minute for a total of about 139 million page views. On 9/11/2001, CNN.com successfully served about 1.1 million hits/minute for a total of about 305 million page views, not including the several hours that monitoring was deactivated. Their best guess as to the actual peak was about 1.8 million hits/minute.

SECURITY FOR E-VOTING IN PUBLIC ELECTIONS

Avi Rubin, AT&T Labs – Research

Summarized by Crystal K. Stockton

Rubin talked about his previous experience with developing a system of e-voting for a public election in Costa Rica, where the voters needed to vote in their home districts. The government has already provided public transportation for those not living in their home district and has made the voting day a national holiday to ensure that everyone has a chance to vote, yet e-voting would help those unable to travel.

Problems they encountered were that each district had a different ballot, adults had little experience with using a mouse, and the computers were limited to regular voting districts. The written software took into account the problem of several different types of ballots, and it was suggested to use touch screens or light pens to compensate for the mouse, but the government did not have the extra funds. Another problem during testing was that all votes were recorded correctly for the primary Republican and Democratic candidates but the votes for other candidates were wiped out. A power surge switched all votes from one candidate to another, and without an audit trail it was impossible to redistribute the votes.


Rubin also outlined possible threats to the system, social apprehensions, and technical issues. What types of consequences would there be if an attack were successful? How motivated are these attacks? What type of voting coercion, sale, or solicitations will there be? Is this a secure platform to use? What will the availability of the network be? How can it verify that a living person is voting?

ZOPE

Michel Pelletier, Digital Creations

Summarized by Armando Rojas-Morin

Zope is an open source, Python-based, object-oriented Web environment. With Zope, Web sites are developed through the Web itself.

One of Zope’s strengths is its security system. Users have roles assigned, and actions are protected by permissions. It provides a file-system-like structure with a root folder. Permissions can be assigned to each subfolder. It’s a good way to delegate. Because everything is done via the Web, things are not actually files in the file system.

Zope is more than just a server (HTTP, FTP, PCGI). Zope’s philosophy is that data, logic, and presentation should each be a different layer. This keeps all of the designers and developers happy by never violating their layer.

Zope offers relational database integration and content objects for the data layer; Python, Perl, and SQL for the scripting layer; and page templates and DTML templates for dynamic presentation.

2001: A COMMUNICATIONS ANNIVERSARY

Peter Salus, Matrix.net

Summarized by Joel Sadler

Peter Salus gave a wonderfully humane and informative talk. He discussed the technological distance covered, primarily in the 20th century, to get us to the current level of communications


dexterity. Peter also made mention of some of the leading innovations necessary to such advancement, such as the mechanical calculator, the telephone, and transatlantic radio messaging. He linked seemingly unimportant items together and illustrated their relevance in further advancing communications capabilities.

High points included the first transatlantic radio message in 1901; the first transistor in 1951; Clarke’s short story “The Sentinel” in 1951 and its movie adaptation 2001: A Space Odyssey in 1968; the birth of UNIX and of Linus Torvalds in 1969; Lions’ explication of the UNIX kernel in 1976; and the release of both PGP and Tim Berners-Lee’s first HTTP work in 1991.

IF I COULD TALK TO THE ANIMALS: WHAT SYSADMINS CAN LEARN ABOUT DIAGNOSTIC SKILLS FROM ANOTHER PROFESSION

David N. Blank-Edelman, Northeastern University

Summarized by Crystal K. Stockton

Blank-Edelman began by comparing his profession to that of mechanics and doctors and pointed out several reasons why their work is completely different from that of a system administrator. He went on to say that a more accurate comparison would be to the profession of veterinarians. The reason comparing a system administrator to a mechanic is not accurate is that mechanics can remove and replace parts in determining a problem and fixing it. Their world does not fluctuate much, and they have various instruments available to help with diagnostics. Blank-Edelman believes comparing system administrators with doctors doesn’t work either, because doctors treat others of the same species as themselves. They have the luxury of communicating with their patients. Veterinarians have the profession most similar to system administrators’ because they work with a large variety of species, cannot easily remove and replace parts, and collect diagnostic information from a third-party source.

Blank-Edelman then talked about different types of decision-making. One type described in depth was deductive logical thinking. This is a classical approach to decision-making, which is usually what people are taught at an early age. Another type of decision-making is naturalistic decision-making, where the environment influences your decisions. Types of environmental variables are time pressure, high stakes, and experience.

After explaining different approaches to decision-making and explaining the reasons for trying to mirror sysadmin and user relations, Blank-Edelman showed a clip from the movie Doctor Dolittle. While watching this clip, he explained how a sysadmin’s job is similar to a veterinarian’s. This was a fun and light way to compare the two professions.

To review Blank-Edelman’s slides and learn more about his research, refer to the Web site http://www.otterbook.com.

THE PROBLEM WITH DEVELOPERS

Geoff Halprin, e-smith, Inc.

Summarized by Yolanda Flores-Salgado

“There is a problem with developers. They don’t develop maintainable, production-ready, manageable code.”

Maintenance is 70% of the software development lifecycle, but most developers can’t maintain their software. They suppose and assume a lot, don’t consider changes, don’t provide enough documentation, and so on. Developers need to be re-educated, but sysadmins can’t re-educate developers.

The only way is not to accept the product if requirements are not complete. If an application doesn’t satisfy our needs, don’t accept it, don’t use it. Sometimes this is not possible, but if we could do it, our lives would be better.


An application has a life cycle: install, configure, manage, monitor, build, update, and de-install.

Developers need to know configuration standards. We need standard configuration files and logs. We also need a partitioning of file types – not all files are equal, and to improve file access, each location should be specified separately and set by the administrator.

Applications should be separated from system areas, and developers should give us the choice whether to isolate the application in a simple hierarchy and to have separated data areas.

Configuration management is very important. Proper configuration gives us the choice to manage the application’s behavior. For simplicity of use, everything should go in one file if possible.

Sysadmins need control over:

■ How to stop and start the application

■ Application requirements
■ User management

Sysadmins need monitoring applications. We need standard logs and log-file management (file rotation, configuring logs, etc.). Sysadmins also need backups. How easy is it to back up and restore the application?

Error handling – does the application trap potential errors? Does it report errors in a consistent format?

Sysadmins need installation instructions. Halprin said, “It is a sad statement that most software installations are done by executing the vendor installation scripts without question.”

An application should provide documentation to install, de-install, and upgrade it. We need to understand installation procedures, but, most importantly, we need to be able to control them.


PHP FOR SYSTEM ADMINISTRATION

Shane Caraveo, ActiveState

Summarized by Yolanda Flores-Salgado

PHP is a Perl/C-like scripting language designed specifically for the Web. It can be used on almost all platforms (e.g., UNIX, Windows, and MacOS), and it can be used either as a stand-alone language or embedded in SGML, XML, ASP, or JavaScript. PHP provides a lot of extensions supporting all of the commonly used databases (Postgres, MySQL, generic database interfaces, Oracle), system protocols, and distributed processing.

PHP is easy to use and learn for sysadmins, especially if they are familiar with C or Perl, and because it is Web-browser-oriented, it can run anywhere. Web interfaces are also easy with PHP.

PHP can be (but is rarely) used for system administration. Since GUI interfaces are easy in PHP, PHP can be useful in building interfaces to delegate some sysadmin duties to non-sysadmins.

Some interesting PHP-related sysadmin projects are:

■ phpMyAdmin (PHP and MySQL – provides full MySQL admin capabilities)
■ LDAP Admin (LDAP support via OpenLDAP or Netscape SDK)
■ phpQLAdmin (supports qmail, LDAP)
■ Mailing List Admin (simple EZMLM interface, but it could be better using EZMLM and MySQL. PHP provides easy-to-use PHP classes. Lacks most options for making/editing lists.)
■ proBIND (friendly interface for BIND configuration)
■ PhpCron (simple Web-based cron server in PHP, integrated with crond)

Some PHP resources are http://php.net, http://SourceForge.org, and http://aspn.ActiveState.com.

RULES OF THUMB OF SYSTEM ADMINISTRATION

Steve Simmons and Elizabeth Zwicky

Summarized by Tim Smith

The presentation was a collection of the wit and wisdom of the system administration field. “The only thing more frightening than a programmer with a screwdriver or a hardware engineer with a program is a user with a pair of wire cutters and the root password” is representative of the slides used during the presentation. After each slide Simmons or Zwicky would tell a quick story related to the slide and would point out the underlying rule of thumb or great truth that could be used by system administrators in their jobs every day. The slides for this presentation are not available online. The material used by Simmons can be found in his sigfile collection at http://www.nnaf.net/~scs/Fun/sigfiles.html.

WHAT SYSADMINS NEED TO KNOW ABOUT THE NEW INTELLECTUAL PROPERTY LAWS

Lee Tien, EFF

Summarized by Josh Simon

The short answer to the title, according to the speaker, is “a lot.” He provided a general overview of the issues, but when in doubt, always contact your own attorney.

The theme of the legislation of late has been to figure out who controls the technology. Copyright law provides the creator of a work or expression fixed in some tangible medium, including electronic media such as RAM and disk storage, the right to exclusively copy, sell, and distribute their work and the right to authorize others to do so. Copyright infringement is when someone does any of this without authorization. There are two kinds of infringement: direct and indirect. Direct infringements are those where you yourself are the violator. Indirect infringements are when there is a direct infringement and you’re involved

Page 13: LISA '01 cover · cfengine. It can be viewed as a mix of dynamic and static schedules where the dynamic tasks are triggered by actions and the static tasks have precedence. One of

intermediately. There are two types ofindirect copyright infringements. Thefirst is contributory, where you condoneor help the direct violator, have knowl-edge (which has been extended to meanboth “you know” and “you have reasonto know”), and materially contribute tothe violation, which includes the controlof the facilities or the systems. The sec-ond type is vicarious, where directinfringement affects the right to controland leads to a direct financial benefit forthe vicarious infringer. The example isof a tenant/landlord relationship. Sincefinancial benefits are typically not pres-ent for system administrators, vicariousinfringement probably doesn’t appply tous. However, knowledge or reason-to-know do not apply to vicarious infringe-ments.

So what can we as system administrators do? In smaller environments, we can avoid infringements; unfortunately, this doesn’t scale well. There’s the so-called Betamax defense, which says that if something is capable of substantial noninfringing use it’s okay – but the courts aren’t buying this argument yet, because it’s only been applied successfully thus far to contributory, not vicarious, infringement.

What about Napster? They should have known there was infringement going on, and they provided the software and hardware (servers), so they’ve got contributory infringement. They also performed direct violations, and affected the right to control (vicarious) and cost the copyright owners revenue (vicarious). Even if only contributory infringement is involved, you can’t foist it off and say it’s someone else’s problem once you have knowledge of it. So the advice here is to take cease-and-desist letters very seriously.

What about new legislation? Some case law shows that some knowledge is essential. Title II of the Digital Millennium Copyright Act (DMCA) provides safe harbors for ISPs and other providers, though the safe harbors are very complicated. A safe harbor provides immunity for monetary damages only and is intended to limit the legal exposure of the provider. Four of them are defined: transitory network passage, where all you do is deliver bits from one place to another, as in the Usenet model; system and caching, where you provide the hardware and OS but no monitoring; user-stored files, where you provide the disk space; and search-and-retrieval tools, such as Yahoo! The definitions, requirements, and exceptions are all very complex, written in legalese, and there’s very little case law behind them. In general, though, you have to meet the specific criteria for a safe harbor: you must have an anti-infringement policy, accommodate and not interfere with standard technical measures to protect copyrighted works, and comply with notice and takedown requests. Unfortunately, some of these terms, such as “standard technical measures” and “anti-infringement policies,” are legally ambiguous.

The big question becomes: who controls the technology of the Internet? The RIAA and others want to control it because it can be used to copy and distribute works to which they own the copyrights. The DMCA, in the opinion of the speaker, is a strategy to control devices, and it doesn’t provide exceptions like the Betamax rule; it protects the right to control access and prohibits making devices to circumvent access controls.

HARDENING WINDOWS 2000

Phil Cox, SystemExperts Corp.

Summarized by Jason Wertz

For those of you out there who have to deal with Windows 2000 servers, there is always the question of how to protect your servers. What can you do as an administrator to make sure someone else’s system is more attractive to a hacker than yours is? Phil Cox ([email protected]) had many of the answers in his presentation, breaking the topic down into several parts: first, determine the purpose of the server; second, decide what physical security the server needs; third, what should be done as the OS is installed to promote tight security; next, what can be done to tighten security on the server after it has been installed; and finally, ways to test the servers after they are locked down to make sure they are as secure as necessary.

Before a new Windows 2000 server is set up, its purposes must be established. Determine what services will be offered, which ones won’t be offered, which computers the server is allowed to talk to, what domains and workgroups the server is part of, and what protocols the server will be using. If these answers aren’t known, the server should not be set up.

After the server’s purpose is known, one should decide how to secure it physically. At a minimum, case locks should be used, EEPROM passwords should be activated, and the hard drive should be designated as the first boot device if removable media is usable and the server is publicly accessible. If the system is critical or highly sensitive, cages should be used as well. Other methods of physical security are up to the administrator.

Once the methods of physical security have been established, the installation can be done. When installing, use NTFS as the file system, set a good admin password, and install only the required network services. Do not upgrade from older Windows servers if possible.

After installation is complete, the administrator needs to figure out what services are actually running on the server. For each service to be kept, startup options need to be set. Unnecessary services should be disabled or deleted; deletion is the preferred method, since a deleted service can’t be restarted by a hacker, though certain services are difficult or impossible to delete. System policies such as password policies, account lockouts, auditing policies, user rights, startup/shutdown policies, etc., should be set at this time. Directory permissions should also be checked. Networking must be looked at, and filtering methods should be used to protect various used and unused ports. Time synchronization should also be used. Next, Active Directory must be secured. Finally, install service packs and hot fixes.
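
As a small illustration of the first step above – knowing what is actually running – the following sketch (mine, not from the talk) wraps the standard Windows net start command, which lists started services; the parsing of its output layout is an assumption.

```python
# A minimal sketch, assuming Python is available on the host and that
# "net start" prints one indented line per started service (plus an
# unindented header and trailing message).  Illustration only.
import subprocess

def running_services():
    result = subprocess.run(["net", "start"], capture_output=True, text=True, check=True)
    # Keep only the indented lines, which name the started services.
    return [line.strip() for line in result.stdout.splitlines() if line.startswith(" ")]

if __name__ == "__main__":
    for name in running_services():
        print(name)
```

The resulting list can then be compared against the server’s documented purpose before deciding what to disable or delete.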

Once the system seems secure, it must be tested out for security holes. There are many commercial tools that can be used for this. Unless great care has been taken, these final tests will likely show that there are a few bases that still need to be covered. It is much better to find security holes at this stage rather than when a hacker breaks into the system and exposes them.

To download the white paper that is the basis for this presentation and contains all the details for each of these steps, go to http://www.systemsexperts.com/literature.html and download “Hardening Windows 2000” (tutors/HardenW2K101.pdf). There should be a version 1.2 of this file soon.

SANS AND NAS

W. Curtis Preston, Storage Designs

Summarized by Mark Logan

Preston focused on the differing technologies of SANs (Storage Area Networks) and NAS (Network Attached Storage), and the strengths and weaknesses of each. He concluded with a look at the future of the two technologies and how they will be affected by the advent of NFS v4 and NDMP (Network Data Management Protocol).

The first segment of the presentation discussed the actual architectures of SANs and NAS. A SAN is a Fibre Channel network of RAID devices attached to a host machine. NAS, on the other hand, is an appliance (often called a “filer,” a term coined by Network Appliance) attached to a LAN. NAS typically uses either NFS or CIFS as the network file system. Preston mentioned that running SANs behind NAS is becoming more and more common.

Preston attributed several advantages to SANs, such as reliability due to the ability to design systems with no single points of failure between storage arrays, but he was also fairly critical of the technology, pointing out its very high cost and its difficulty to administer.

Throughout the talk, Preston was much more enthusiastic about NAS technology. He was the first to point out that last year he categorically warned against trying to run an RDBMS on a NAS appliance. Now, he is a proponent of the practice, after seeing numerous success stories. The advantages he attributed to NAS included ease of administration, its generally superior speed measured against equally priced SAN and local disk solutions, and many of the “goodies” being included by NAS manufacturers, such as snapshots and truly advanced file systems.

He was quick to mention that a NAS system does suffer all the flaws of the network file system it implements, and that NAS presents some problems in the realm of backups. However, NDMP promises to fix many of these problems by allowing backup from filer to self, filer to filer, filer to backup server, and server to filer.

In conclusion, Preston declared that the choice between SANs and NAS was largely dependent on the particular situation. The overall message of the talk, however, seemed to be that NAS was really maturing and becoming more affordable, but SANs still offered more in the realm of high availability and performance.

NETWORK/SECURITY TRACK

WHITHER END-TO-END: PLACING BANDWIDTH AND TRUST AT THE EDGE

Gordon Cook, The Cook Report

Summarized by Jin-ping Wan

Gordon Cook, the author of The COOK Report on Internet, a monthly newsletter on Internet infrastructure development, calls for customer-empowered network infrastructures, in which customers control bandwidth and other resources, over telco-empowered network infrastructures. Today’s Internet is dominated by a few supercarriers; in the interests of the public, this should change to a scenario dominated by customers’ networks intersecting global and other local networks. This is analogous to computing, which was dominated by large mainframe computers 40 years ago, and changed to personal computing due to the proliferation of mini-computers in the 1970s followed by the PC. Gordon uses Canada’s CA*net4 to illustrate an edge-controlled infrastructure that has customer-owned networks with fiber bandwidth to the users.

CRYPTO BLUNDERS

Steve Burnett, RSA

Summarized by Mike Sconzo

Cryptography can be a powerful and secure way to send data, but it must be used properly. Algorithms can range from simple to complex, from almost unbreakable to easily broken. Whatever the algorithm, the use is the same: encrypt data and keep it safe from attackers. Five blunders were introduced in this presentation.

One of the newest blunders is to declare one’s algorithm unbreakable. Several crypto schemes have done this and as a result were quickly broken, having garnered the attention of people wanting to be “the one who broke the unbreakable.” The newest trend is using a security proof to prove mathematically that your algorithm is “perfect.” One such case was the Ajtai-Dwork cryptosystem; the algorithm was “proved” to be unbreakable in 1997 and broken in 1998.

The second pitfall is “worshiping at the altar of the one-time pad.” Since the one-time pad cryptosystem was proved to be perfectly secure by Shannon, several companies and individuals have tried to use this to their advantage. The problem resulting from adapting the one-time pad to suit specific needs was demonstrated by Microsoft in 1998. Microsoft created a product that used the one-time pad as part of an algorithm, but the pad was used twice. This allowed people to figure out what the pad was and attack the traffic.
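
As an illustration of why pad reuse is fatal (my own toy example, not one from the talk): XOR-ing two ciphertexts produced with the same pad cancels the pad completely, leaving only the XOR of the two plaintexts for an attacker to work with.

```python
# A minimal sketch of the two-time-pad blunder; the messages are made up.
import os

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"
pad = os.urandom(len(p1))          # the "one-time" pad, mistakenly used twice

c1 = xor(p1, pad)
c2 = xor(p2, pad)

# The pad drops out: c1 XOR c2 == p1 XOR p2, so guessable text in one message
# reveals the corresponding bytes of the other.
assert xor(c1, c2) == xor(p1, p2)
print(xor(xor(c1, c2), p1))        # prints b'retreat at dusk!'
```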

Another problem arises from not using the best available algorithm, which leads to the commonsense question: why should an algorithm be used when it is known to be insecure? This also leads into the next blunder: an incorrect implementation of an algorithm. This was illustrated by a story about a man who called with a complaint about the RSA he had implemented. No matter what he did, his message always encrypted to itself. When asked what he was using as his exponent he replied “1”. In the formula c = m^x mod n, if 1 is chosen for x, then m will always equal c. Lesson learned.
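
The arithmetic behind that story is easy to check with toy numbers (the key below is the standard textbook example, not anything from the talk): with an exponent of 1, “encryption” is the identity.

```python
# Toy RSA numbers (p=61, q=53, n=3233, e=17); far too small to be secure,
# but enough to show why an exponent of 1 hides nothing.
p, q = 61, 53
n = p * q
m = 42                         # message, as an integer smaller than n

bad = pow(m, 1, n)             # x = 1: the ciphertext is the plaintext
good = pow(m, 17, n)           # a sensible public exponent

print(bad == m)                # True -- nothing was encrypted
print(good == m)               # False
```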

Finally, blunders can stem from an incorrect implementation or just poor security policy. This happens when you “don’t protect the key.” If the private key is not protected, any crypto scheme becomes nearly trivial to break. Some famous instances of not protecting the key arose from both Microsoft and Netscape.

It is important to keep an eye on the cryptosystem that you are currently using. Make sure that it is not out of date, you have a good implementation, the private key is kept private, the algorithm is used correctly, and it is a good system to use. Finally, even if all those are present, the issue becomes one of trust. With third-party systems in place to verify identities, who has to worry about invalid certificates? Then again, maybe we should worry.

HOW NOT TO CONFIGURE YOUR FIREWALL

Avishai Wool, Lumeta Corp.

Summarized by Joel Sadler

Wool presented a fast-moving talk on firewall configs, mostly centered around Checkpoint Firewall-1. He opened with some overall policy auditing concepts before moving to common misconfigurations. Unsurprisingly, the most common errors that he’s seen are allowance of all DNS traffic (both inbound and outbound), improperly controlled ICMP, and general problems with misuse of the “ALL” directive. Using example client data, he showed specific firewall configurations with serious problems. His strongest recommendation to firewall administrators was to keep their policies as simple as possible. His data showed that as firewalls grew in complexity, not only were new vulnerabilities introduced, but the danger of exploiting existing vulnerabilities increased.
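
The kinds of checks Wool described can be mechanized; the sketch below (my own toy rule representation, not Lumeta’s auditing tool) flags the patterns he singled out, such as any-to-any rules and use of the catch-all ALL service.

```python
# A toy rule audit; the rule format here is invented for illustration only.
rules = [
    {"src": "ANY", "dst": "ANY", "service": "dns",  "action": "accept"},
    {"src": "ANY", "dst": "dmz", "service": "http", "action": "accept"},
    {"src": "lan", "dst": "ANY", "service": "ALL",  "action": "accept"},
]

def too_permissive(rule):
    # Flag accept rules that apply from anywhere to anywhere, or that use the
    # catch-all ALL service instead of naming what is actually needed.
    wide_open = rule["src"] == "ANY" and rule["dst"] == "ANY"
    return rule["action"] == "accept" and (wide_open or rule["service"] == "ALL")

for number, rule in enumerate(rules, 1):
    if too_permissive(rule):
        print(f"rule {number} looks too permissive: {rule}")
```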

THE FUTURE OF COMPUTER SECURITY

Moderator: Marcus Ranum, Network Flight Recorder; Panelists: Tom Limoncelli, Paul Proctor, Anne Benninger, John Flowers, and Steve Atcheson

Summarized by Mike Sconzo

The question posed to each member of the panel was “where will we be in 3, 5, and 10 years?” One panelist believed that computer security will get much worse before it gets any better, although quite a few people think that the majority of the problems will be partially if not completely solved within 10 years, because by then we should have better tools and more knowledge about the problems we are trying to solve.

It was pointed out that we are headed in the right direction, and with the increased realization by management that security is important, there will be more spending on security-related technology. The industry can then produce such true security products as a (nearly) self-correcting operating system and products that enhance security rather than just acting as burglar alarms. These ideas about security and security enhancements will eventually trickle down to IT people, and this will eventually help with the current state of security.

No matter how great the technology is, however, we will still have problems due to human error. This can be mitigated through education. Consolidation of products/ideas is also anticipated so that products will be easily scalable.

The need for a standardized system of evaluation was also brought to light. Some panelists agreed that the business sector (e.g., insurance companies) would influence this. It might eventually become possible to buy network security insurance. The insurance would be priced according to how secure the network was, and this would lead to a security rating system. Introduction of VISA compliance standards might be another way to achieve this.

Several other issues were discussed, such as Public Key Infrastructure, authentication, and risk measurement. Everybody seemed to agree that “management is bad, insurance and government standards are good, and PKI is dead.”

GURU SESSIONS

AFS

Esther Filderman, PSC, and Garry Zacheiss, MIT

Summarized by Mark Logan

The AFS session consisted of questions about bugs in various AFS implementations, questions about AFS setup and administration, and commentary about the state of the AFS community. The discussion took place in a packed room containing everybody from longtime veterans of AFS to people who were just curious about AFS.

The first round of questions addressed bugs in certain setups that seemed to be caused by Jumbograms, which are on by default in AFS. Jumbograms can speed up AFS communication in some cases, but the consensus was that they should be the first thing to go anytime strange bugs start to show up.

The discussion then turned to issues of AFS performance, including caching to disk and/or memory, and how AFS performance compares to that of NFS. One attendee told of having dramatically improved workstation performance with a relatively small memory cache, while another shared his experience of running AFS with a 2GB memory cache on an E10K. Allegedly, it was rather sprightly.

Discussion of backup was bound to come up sooner or later, and it centered around alternatives to butc, the most common backup tool used with AFS. However, the only tool about which anybody had anything good to say was Tivoli Storage Manager. A few folks who were unconcerned with preserving AFS ACLs reported success using tar and commercial backup products.

Toward the end of the session, the history and future of AFS came up, and Esther and Garry fielded questions about the Transarc implementation, the progress of OpenAFS (which is now reportedly quite stable), and the directions that AFS may take in the future. There was some speculation about the possibility of an AFS Foundation, but the gurus were not optimistic about such an organization actually being founded in the near future.

INFRASTRUCTURE ARCHITECTURE

Steve Traugott, TerraLuna, LLC

Summarized by Tim Smith

Topics suggested for discussion were large systems beyond credible manual reach; industry acceptance and understanding of infrastructure architecture; notations, semantics, and type checking in infrastructure architecture; organizational infrastructure; and turn-key management solutions. However, organizational infrastructure and notations, semantics, and type checking were not discussed before the session ended.

Traugott began the discussion on large systems by saying that without automation and tools there is an infrastructure such that an infinite number of system administrators could not administer it. The key to managing such large infrastructures is centralized management of the systems and network. The discussion then shifted to tools used to centrally manage network hardware. On the software and configuration side, how to avoid changing the infrastructure manually was discussed. Manual changes should never be made even though the pressure to immediately fix a problem is great. Instead, the changes should be made using the tools available. If this is done correctly, a machine can be reformatted and all of the customization after the base operating system installation can be recovered automatically.

Industry acceptance was the next topic covered. The primary problem faced in this area is describing to recruiters and bosses what an infrastructure architect does. An infrastructure architect is concerned with reducing the cost of ownership of managed systems through careful planning of the infrastructure of the computing environment. They are typically senior system administrators who can code well and whose intent is to build the view of a single enterprise architecture. Infrastructure architecture can be thought of as something to do after system administration. An infrastructure architect is needed to make the design decisions, because if these decisions are not made by someone assigned specifically to the task then they are made during triage situations in emergencies, and this is clearly evident in the resulting infrastructure.

Turn-key management solutions for infrastructure architecture do not exist at the moment. The best solutions would be written by vendors, but those would not be acceptable in heterogeneous environments. Any solution produced for a site tends to be site specific and not really sharable. The Arusha Project (http://ark.sf.net) was mentioned as an effort to make site-specific efforts sharable between sites using their XML-based configuration language. Standardizing GNU tools and infrastructure policies are other ways to allow sharing. More information can be found at http://infrastructures.org.

PKI/CRYPTOGRAPHY

Greg Rose, Qualcomm, Inc.

Summarized by Mark Logan

The PKI guru session started off with the announcement that the AES has been approved. Rijndael, the winning cipher submission, was accepted with few modifications. Rose expressed his satisfaction with the committee’s choice and cited Rijndael’s propensity for encrypting very quickly as its biggest strength. He added that there was hardly any doubt that all of the ciphers under consideration were secure, so factors such as speed were given greater weight.

Discussion focused for a while on problems with current PKI implementations. Top on the list of gripes was the difficulty in issuing certificate revocations. Rose and a few attendees explained several different approaches to addressing this problem. The simplest one was to simply set certificates to expire relatively quickly. One of the more interesting proposals involved storing part of a key on a trusted server, so that to sign or encrypt documents, users would have to pass part of the computation to the server since they would hold only a portion of the key. Then, instead of revoking certificates, an administrator would simply remove the key material from the trusted server, and the relevant user would not be able to sign or encrypt documents.

In the second half of the session, several attendees had questions regarding the state of export regulations. There was some trepidation in the room about what the US legislature could be expected to do with regard to cryptography export regulations in the wake of September 11th. Rose felt that regulations were still quite relaxed compared to those of several years ago and that there was little cause to worry for the time being. He justified his stance by arguing that export regulations were never meant to keep cryptography from leaking across US borders but, rather, were meant to keep any non-US business from using US cryptographic technology to do business.
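
The split-key proposal described above can be sketched with toy RSA numbers (my own construction, not anything presented in the session): the private exponent is divided between the user and a trusted server, and deleting the server’s share effectively revokes the key.

```python
# Toy textbook RSA key (p=61, q=53, e=17, d=2753); (e * d) % ((p-1)*(q-1)) == 1.
p, q = 61, 53
n = p * q
e, d = 17, 2753

d1 = 1000                      # server's share of the private exponent
d2 = d - d1                    # user's share; neither half can sign alone

m = 65                         # message representative to sign
partial = pow(m, d1, n)                    # computed by the trusted server
signature = (partial * pow(m, d2, n)) % n  # user combines with its own share

# m^d1 * m^d2 = m^(d1+d2) = m^d (mod n), so the combined signature verifies
# normally; delete d1 from the server and the user can no longer produce it.
assert signature == pow(m, d, n)
assert pow(signature, e, n) == m
print("combined signature verifies")
```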

WRITING PAPERS FOR USENIX REFEREED TRACK

Tom Limoncelli, Lumeta

Summarized by Josh Simon

Tom noted that publishing a paper was good since it helps the community and can change your career (allowing for both peer and management recognition, and providing ammunition when your boss needs to justify your next raise).

How does one start writing a paper? The advice here is to write what you know. Are you doing anything to make your life easier? Automating a task? Writing a cool tool? Working on a neat project? Providing a case study, whether positive (“Here’s what we did, and it worked”) or negative (“Here’s what we did, how it broke, what we did to fix it, and what we should’ve done to begin with”)? Asking yourself, “What have I done that nobody else has?” is an excellent way to start. Then following up with the terms and concepts and a statement of the problem, its scope, and how you solved it provides a good basis for your paper.

Don’t forget to survey the literature. With the publication of Selected Papers in Network and System Administration, or “The Best of LISA” as it’s been called, there’s a single place to go to find references. Add to that the resources available to all USENIX members on the http://www.usenix.org/ Web site and you’re definitely off to a good start.

Tom also discussed the evaluation process, based on his experience serving on or alongside several program committees. The readers consider whether the paper is enduring and whether it can result in a good presentation. Papers are evaluated on several criteria, including the technical quality of the work, the presentation of the paper, whether it advances the state of the art of system administration, and whether it’s relevant for LISA or somewhere else.

If your paper is not accepted, don’t consider that anything more than a momentary setback. Papers are usually returned with commentary that explains why they were not accepted and suggestions on where to submit them (if not to LISA next year), along with commentary on the paper’s quality, presentation, and so on.

If your paper is accepted, meet your deadlines. Work with your shepherd, whose job it is not only to nag you to make deadlines but also to help you by providing constructive feedback on what is good and what isn’t. The shepherd is a resource to help make your paper the best it can be. Remember that they have their own lives to live but that they are willing to help you out – just don’t deliver a draft and expect same-day turnaround.

Some additional commentary:

■ Both proofread and spell check your paper. Have someone else proofread and spell check your paper. You’re too close to it by the final submission deadline, so another set of eyes can help a lot.

■ Do your presentation beforehand. Practice in front of a mirror, or present it to your team, department, or company, or to your SAGE local group.

■ Give away the ending early. You’re not writing a mystery novel; it’s a refereed paper. You should identify the problem you’re trying to solve and how you solved it in the abstract, the introduction, or both. You should also spell that out early in the presentation.

■ In your presentation, consider demonstrating the tool (if your paper is about a cool new tool). Also, consider what your audio/visual needs will be: laptops, transparency projector, microphones, any special needs. The AV team needs as much advance warning as possible.

Finally, we discussed some paper ideas and how best to present them for future conference paper tracks.

WORKSHOP SERIES

CFENGINE

Moderator: Mark Burgess

Summarized by Mark Burgess

The cfengine workshop, led by Mark Burgess, attracted 21 participants from all backgrounds to discuss the current and future developments in cfengine v2. Among the highlights: a discussion, prompted by Steve Traugott, as to whether it is best to determine a configuration as a sequence of known steps (Steve’s view) or as the specification of a final state (Mark’s view) – the two are not always equivalent; Paul Anderson talked about the need for even higher-level specification languages at the enterprise level; Andy Mayhew and Christian Pearce discussed how cfengine could be enhanced for the enterprise with ancillary tools like CVS and LDAP; and Martin Andrews talked about using cfengine on Windows NT/2k. The workshop produced a stimulating discussion, which is summarized at http://www.cfengine.org.
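
One way to see the distinction between the two views (a toy sketch of my own, not something presented at the workshop): an imperative step sequence assumes a known starting point, while a declarative description only repairs whatever differs from the desired final state. The setting names below are invented.

```python
# Desired final state, in the declarative style; the ntp server name is made up.
DESIRED = {"ntp": "installed", "ntp_server": "ntp.example.com"}

def imperative(config):
    # Run fixed steps in order, regardless of what the system looks like now.
    config["ntp"] = "installed"
    config["ntp_server"] = "ntp.example.com"
    return config

def converge(config, desired=DESIRED):
    # Compare against the desired state and change only the entries that drift.
    for key, value in desired.items():
        if config.get(key) != value:
            config[key] = value
    return config

print(imperative({}))                                   # assumes a clean start
print(converge({"ntp": "installed", "ntp_server": "wrong.host"}))
```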

METALISA

Tom Limoncelli, Lumeta, and Cat Okita, Earthworks

Summarized by Josh Simon

The MetaLISA workshop about managing system administrators first discussed the question of how to provide motivation to help retain quality personnel. We decided that providing a good work environment without major stressors would be better than just throwing money (salary) at the problem, that authority and responsibility should both be well defined, and that resources have to be made available to handle problems.

Next we discussed the different types of system administrators: the “work 9 to 5, get a check, leave work at work” type and the “computers are my life so I play on them at home, too” variety. One manager organized his group so the former type was given the trouble-ticket queue processing and the latter was given more of the infrastructure and hard install problems.

We then discussed career path issues. Several companies now have multiple paths and levels, such as team lead, project lead, and assistant manager, each with appropriate and well-defined levels of expectations, evaluation scores or results, experience, requirements, and so forth. Providing different levels of responsibilities, independence, authority, and even money (base pay increase) on a path for both technical and management types, junior to senior, seems to work well. Even better is when there are well-defined criteria for promotion and lateral transfers between tracks. Remember, however, to provide allowances for exceptions or case-by-case waivers in your written policies.

Next we covered professionalism. Some people lie on their resumes. Some people wear inappropriate clothing (suit or t-shirt) to an interview or to the job itself, and don’t alter their clothing choices even when informed to do so. Some people don’t understand the concept of punctuality. Some people don’t know how to be tactful, to provide the right level of information, or even to say “I don’t know” to the customer. One topic of discussion was how to educate these people to improve these skills. Information sharing – such as email lists, databases, and even IRC channels – helps teams to share knowledge, cross-train people, and provide a way to let everyone contribute. Admitting when you’re wrong builds trust for when you’re positive that you’re right. Encouraging people to ask for help can work, but so can offering help and asking people if they need help. However, the insecure may not respond or take you up on the offer. For these people, it may help to present a situation as a “show me what you did so I can learn.” Don’t use killing statements (such as “you’re wrong”) but ask leading questions (“what if”).

Next we looked at determining what information is important (to share) and what is not (to keep political fallout from the team). One technique is to have a staff meeting and say, “Here’s the important stuff,” and then let folks leave if they don’t care about the politics. It’s necessary to get folks to realize that “best” isn’t always “right” and that politics can override the right technical basis. The team needs to be aware that there are politics even if they don’t know the details. However, email is often not the best medium for this; in-person and telephone contact may be a better (or, at least, a good supplemental) way to contact and inform people. Also, giving people the framework to put the details in and answering their questions is good. Some people, though, just don’t care about the political issues. Sometimes, having face-time in meetings with your people and the lord high political muckety-muck may be useful.

Many people can follow a checklist but don’t have problem-solving skills. How do you teach them to acquire the skills? Problem-solving is linked to curiosity, background, and experience. Teaching people skills is important. Using child-raising techniques, such as brainstorming with a timer, may be helpful. Again, you have to be careful to ask leading questions and not use killing statements that make the other person defensive. People need to remember to look at the big picture in order to make informed big-picture decisions so questions of direction get addressed within the group. Also, the problem and scope need to be explicitly defined, because that sets some limits. Finally, the instruction or detail level of the recipient may be relevant; instructions to senior people may be much less detailed than instructions to juniors.

The next major discussion topic was the balancing act between technical and management responsibilities and tasks. Some of the tricks include allowing the people who report to you to assume you’re still technical, knowing that the theory can be as good as the technical details, keeping yourself informed about major issues, and using one day a week as a technical day for working on small projects. Also, if you only rise to the point of comfort, whether that’s team lead, division lead, project manager, company head, or whatever, you may be better able to find your balancing point. One of the problems is trusting that the people to whom you hand off your pet projects will do the “right thing” with them.

Finally, we discussed moving from an ad-hoc group to a more procedurally based group, formalizing processes by documenting not only how things work, but also why the decisions were made. Pairing people, one to explain and one to write, can work well. Starting with something like script is a good approach. Having a cheat-sheet or template can be very helpful. But you have to practice what you preach; document things yourself so your people will. Also, make documenting a requirement for the performance review.

SYSADMIN BOOK OF KNOWLEDGE PROJECT

Moderators: Geoff Halprin and Rob Kolstad

Summarized by Rob Kolstad

Twelve people spent the day planning out the next phase of the System Administration Book of Knowledge (BoK) project that Halprin started a few years ago.

Halprin and Kolstad opened the workshop with over three hours of presentations that we have independently been delivering to audiences over the past year. The presentations introduced and motivated the Sysadmin BoK.

The Sysadmin BoK is intended to name all of the items that a system administrator might encounter in his or her work. It goes little further than naming the items – it is not a tutorial or “best practices” document. In its barest form, the BoK will end up being a list of 2,000 or so line items (e.g., “backups,” “security policy for firewall,” etc.).

When annotated with a paragraph or two for each of the line items, the BoK becomes:

■ A weighty tome to impress those who wonder what sysadmins do for a living

■ The basis for curricula – university/HS, training, advancement, individual career planning (the individual point of view)

■ The basis for creating a benchmark for corporate IT/admin maturity (corporate/organizational point of view)

■ The basis for creating best-practice documents (refining the knowledge)

In combination with the above, the BoK forms the foundation of system administration as a real profession. It is only the beginning, but as a strong foundation, we believe it is essential.

Currently, the BoK has about 1,500 line items created by a core team of a dozen sysadmins and occasionally reviewed by a total of just over 100. The line items have been categorized into a 73 x 44 matrix whose rows are general topics of sysadmin and whose columns are specific properties of those topics (e.g., “security” and “mobility”). Printing the list in 9-point type, two columns, small margins yields 19 pages of line items.

Discussion over the remainder of the day yielded these action items:

■ Combine some of the rows and columns to reduce the number of elements in the matrix.

■ Add new rows and columns, as contributed over the last few months by reviewers.

■ Fill in another 500 or so line items.

■ “Factor out” common elements that appear throughout a column (e.g., common security elements).

■ Write a paragraph or two for each line item.

■ Find a way to present the matrix in linear form (i.e., in a book).

Then we can proceed with the subsequent BoK projects:

■ Capability maturity model

■ Encyclopedia

The project particularly needs reviewers with a strong background in administering sites with Windows, both on servers and desktops. If you would like to contribute, please contact [email protected].

The project Web site is http://ace.delos.com/taxongate; trivial registration is required so that your contributions can be tracked.

AFS

Derrick “Dana” Brashear, CMU; Ted McCabe, MIT; OpenAFS Elders; and Esther Filderman, PSC.

Summarized by Garry Zecheiss

Twenty people interested in AFS, OpenAFS, and Arla participated in the AFS Workshop, either by giving a short presentation or by suggesting topics and contributing to their discussion.

Derrick Brashear gave an update on the status of OpenAFS. New ports are available, including MacOS X and Solaris 9. There are many new features, including some support for AFSDB resource records, dynroot support, and a new build system using autoconf, as well as some bug fixes, including better RX tuning. Long-awaited features, such as disconnected operations and Kerberos 5 support, are coming soon.

Love Hornquist-Astrand gave an update on Arla. New features include more support for MacOS X and FreeBSD and support for incremental file caching. A lot of RX work is being done, including GSSAPI/SPNEGO support and the removal of unwanted features. Love and Magnus Ahltorp have been working half-time on Arla but the funding for this stopped at the end of 2001.

General discussion covered topics such as Arla-AFS compatibility; cache manager issues, including using memcache; backups – what are people using and how do they cope with the Transarc backup system; migration to Kerberos; AFS infrastructure tools; proactive AFS administration; performance tuning; and getting funding support and publicity for OpenAFS.

Details about the workshop are available from the AFS workshop Web site at http://www.psc.edu/~ecf/afs-workshop/.

TEACHING SYSTEM ADMINISTRATION

Moderators: John Sechrest, PEAK Inc., and Curt Freeland, University of Notre Dame

Summarized by Tim Smith

The third annual Teaching System Administration workshop consisted of four sections. The first section focused on learning objectives for a system administration course. These objectives ranged from basic system administration tasks such as creating and managing user accounts to more advanced issues such as needs assessment.

The second morning session focused on assignments for a system administration course. The discussion included how quizzes, tests, labs, and projects could be structured. During the discussion the participants who had already taught a system administration course related what they had done in their course and how it worked. The participants broke up into groups to design a lab for one of the learning objectives discussed in the first session. Presentations of the labs concluded the morning session. During the lunch break participants discussed how to evaluate success in a system administration course.

The first afternoon session began with a group summary of the different discussions held during the lunch break. This discussion led into the topic of exam questions. Different types of exams, from essay to multiple-answer and true/false, were considered based on the ability to evaluate what a student has really learned and how easy it is to grade. Based on this discussion the participants broke up into their small groups from the morning session to come up with three exam questions. Most of the questions produced by the different groups were short answer, with a few multiple-answer questions and essays.

The final session of the workshop centered on large and real-life projects. The discussion was about how to bring real projects into the class without inconveniencing a business owner while still allowing students to apply what they were learning in class. The discussion shifted to the types of projects that had been assigned by the participants who had taught a system administration course or completed projects in a similar networking course.

The workshop concluded with a discussion of tools, primarily ones that would make grading easier and allow materials to be shared between individuals teaching system administration courses. Ongoing discussions about this issue continue on the sysadmin education mailing list at [email protected].

ADVANCED TOPICS

Moderator: Adam Moskowitz

Summarized by Josh Simon

The Advanced Topics Workshop was ably hosted and moderated once again by Adam Moskowitz and co-piloted by Rob Kolstad. Introductions by the 26 attendees generated interesting questions and topics for discussions: random opinions, the Undo command for sysadmins, hot tools, and surprises from the past year.

Random Opinions

People are indeed using SANs and NAS, since they’re well suited to specific problems (such as archiving, Fortune 1000 companies, and so on). However, they are not being used for general file services, mainly because the FibreChannel implementation is too expensive for general use.

We also discussed the centralization/decentralization pendulum, which seems to be moving back toward centralization. Perhaps condensing is a better term, since places are condensing locations for their hardware and personnel but still keeping some geographic distance between them. Centralizing administrative functions is different from actual physical centralization, since (to use the SAN/NAS model) users don’t care if the disk is local or across the continent as long as the performance is unaffected.

We’re moving toward more of an ASP model within a given environment, be it company or infrastructure. The ASP model works well between divisions within an organization but not as well between different organizations, primarily due to trust issues.

The events of September 11th caused a shift in the thinking of some of the tight-fisted financial staff. They now realize how integral computing is to business, so collocation and backups are now more important.

The next major topic was mobility. Without mobility, today’s commonplace high-speed network infrastructures and reliable file servers make a lot of system administration fairly easy; workstations can be built from images or automated installation processes, and all mutable data lives on centralized file servers, where it’s easy to back up and manage.

But mobility changes all that. Mutable data has to be local to the endpoint (e.g., laptop); we can’t expect network connectivity to be high-speed, and we have to be able to deal with connections over insecure networks. We have to deal with a host of security issues, find new ways of ensuring data availability, and be able to provide the needed services at various levels of network quality.

Mobility is becoming increasingly important – there are now many organizations where most endpoints are mobile platforms. But IT infrastructures have not yet caught up to this changing reality. To deal with this we will have to abandon our traditional (and previously successful) modes of thinking and use technologies that involve disconnected operation, mobile IP, synchronization, transparent data encryption, and so on.

Wireless computing has changed our behavior; 70% of us in the ATW are on laptops. Our expectations seem to be that we’re approaching ubiquitous computing; of those using laptops, about 2/3 use them to access remote services (mail, Web, files) and 1/3 use them as the centralized storage point. This leads to the intrusion of mini-environments into your own macro-environment. Laptops can move from administrative domain to administrative domain and pick up and distribute viruses and whatnot in the process. Managing and keeping them from screwing up your environment is a hard problem.

RECOVERY-ORIENTED COMPUTING

Recovery-Oriented Computing (ROC) is targeted to services. A PowerPoint presentation is available, along with information, at http://www.cs.berkeley.edu/~pattrsn/.

The goals are ACME – availability, change, maintainability, and evolutionary growth – instead of performance (which is what we’ve looked for in the past 15 years). We’re not doing that well.

One of our needs is not just to get real data to improve reliability but to measure reliability and availability. Making the system administration tasks have an Undo function may be helpful. Think about the three Rs: rewind (go back in time), repair (fix error), and redo (move forward again). We’re looking to recover at the service level, not just at the server (hardware or component) level.
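
A toy sketch of the three Rs applied to configuration changes (my own illustration, not code from the ROC project): keep a journal of changes so an operator can rewind past the error, repair it, and redo the rest. The setting names below are made up.

```python
# Minimal change journal with rewind/repair/redo semantics.
class ChangeJournal:
    def __init__(self, state):
        self.state = dict(state)
        self.log = []                      # (key, old value, new value)

    def apply(self, key, value):
        self.log.append((key, self.state.get(key), value))
        self.state[key] = value

    def rewind(self, steps):
        undone = []
        for _ in range(steps):             # go back in time
            key, old, new = self.log.pop()
            self.state[key] = old
            undone.append((key, new))
        return list(reversed(undone))

    def redo(self, undone, repair=None):
        for key, value in undone:          # move forward again,
            if repair:                     # repairing entries on the way
                value = repair(key, value)
            self.apply(key, value)

journal = ChangeJournal({"mx": "mail1"})
journal.apply("mx", "mail2")
journal.apply("relay", "open")             # the mistake to take back
undone = journal.rewind(2)
journal.redo(undone, repair=lambda k, v: "closed" if k == "relay" else v)
print(journal.state)                       # {'mx': 'mail2', 'relay': 'closed'}
```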

■ Predictability – Having predictable recovery would be a huge improvement even now. Most recovery plans (or even risk mitigation) are pure guesswork now, based on experience and trial and error. Change control and change management need to be more formal and actually predictive, with detailed determination.

■ Avoidability – Can you avoid the problem to reduce the recovery time? If you can avoid the problem then the need for recovery time is less. This is reasonably important and very hard.

■ Repeatability – Making tasks easily repeatable will help reduce complexity and can lead to increased avoidability and thus increased reliability.

■ Risk mitigation – A lot of the changes we make at one time – one change – affect multiple machines (such as servers, routers, switches, firewalls, and so on). Rollback within any one system is good, but we need to have rollback in all of them. The problem becomes system-specific; is it a GUI or CLI?

■ Tools – They’re trying to reduce the MTTR in the MTTR/MTTF equation. This project is more about building recovery-from-something-that-has-happened than making-the-problem-less-likely-to-occur.

Right now the thought is to build a sample (prototype) email system as a starting point.

What about security breaches (intrusion detection)? Something similar can be done; this kind of technology would be good. You could roll back to before the intrusion, install the filter or preventative mechanism, and then roll the good stuff back in again.

Simply changing (fixing, simplifying, etc.) the interface is insufficient. Work does need to be done on SA recovery interfaces but this is beyond the scope of the ROC project.

Hot Tools in Use Today or Coming Soon

Next we discussed the new tools, technologies, ideas, or paradigms we’re investigating or using. The list included new IP telephony products; tricks for SSH and CVS; wireless networking; integration and aggregation of alarm, monitoring, and administrative functionality with automation; reducing information replication; load balancing; anomaly detection; miniaturization; mirroring network storage for high-speed failover; VMware; MacOS X; Java; and Perl 6. The list also included business problems as opposed to technology problems.

There was a side discussion about programming languages. Some people like Java, others like C#. Java is the new COBOL in that it’s the new business language but not a system language. Some debate ensued, with no conclusion, about whether to teach C, C++, Java, or even Scheme first.

Surprises from the Past Year

Several people mentioned surprises they’d had in the past year. This list includes Cygwin, the PC Weasel, the dearth of middlemen in the DSL/POP/ISP markets, and the number of people running wireless networks without any security.