Danss - Theory of SysAdmin

Post on 20-Jul-2016

Theory of System Administration

DANSS Seminar

Feb 23rd, 2003

Elliot Jaffe

Danss - Theory of SysAdmin 2Feb. 23, 2003

Outline

What is System Administration

Problems in System Administration

Theory overview

Results

Research directions

What is System Administration?

What do you think?

What is System Administration

In computer technology, a set of functions that provides support services, ensures reliable operations, promotes efficient use of the system, and ensures that prescribed service-quality objectives are met.

Synonym: system management.

(US Federal Standard 1037C)

System Administration is

The function that provides:

Reliability – Stable, consistent service

Efficiency – Performance

Predictability – Service Level Agreement

CS HUJI System Administration

Infrastructure

Operating Systems

Networking

Account Administration

Software Licensing, Installation and Support

Education

What you don’t see

Budgets

Cost Benefit Analysis

Vendor Selection

Service Contracts

Long term planning

Policy creation

Problems in Sys Admin

Strategic

Tactical

Strategic Problems

Economic cost/benefit analysis

How much disk space should be purchased in the next year?

Should we buy one new router, or do we need a fail-over pair?

If we get 25% additional students, what resources will we need?

Strategic Problems #2

What is the right level of disk space quotas?

Should we use a VLAN to localize network traffic?

Tactical Problems

What is the best way to maintain multiple systems?

How do we apply patches?

How should we roll out an OS change?

How do we support multiple configurations?

How many configurations should we support?

How do we use version control as part of system administration?

A complete theory should enable

Policy determination and evaluation

Strategic decisions about resource usage and allocation

Interactions between users and system for resources

Productivity considerations (economics of the system)

Empirical verification of strategies and policies

Efficiency of policy and its implementation

Efficiency of the system in doing its job

Theory of System Administration

A group of computers is an evolving, stochastic system viewable at multiple levels of detail.

Configuration Space

The memory state of the computer

The set of bits that define the computer state.

Example: The state of the bits in primary memory and on secondary media (disks)

Time

Time is a discrete value.

For averaging purposes, we allow it to take on real values.

Example: The system clock is discrete, taking values that are multiples of the clock period Tc: t = 0, Tc, 2Tc, …, nTc

Configuration

A pattern of values associated with each point on the configuration space.

Example: The state of all bits in main memory at time t.

This pattern changes over time.

Averaging

Over time scales much larger than Tc, the average properties of the system can be treated as a continuum approximation, i.e. as real functions of time.

Example: The number of non-zero bits at any real value of time.
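
The averaging step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the memory snapshots, the helper names `popcount` and `local_average`, and the window size are all hypothetical and do not come from the talk.

```python
def popcount(snapshot: int) -> int:
    """Number of non-zero bits in one memory snapshot (stored as an int)."""
    return bin(snapshot).count("1")


def local_average(values, window):
    """Average each run of `window` consecutive per-tick values.

    The result is a coarse-grained series with one real number per window,
    which is the kind of quantity the continuum approximation works with.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        averages.append(sum(chunk) / window)
    return averages


# Hypothetical snapshots of a 4-bit memory, one per clock tick Tc.
snapshots = [0b0001, 0b0011, 0b0111, 0b0101, 0b1111, 0b1110, 0b1100, 0b1000]
bits_set = [popcount(s) for s in snapshots]   # integer-valued, changes every tick
coarse = local_average(bits_set, window=4)    # real-valued, changes slowly
```

The per-tick counts are integers that jump at every tick, while the windowed averages are real numbers that vary on a much longer time scale.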

Scales

Transition from low-level to high-level

Group objects together to form new objects

Refer to state of object over time

Level  Example
6      LANs
5      Users, VMs
4      Files
3      int, float, char
2      bytes, words
1      bits

Closed Dynamical Systems

A closed dynamical system consists of a configuration space, an initial configuration and a rule for subsequent time development

Closed dynamical systems are deterministic.

Example: A standalone computer without any external input is a closed dynamical system.
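
As a rough sketch of this definition, a closed system can be modeled as an initial configuration plus an update rule applied once per tick. The function names and the 8-bit counter rule below are assumptions made for illustration, not part of the talk.

```python
def evolve(initial, rule, steps):
    """Return the trajectory [c0, c1, ..., c_steps] of a closed system.

    Nothing outside `initial` and `rule` influences the result, so the
    same inputs always produce the same trajectory.
    """
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rule(trajectory[-1]))
    return trajectory


def counter_rule(config):
    """Hypothetical rule: an 8-bit counter that wraps around at 256."""
    return (config + 1) % 256


# Two runs from the same initial configuration are identical.
run_a = evolve(initial=254, rule=counter_rule, steps=4)
run_b = evolve(initial=254, rule=counter_rule, steps=4)
```

Determinism here means only that `run_a` and `run_b` agree tick for tick; an external input (a user, a network) would break that guarantee.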

Interactions

An interaction between two systems is an endomorphism on the combined systems such that both systems determine the time developments of one another.

Example: Two standalone computers connected via a network and synchronizing system times.

Environment

An ensemble of mutually interacting systems.

Example: A user interacting with a computer.

People are not standalone!

Open Dynamical System

Projection of an ensemble of interacting systems onto the state of a given system.

The configuration state of an open system is unpredictable over any interval dt ~ Tc.

Does this mean that all is lost?

Stability

Assume that there exists some time scale on which it is possible to predict the average state of the systems in question.

We are not interested in managing systems which cannot achieve a minimal level of stability, since these systems cannot perform any reliable function.

Multiple Time Scales

Short term: Tc, the computer clock

Medium term: human time > 10^7 Tc

Long term: months and years > 10^7 human time

Components of System State

The state of a system at any given time is composed of a slowly varying local average and a rapidly fluctuating stochastic remainder.

Are these systems stable?

[Figure: system state plotted against time]
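
One way to picture the split into a slowly varying local average and a rapidly fluctuating remainder is a centered moving average. The sketch below is an assumption about how such a split could be computed; the talk does not prescribe a method, and the usage series is made up for illustration.

```python
def decompose(series, window):
    """Split a series into a centered moving average and the remainder.

    average[i] is the mean of the samples within window // 2 positions
    of i (fewer at the edges); remainder[i] is what is left over.
    """
    half = window // 2
    average = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        average.append(sum(series[lo:hi]) / (hi - lo))
    remainder = [x - a for x, a in zip(series, average)]
    return average, remainder


# Hypothetical disk usage samples: a slow upward drift plus a fast wobble.
usage = [10, 12, 11, 13, 12, 14, 13, 15]
slow, fast = decompose(usage, window=3)
```

By construction `slow[i] + fast[i]` reproduces `usage[i]`, so no information is lost; the two parts simply live on different time scales.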

Tasks

A task is a representation of an autonomous process executed on related sets of state.

A task is closed if after execution, it returns the system to the original state.

A task is open if after execution, it has changed the overall system state.
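
The closed/open distinction can be checked mechanically by comparing the state before and after a task runs. In this sketch the state is a plain dictionary and the two jobs are hypothetical; none of these names come from the talk.

```python
import copy


def is_closed(task, state):
    """True if running `task` on a copy of `state` leaves it unchanged."""
    before = copy.deepcopy(state)
    working = copy.deepcopy(state)
    task(working)
    return working == before


def scratch_job(state):
    """Hypothetical task: creates a temporary file, then removes it."""
    state["files"].add("/tmp/job.tmp")
    state["files"].discard("/tmp/job.tmp")


def install_job(state):
    """Hypothetical task: leaves a new file behind."""
    state["files"].add("/usr/local/bin/tool")


state = {"files": {"/etc/passwd"}}
scratch_is_closed = is_closed(scratch_job, state)   # True
install_is_closed = is_closed(install_job, state)   # False
```

Only the end points are compared, which matches the definition: a closed task may change the state while it runs, as long as it restores it afterwards.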

Maintenance Tasks

A maintenance task is a task which reduces the total rate of change of the average configuration state.

Example: Deletion of accumulated garbage

Policy

A policy is an average specification of equivalent system behaviors.

A set of system states that are equivalent over the given time period.

A policy is neither good nor bad. It does not necessarily lead to stability or chaos.

Policy - Examples

Users are restricted to a known quota of file system space.

All computers must run Microsoft Office.

Only port 80 will be open on network servers.

SSH will be used for all remote computer access.

Convergence

A convergent average policy is one whose tasks result in an equivalent configuration for all sufficiently large time scales.

A convergent average policy is one whose average behavior in time ends in a fixed average state between two sufficiently different time values.

Convergence - Example

Deleting temporary files on a regular basis is a convergent policy since it returns the system to a known state (i.e. a given amount of free file system space).
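
A toy simulation makes the difference between a convergent and a non-convergent policy visible. Every number here (capacity, growth rate, cleanup interval) and the function name `simulate` are assumptions chosen for illustration only.

```python
def simulate(days, daily_temp_growth, cleanup_every, capacity, base_usage):
    """Free space at the end of each day under a periodic cleanup policy.

    `cleanup_every=0` means temporary files are never deleted.
    """
    temp = 0
    free_space = []
    for day in range(1, days + 1):
        temp += daily_temp_growth
        if cleanup_every and day % cleanup_every == 0:
            temp = 0  # maintenance task: delete temporary files
        free_space.append(capacity - base_usage - temp)
    return free_space


# Hypothetical file system: 1000 units total, 400 in permanent use.
with_cleanup = simulate(days=28, daily_temp_growth=5, cleanup_every=7,
                        capacity=1000, base_usage=400)
without_cleanup = simulate(days=28, daily_temp_growth=5, cleanup_every=0,
                           capacity=1000, base_usage=400)
```

With the weekly cleanup, free space returns to the same value at the end of every week and never drops below a fixed floor; without it, free space only shrinks, so there is no state the system keeps coming back to.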

Persistent State

A persistent state is a configuration for which the probability of returning to an equivalent configuration at a later time is 1.

Persistence is reflected in the property that the rate of change of the average state is much slower than the rate of change of fast moving variations.

Persistent States

The fast variations extend over several complete cycles before any appreciable change in the average is seen.

[Figure: system state plotted against time]

Theorem

In an open system, a policy specifies a class of equivalent persistent states if and only if the policy exhibits average convergence.

You can maintain the state of the system if and only if your policy consistently returns the system to a similar state, i.e. the average resource usage is constant over the policy's time scale.

Implications

System Administration is the development, specification and implementation of environments and maintenance tasks with the goal of creating a persistent average state.

Strategy

Type I: Stochastic models

Type II: Semantic models

Type I - Stochastic models

Analyze what is happening on multiple time scales

Describe locally averaged states

Model known boundary conditions

Empirical measurements of existing systems.

Predictive modeling of systems based on measurements.

Problems with Stochastic Models

Statistical measurements are rare

No experimental repeatability

Conditions of measurements are constantly changing

Absolute definitions are impossible

People cannot be described by a small number of characteristics

Stochastic modeling -- Uses

Strategic planning

Do we need to buy more file servers?

Problem identification

Why is user X using 300% of the normal disk quota?

Why is computer Y rebooting twice a week when all other systems are stable for months?
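
Problem identification of this kind can start from a very simple statistic. The sketch below flags users whose disk usage is at least three times the median; the threshold, the function name `flag_heavy_users`, and the usage figures are hypothetical, and the median is only one possible reading of "normal".

```python
from statistics import median


def flag_heavy_users(usage_by_user, factor=3.0):
    """Return users whose usage is at least `factor` times the median."""
    typical = median(usage_by_user.values())
    return sorted(
        user for user, used in usage_by_user.items()
        if used >= factor * typical
    )


# Hypothetical per-user disk usage in megabytes.
usage_mb = {"alice": 480, "bob": 510, "carol": 495, "dave": 1600, "erin": 505}
heavy = flag_heavy_users(usage_mb)
```

The median is used instead of the mean so that a single very heavy user does not drag the baseline upward and hide themselves.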

Strategic models

Analyze what might be changed in a system.

Formulate as a game of strategy

Achieve larger goals than just maintaining a persistent state.

Strategic Goals

Sys Admin: Keep the system alive and running so that users can perform a maximum amount of work

Benign User: Produce useful work using the system (consumes resources)

Malicious User: Maximize control of system resources

Strategic tools

Game Theory

Contests between System Administrator and malicious users.

System Downtime: Mean time to repair / Mean time between failures

Minimize MTTR or maximize MTBF?

Levels of monitoring: At what point does the cost of monitoring overwhelm the benefit?
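
The MTTR versus MTBF question can be made concrete with the downtime ratio MTTR / MTBF. The figures below are invented for illustration, and `downtime_ratio` is a hypothetical helper, not something defined in the talk.

```python
def downtime_ratio(mttr_hours, mtbf_hours):
    """Downtime measured as mean time to repair over mean time between failures."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return mttr_hours / mtbf_hours


# Hypothetical system: 4 hours to repair, one failure every 400 hours.
baseline = downtime_ratio(4, 400)
faster_repair = downtime_ratio(2, 400)    # halve MTTR
fewer_failures = downtime_ratio(4, 800)   # double MTBF
```

Under this measure, halving MTTR and doubling MTBF give the same improvement, so the choice between them comes down to which is cheaper to achieve, which is exactly the kind of trade-off a strategic model is meant to weigh.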

Current research

Recovering File space

System upgrades

Quota systems

Recovering File Space

How do you clean unused files?

Competition between users and admins

Trade off between having enough space to operate

Users recreating temp files that were deleted

Users “grabbing” space for later use

Patch Application

How do you apply changes to a distributed system?

Divergence

Convergence

Congruence

Quota application

What is the correct way to set file system quotas?

By category

Dynamically assign users to groups

Set group to lowest maximal value

Bibliography

Burgess, M. 2003. On the Theory of System Administration. Journal of the ACM.

Traugott, S. and Brown, L. 2002. Why Order Matters: Turing Equivalence in Automated Systems Administration. LISA 2002.

Gilfix, M. 2002. Holistic Quota Management: The Natural Path to a Better, More Efficient Quota System. LISA 2002.