Theory of System Administration
DANSS Seminar
Feb 23rd, 2003
Elliot Jaffe
Danss - Theory of SysAdmin 2Feb. 23, 2003
Outline
What is System Administration
Problems in System Administration
Theory overview
Results
Research directions
What is System Administration?
What do you think?
What is System Administration
In computer technology, a set of functions that provides support services, ensures reliable operations, promotes efficient use of the system, and ensures that prescribed service-quality objectives are met.
Synonym: system management.
(US Federal Standard 1037C)
System Administration is
The function that provides:
Reliability – Stable, consistent service
Efficiency – Performance
Predictability – Service Level Agreement
CS HUJI System Administration
Infrastructure
Operating Systems
Networking
Account Administration
Software Licensing, Installation and Support
Education
What you don’t see
Budgets
Cost Benefit Analysis
Vendor Selection
Service Contracts
Long term planning
Policy creation
Problems in Sys Admin
Strategic
Tactical
Strategic Problems
Economic cost/benefit analysis
How much disk space should be purchased in the next year?
Should we buy one new router, or do we need a fail-over pair?
If we get 25% additional students, what resources will we need?
Strategic Problems #2
What is the right level of disk space quotas?
Should we use a VLAN to localize network traffic?
Tactical Problems
What is the best way to maintain multiple systems?
How do we apply patches?
How should we roll out an OS change?
How do we support multiple configurations?
How many configurations should we support?
How do we use version control as part of system administration?
A complete theory should enable
Policy determination and evaluation
Strategic decisions about resource usage and allocation
Interactions between users and system for resources
Productivity considerations (economics of the system)
Empirical verification of strategies and policies
Efficiency of policy and its implementation
Efficiency of the system in doing its job
Theory of System Administration
A group of computers is an evolving, stochastic system viewable at multiple levels of detail.
Configuration Space
The memory state of the computer
The set of bits that define the computer state.
Example: The state of the bits in primary memory and on secondary media (disks)
Time
Time is a discrete value.
For averaging purposes, we allow it to take on real values.
Example: The system clock is discrete, taking values that are multiples of the clock period Tc: t = 0, Tc, 2Tc, …, nTc
Configuration
A pattern of values associated with each point on the configuration space.
Example: The state of all bits in main memory at time t.
This pattern changes over time.
Averaging
Over time scales much larger than Tc, the average properties of the system can be treated as a continuum approximation, i.e. as real functions of time.
Example: The number of non-zero bits at any real value of time.
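The averaging idea can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the snapshots, the helper names `nonzero_bits` and `local_average`, and the window size are all hypothetical.

```python
from typing import List

def nonzero_bits(config: bytes) -> int:
    """Count the bits set to 1 in one configuration snapshot."""
    return sum(bin(byte).count("1") for byte in config)

def local_average(samples: List[int], window: int) -> List[float]:
    """Average each run of `window` consecutive samples (non-overlapping)."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(0, len(samples) - window + 1, window)
    ]

# Hypothetical snapshots of a 2-byte memory, one per clock tick Tc.
snapshots = [b"\x00\x01", b"\x03\x00", b"\x0f\x00", b"\x01\x01"]
counts = [nonzero_bits(s) for s in snapshots]   # discrete, one value per tick
averaged = local_average(counts, window=2)      # coarser, real-valued series
```

The per-tick counts are integers, while the averaged series takes real values, which is the sense in which the slide treats averaged properties as real functions of time.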
Scales
Transition from low-level to high-level
Group objects together to form new objects
Refer to state of object over time
Level   Example
6       LANs
5       Users, VMs
4       Files
3       int, float, char
2       bytes, words
1       bits
Closed Dynamical Systems
A closed dynamical system consists of a configuration space, an initial configuration and a rule for subsequent time development
Closed dynamical systems are deterministic
Example: A standalone computer without any external input is a closed dynamical system
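As a rough sketch of the definition, the three ingredients (configuration space, initial configuration, rule) can be written down directly. The 4-bit counter rule and the names `evolve` and `counter_rule` are assumptions chosen for illustration, not part of the talk.

```python
from typing import Callable, List

State = int  # a configuration, here packed into a single integer

def evolve(initial: State, rule: Callable[[State], State], steps: int) -> List[State]:
    """Return the trajectory produced by applying `rule` `steps` times."""
    trajectory = [initial]
    for _ in range(steps):
        trajectory.append(rule(trajectory[-1]))
    return trajectory

def counter_rule(state: State) -> State:
    """Hypothetical rule: a 4-bit counter that wraps around at 16."""
    return (state + 1) % 16

# Same initial configuration and same rule give the same trajectory every time.
run_1 = evolve(14, counter_rule, steps=4)
run_2 = evolve(14, counter_rule, steps=4)
```

Because the rule takes no external input, repeating the run reproduces the trajectory exactly, which is what "deterministic" means here.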
Interactions
An interaction between two systems is an endomorphism on the combined systems such that both systems determine the time developments of one another.
Example: Two standalone computers connected via a network and synchronizing system times.
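The clock example might be sketched as a single update rule on the combined state of both machines. The averaging scheme, the `gain` parameter and the starting values below are assumptions for illustration only; they do not describe any real time-synchronization protocol.

```python
from typing import Tuple

def step(clock_a: float, clock_b: float, gain: float = 0.5) -> Tuple[float, float]:
    """Advance both clocks one tick, then pull each toward the other."""
    a, b = clock_a + 1.0, clock_b + 1.0
    offset = b - a
    # Each clock's next value depends on the other clock's current value.
    return a + gain * offset / 2, b - gain * offset / 2

a, b = 0.0, 8.0          # hypothetical initial clock readings
for _ in range(3):
    a, b = step(a, b)
```

The function maps the pair `(a, b)` to a new pair, so it acts on the combined system, and neither clock's time development can be computed without the other.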
Environment
An ensemble of mutually interacting systems.
Example: A user interacting with a computer.
People are not standalone!
Open Dynamical System
Projection of an ensemble of interacting systems onto the state of a given system.
The configuration state of an open system is unpredictable over any interval dt ~ Tc.
Does this mean that all is lost?
Stability
Assume that there exists some time scale on which it is possible to predict the average state of the systems in question.
We are not interested in managing systems which cannot achieve a minimal level of stability, since these systems cannot perform any reliable function.
Multiple Time Scales
Short term: Tc, the computer clock
Medium term: human time > 10^7 Tc
Long term: months and years > 10^7 human time
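A quick arithmetic check of these scales, reading the separation factor as ten to the seventh power. The 1 ns clock tick and the one-second unit of human time are assumptions made for the sake of the example; the slides do not fix either value.

```python
TC_SECONDS = 1e-9        # assumed clock tick: 1 ns (a 1 GHz clock)
HUMAN_SECONDS = 1.0      # assumed unit of human time: one second
SCALE = 10**7            # separation factor between adjacent time scales

# Lower bound of the medium-term scale, in seconds.
medium_term_seconds = SCALE * TC_SECONDS

# Lower bound of the long-term scale, converted from seconds to days.
long_term_days = SCALE * HUMAN_SECONDS / 86_400
```

Under these assumptions the medium-term scale starts around 10 ms and the long-term scale around 116 days, which matches the slide's description of long term as months and years.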
Components of System State
The state of a system at any given time is composed of a slowly varying local average and a rapidly fluctuating stochastic remainder.
Are these systems stable?
[Figure: system state plotted against time]
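One way to picture the decomposition is a moving average plus whatever is left over. The trailing-window average, the function name `decompose` and the sample data are assumptions for illustration; the talk does not prescribe a particular averaging method.

```python
from typing import List, Tuple

def decompose(state: List[float], window: int) -> Tuple[List[float], List[float]]:
    """Split a series into a trailing moving average and the remainder."""
    average = []
    for i in range(len(state)):
        recent = state[max(0, i - window + 1): i + 1]
        average.append(sum(recent) / len(recent))
    # The remainder is defined so that average + remainder == state.
    remainder = [s - a for s, a in zip(state, average)]
    return average, remainder

# Hypothetical measurements, e.g. disk usage sampled at regular intervals.
usage = [10.0, 12.0, 10.0, 12.0, 11.0, 13.0, 11.0, 13.0]
average, remainder = decompose(usage, window=2)
```

The average drifts slowly upward while the remainder keeps flipping sign, which is the split into a slowly varying part and a rapidly fluctuating part.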
Tasks
A task is a representation of an autonomous process executed on related sets of state.
A task is closed if after execution, it returns the system to the original state.
A task is open if after execution, it has changed the overall system state.
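The closed/open distinction can be illustrated by comparing state before and after a task runs. The dictionary-based state and the two example tasks are hypothetical, and the check only looks at one starting state rather than all of them.

```python
from typing import Callable, Dict

State = Dict[str, int]

def is_closed(task: Callable[[State], State], state: State) -> bool:
    """A task is closed on `state` if running it leaves the state unchanged."""
    return task(dict(state)) == state   # run on a copy, compare with original

def scratch_job(state: State) -> State:
    """Hypothetical task: uses scratch space, then frees it again."""
    state["disk_used"] += 50
    state["disk_used"] -= 50
    return state

def write_report(state: State) -> State:
    """Hypothetical task: leaves a new file behind."""
    state["disk_used"] += 5
    return state

start = {"disk_used": 100}
closed_result = is_closed(scratch_job, start)    # state restored
open_result = is_closed(write_report, start)     # state changed
```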
Maintenance Tasks
A maintenance task is a task which reduces the total rate of change of the average configuration state.
Example: Deletion of accumulated garbage
Policy
A policy is an average specification of equivalent system behaviors.
A set of system states that are equivalent over the given time period.
A policy is neither good nor bad. It does not necessarily lead to stability or chaos.
Policy - Examples
Users are restricted to a known quota of file system space.
All computers must run Microsoft Office.
Only port 80 will be open on network servers.
SSH will be used for all remote computer access.
Convergence
A convergent average policy is one whose tasks result in an equivalent configuration for all sufficiently large time scales.
A convergent average policy is one whose average behavior in time ends in a fixed average state between two sufficiently different time values.
Convergence - Example
Deleting temporary files on a regular basis is a convergent policy since it returns the system to a known state (i.e. a given amount of free file system space).
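A toy simulation of this example, assuming a fixed amount of temporary data is created each day and a cleanup runs every `cleanup_every` days. All numbers and the function name `simulate` are hypothetical.

```python
from typing import List

def simulate(days: int, base_used: int, temp_per_day: int, cleanup_every: int) -> List[int]:
    """Return disk usage at the end of each day under a periodic cleanup policy."""
    temp = 0
    usage = []
    for day in range(1, days + 1):
        temp += temp_per_day              # users create temporary files
        if day % cleanup_every == 0:
            temp = 0                      # maintenance task: delete temp files
        usage.append(base_used + temp)
    return usage

with_cleanup = simulate(days=14, base_used=100, temp_per_day=3, cleanup_every=7)
# A cleanup interval longer than the run means the cleanup never fires.
without_cleanup = simulate(days=14, base_used=100, temp_per_day=3, cleanup_every=100)
```

With the weekly cleanup, usage returns to the same known value at the end of every cycle; without it, usage only grows over the simulated period.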
Persistent State
A persistent state is a configuration for which the probability of returning to an equivalent configuration at a later time is 1.
Persistence is reflected in the property that the rate of change of the average state is much slower than the rate of change of fast moving variations.
Persistent States
The fast variations extend over several complete cycles before any appreciable change in the average is seen.
[Figure: system state plotted against time]
Theorem
In an open system, a policy specifies a class of equivalent persistent states if and only if the policy exhibits average convergence.
You can maintain the state of the system if and only if your policy consistently returns the system to a similar state, i.e. the average resource usage is constant over the policy's time scale.
Implications
System Administration is the development, specification and implementation of environments and maintenance tasks with the goal of creating a persistent average state.
Strategy
Type I: Stochastic models
Type II: Semantic models
Type I - Stochastic models
Analyze what is happening on multiple time scales
Describe locally averaged states
Model known boundary conditions
Empirical measurements of existing systems.
Predictive modeling of systems based on measurements.
Problems with Stochastic Models
Statistical measurements are rare
No experimental repeatability
Conditions of measurements are constantly changing
Absolute definitions are impossible
People cannot be described by a small number of characteristics
Stochastic modeling -- Uses
Strategic planning
Do we need to buy more file servers?
Problem identification
Why is user X using 300% of the normal disk quota?
Why is computer Y rebooting twice a week when all other systems are stable for months?
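The disk usage question can be approached with a simple measurement-based check. This sketch assumes that "normal" means the median usage across users and that 300% means a factor of 3; the function name `heavy_users` and the data are hypothetical.

```python
from statistics import median
from typing import Dict, List

def heavy_users(usage_mb: Dict[str, float], factor: float = 3.0) -> List[str]:
    """Return users whose usage is at least `factor` times the median usage."""
    typical = median(usage_mb.values())
    return sorted(user for user, used in usage_mb.items() if used >= factor * typical)

# Hypothetical per-user disk usage in megabytes.
usage_mb = {"alice": 100.0, "bob": 120.0, "carol": 90.0, "dave": 340.0, "erin": 110.0}
flagged = heavy_users(usage_mb)
```

The median is used rather than the mean so that a single heavy user does not shift the baseline they are being compared against.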
Strategic models
Analyze what might be changed in a system.
Formulate as a game of strategy
Achieve larger goals than just maintaining a persistent state.
Strategic Goals
Sys Admin: Keep the system alive and running so that users can perform a maximum amount of work
Benign User: Produce useful work using the system (consumes resources)
Malicious User: Maximize control of system resources
Strategic tools
Game Theory
Contests between System Administrator and malicious users.
System Downtime: Mean time to repair / Mean time between failures
Minimize MTTR or maximize MTBF?
Levels of monitoring: At what point does the cost of monitoring overwhelm the benefit?
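The downtime ratio from this slide is easy to experiment with. The sketch below uses the MTTR / MTBF ratio exactly as the slide states it; the hour values and the function name `downtime_ratio` are hypothetical.

```python
def downtime_ratio(mttr_hours: float, mtbf_hours: float) -> float:
    """Downtime measure from the slide: mean time to repair over MTBF."""
    return mttr_hours / mtbf_hours

baseline = downtime_ratio(mttr_hours=4.0, mtbf_hours=1000.0)
faster_repair = downtime_ratio(mttr_hours=2.0, mtbf_hours=1000.0)    # halve MTTR
fewer_failures = downtime_ratio(mttr_hours=4.0, mtbf_hours=2000.0)   # double MTBF
```

Halving MTTR and doubling MTBF reduce the ratio by the same amount, so under this measure the choice between them comes down to which change is cheaper to achieve.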
Current research
Recovering File space
System upgrades
Quota systems
Recovering File Space
How do you clean unused files?
Competition between users and admins
Trade off between
having enough space to operate
Users recreating temp files that were deleted
Users “grabbing” space for later use
Patch Application
How do you apply changes to a distributed system?
Divergence
Convergence
Congruence
Quota application
What is the correct way to set file system quotas?
By category
Dynamically assign users to groups
Set group to lowest maximal value
Bibliography
Burgess, M. 2003. On the Theory of System Administration. Journal of the ACM.
Traugott, S. and Brown, L. 2002. Why Order Matters: Turing Equivalence in Automated Systems Administration. LISA 2002.
Gilfix, M. 2002. Holistic Quota Management: The Natural Path to a Better, More Efficient Quota System. LISA 2002.