Lecture Two Data Centre’s Government & Maintenance Work & People Organisation.

Lecture Two

Data Centre’s Government & Maintenance

Work & People Organisation

Changes Governance (1/5)

… then you better start swimmin' or you'll sink like a stone
for the times they are a-changin' …

• A Data Centre is a living thing, experiencing continual changes
• Good Data Centre government requires forecasting the changes, analysing their impact, planning and controlling the corrective and compliance actions, and verifying the results …

… then you better start swimmin' …

Changes Governance (2/5)

“… Dad, my PC doesn’t respond! …”

• 80% of service interruptions are caused by operator error or poor change control (Gartner)

What did you change?

Changes Governance (3/5)

• Changes may concern all the Data Centre components (building, hardware, software, people, …) and may originate from internal or external causes
• As a first rough classification, we may distinguish between “ordinary” and “extraordinary” changes
• A main difference between these categories lies in how they are handled: ordinary changes are generally managed through well-defined, consolidated procedures, whereas extraordinary changes often require an “ad hoc” project

Changes Governance (4/5)

• Common causes for “ordinary” changes are:
  • Users’ requests
  • Legislation requirements
  • Technological innovations
  • Actions for budget control
  • Accidents and mistakes
• Ordinary changes are very frequent (hourly/daily) and their life-cycle is generally short to medium (hours to a few weeks). They affect a limited set of Data Centre components. Managing them involves few resources and a low-to-medium effort

Changes Governance (5/5)

• Examples of causes for “extraordinary” changes are:
  • Great technical or regulatory transformations
  • Wide company reorganizations
  • Site relocations and consolidations
  • Big and unpredicted accidents (“disasters”)
• Extraordinary changes are sporadic and their life-cycle is always long (many months to years). Their impact usually cuts across all the Data Centre components. Managing them involves many resources and requires a huge effort. These resources are generally organized as a specific project team, with a dedicated leader
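
The distinction between the two categories can be expressed as a rule of thumb. The following is only a minimal sketch: the `Change` record, the 90-day threshold and the count of four component areas (building, hardware, software, people) are illustrative assumptions, not figures from the lecture.

```python
from dataclasses import dataclass


@dataclass
class Change:
    description: str
    expected_duration_days: int  # estimated life-cycle of the change
    components_affected: int     # how many component areas it touches
    total_components: int = 4    # assumed: building, hardware, software, people


def classify(change: Change, long_days: int = 90) -> str:
    """Return 'extraordinary' for long or fully cross-cutting changes, else 'ordinary'."""
    long_running = change.expected_duration_days >= long_days
    crosses_everything = change.components_affected >= change.total_components
    if long_running or crosses_everything:
        return "extraordinary"
    return "ordinary"


password_reset = Change("user password reset", expected_duration_days=1, components_affected=1)
relocation = Change("site relocation", expected_duration_days=365, components_affected=4)
```

With these assumed thresholds the password reset is classified as ordinary and the site relocation as extraordinary; a real Data Centre would tune the rule to its own procedures.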

Organising the Work (1/11)

CHANGE MGMT
• Design and plan the required changes
• Apply the changes

PROBLEM MGMT
• Analyse the problems
• Identify the corrective changes

DEMAND MGMT
• Collect the requests (from users, market, legislation)
• Assign the proper priorities

Organising the Work (2/11)

• Service continuity must always be protected …
• … so changes must be tested in a similar, but separate, “environment”.

Common Data Centre environments:

Development, Test, Trial, Production
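
The idea that a change moves through these environments in order, and reaches production only after passing the earlier ones, can be sketched as follows. The function names and the `checks_passed` flag are hypothetical; this is an illustration, not a description of any specific tool.

```python
from typing import Optional

ENVIRONMENTS = ["Development", "Test", "Trial", "Production"]


def next_environment(current: str) -> Optional[str]:
    """Return the environment that follows `current`, or None after Production."""
    position = ENVIRONMENTS.index(current)
    if position + 1 < len(ENVIRONMENTS):
        return ENVIRONMENTS[position + 1]
    return None


def promote(current: str, checks_passed: bool) -> str:
    """Move a change one step forward only if its checks passed where it is now."""
    target = next_environment(current)
    if target is None:
        raise ValueError("the change is already in Production")
    if not checks_passed:
        return current  # the change stays where it is until the problems are fixed
    return target
```

A change can never skip a step here: `promote` only ever returns the current environment or the one immediately after it.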

Organising the Work (3/11)

Development Environment
• It’s a “laboratory” where new changes are designed and developed.
• This environment is generally used for software changes and, most often, for application software changes. However, a development environment may be used for system software changes as well. It is extremely rare to use it for hardware changes.
• The environment is equipped with the tools the technicians use to produce and modify the software. It usually contains a library (“Repository”) where all versions of the software are stored: the old ones, the current one and those under development.
• In smaller Data Centres the development environment is often merged with the test environment

Organising the Work (4/11)

Test Environment (1/3)
• This environment is used to test the changes built in the development environment
• The changes must be tested to verify that:
  • They fit the purposes they were designed for.
  • They do not generate problems.
• To test the changes, the changed components (usually software) must work in conditions similar to those of the production environment, so the test environment is required to be “similar” to the production one

Organising the Work (5/11)

Test Environment (2/3)
• The test environment is usually just similar (not “equal”) to the production one for economic reasons. Duplicating the production environment only to test the changes would be extremely expensive and is not usually necessary. For example, to test new software for a bank’s cash-dispenser network with 1,500 devices, it is enough to set up a test network with 4-5 devices (preferably including all the different models used in the production network).
• It is not rare to find different “parallel” environments used to test different changes at the same time.
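
The cash-dispenser example amounts to choosing a handful of devices that still covers every model in use. A minimal sketch, assuming a hypothetical inventory of `(device_id, model)` pairs with made-up model names:

```python
def pick_test_devices(devices, target_size=5):
    """Return a small test set containing at least one device of every model.

    devices: list of (device_id, model) tuples.
    If there are more models than target_size, the result grows to cover them all.
    """
    chosen = []
    seen_models = set()
    # First pass: take the first device found for each model.
    for device_id, model in devices:
        if model not in seen_models:
            seen_models.add(model)
            chosen.append((device_id, model))
    # Second pass: top up to target_size with devices not chosen yet.
    for device in devices:
        if len(chosen) >= target_size:
            break
        if device not in chosen:
            chosen.append(device)
    return chosen


# Hypothetical inventory: 1,500 devices spread over three made-up models.
inventory = [(i, f"model-{i % 3}") for i in range(1500)]
test_network = pick_test_devices(inventory, target_size=5)
```

Covering all models first and only then filling up to the target size reflects the slide's priority: model coverage matters more than the exact number of devices.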

Organising the Work (6/11)

Test Environment (3/3)
• Data is one of the main topics to be studied when designing a test environment
• At first we might think that the best solution is to run the tests with a perfect copy of the production data. However, this choice has three shortcomings:
  1. Cost: the amount of production data is often excessive for test purposes
  2. Security: some production data are confidential and must not be accessed by the technicians running the tests
  3. Reliability: sometimes the real data are only a “subset” of all the possible data, so some theoretically possible cases are not tested
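
One common way to address the three shortcomings together is to sample the production data, mask the confidential fields and add hand-written edge cases. The sketch below illustrates that idea only; the function, the field names and the `"***"` mask are assumptions, not a procedure described in the lecture.

```python
import random


def build_test_data(production_rows, sample_size, confidential_fields, edge_cases, seed=0):
    """Return a small, masked sample of production rows plus hand-written edge cases."""
    rng = random.Random(seed)
    # Cost: keep only a sample instead of the full production volume.
    sample = rng.sample(production_rows, min(sample_size, len(production_rows)))
    # Security: replace confidential values so testers never see them.
    masked = [
        {key: ("***" if key in confidential_fields else value) for key, value in row.items()}
        for row in sample
    ]
    # Reliability: add cases that the real data may never contain.
    return masked + list(edge_cases)


# Hypothetical bank accounts; "holder" is treated as the confidential field.
production = [
    {"account": f"IT{i:04d}", "holder": f"name-{i}", "balance": i * 10}
    for i in range(1000)
]
edge_cases = [{"account": "IT9999", "holder": "***", "balance": -1}]
test_data = build_test_data(production, 20, {"holder"}, edge_cases)
```

Simple masking like this may not be enough for real confidential data; it only shows where each of the three concerns enters the preparation step.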

Organising the Work (7/11)

Trial Environment (1/2)
• The trial is a sort of “test PLUS” environment. Its purpose is a “dry run” of the changes, that is, the last complete test of the system before its delivery to the production environment.
• The main characteristic of a trial environment, compared with a test one, is its stronger affinity to the production environment: this is required to guarantee the effectiveness of the test. For example, “stronger affinity” means a very recent copy of the production environment (a test environment may have been generated some time ago). Furthermore, a trial environment may include features missing from a test one: for example, security systems that are usually “disabled” in the test environment in order to speed up the test runs.

Organising the Work (8/11)

Trial Environment (2/2)
• Another usual characteristic of a trial environment (not always present in a test environment) is the capacity to simulate the production “workload”. Specific tools are available that can stress the systems by generating “transaction flows” comparable to the real workload, in terms of both volume and statistical distribution
• In smaller Data Centres the trial environment is often missing and the last run before delivery is usually done in the test environment
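
A toy version of such a workload generator is sketched below. It assumes that the production mix of transaction types and the average arrival rate have already been measured, and it models arrivals as a Poisson process; the transaction names and figures are invented for illustration, and real load-testing tools are far richer.

```python
import random


def generate_transactions(mix, rate_per_second, duration_seconds, seed=0):
    """Yield (timestamp, transaction_type) pairs for a simulated workload.

    mix: dict mapping transaction type -> relative weight seen in production.
    rate_per_second: average arrival rate to reproduce (the volume).
    """
    rng = random.Random(seed)
    types = list(mix)
    weights = [mix[name] for name in types]
    clock = 0.0
    while True:
        # Exponential gaps between arrivals give the requested average rate.
        clock += rng.expovariate(rate_per_second)
        if clock >= duration_seconds:
            break
        # Pick the transaction type according to the production mix.
        yield clock, rng.choices(types, weights=weights)[0]


# Hypothetical mix for a cash-dispenser network, 50 transactions/s for 10 s.
mix = {"withdrawal": 70, "balance": 25, "transfer": 5}
flow = list(generate_transactions(mix, rate_per_second=50, duration_seconds=10))
```

The rate controls the volume and the weights control the statistical distribution, the two aspects the slide asks the simulated workload to match.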

Organising the Work (9/11)

Production Environment
• It’s the environment where the real services are delivered to the real users
• Its main characteristic must be perfect isolation from the other environments (development, test and trial), if present. The other environments are usually much less protected and reliable, and if the isolation is not dependable enough the production environment may be affected by problems occurring elsewhere
• The best isolation is achieved by using two completely distinct Data Centres: one for production and a second one for development, test and trial together. However, less expensive solutions that still work may be designed using distinct hardware in the same site, or even distinct virtual environments on the same hardware

Organising the Work (10/11)

The Software Lifecycle (1/2)
• The “Lifecycle” of the software is characterized by some typical phases:
  1. Design
  2. Development
  3. Test
  4. Delivery and possible deployment
  5. Error correction and functional changes
  6. Disuse
• Software is usually delivered in different “releases” and the phases repeat cyclically, release after release

Organising the Work (11/11)

The Software Lifecycle (2/2)
• When replacing the current release of a software product with a new one, it is often important to choose between two approaches: “phased” vs. “big-bang” delivery.
• Consider:
  • Release preparation time
  • Concurrent changes
  • Interactions with other internal/external systems
  • Test complexity
  • Is the date your own choice? (… hardly ever!)
• The phased approach is generally less “painful” but requires more work

[Diagram labels: PREP-1, PREP-2, PREP]

Organising the People (1/8)

• The teams of people working in a Data Centre are typically organized with the following structure:

Management

Staff

Applications Systems Operations

Organising the People (2/8)

Applications
• They deal with the lifecycle of the Application Software
• It’s usually possible to distinguish two kinds of roles:
  • Analysts: who analyze the users’ requests and design the general characteristics of the software to be built. They choose the software’s functionality as well as its general technical architecture (the tools to be used, the structure of the modules, etc.)
  • Programmers: who, following the general design laid out by the analysts, “write the code”
• Usually, in a medium-to-large organization, the Applications “division” is structured in two or more “departments”, one for each “Applications Family” (for example, in a bank it’s usual to find the departments “Accounts”, “Financial”, “Loans”, “Web-banking”, etc.)

Organising the People (3/8)

Systems (1/2)
• The people working in this division deal with the “Systems”, i.e. hardware, system software and network. In a medium-to-large organization the division is usually structured in three “departments”:

Software, Hardware, Network

• “Software” and “Hardware” usually deal with non-network SW & HW (i.e. computers, storage, etc.), while “Network” deals with both HW & SW for the network. That’s because network components are more tightly linked to each other than non-network ones.

Organising the People (4/8)

Systems (2/2)
• Each department, mainly in big organizations, may be structured in smaller, highly specialized teams (for example, Software people may be organized in teams dealing with operating systems, database systems, middleware, etc.)
• In larger organizations there is usually a team dedicated to “Peripheral Systems”. Sometimes it is located inside the Network department, sometimes not. It deals with systems outside the Data Centre (i.e. personal computers, “branch servers”, etc.)
• For the Systems specialists too, just as for the Applications ones, it’s usually possible to distinguish between System Analysts (dealing with the general structure of the systems they manage) and System Programmers (with more technical and operational skills)

Organising the People (5/8)

Operations (1/2)
• While the Applications and Systems divisions deal with the design, development and maintenance of the Data Centre components, the Operations division is responsible for its day-to-day functioning.
• The Operations division is responsible for the “Service Levels” negotiated with the users, in terms of service hours, performance, problem resolution times, etc.
• Because of these responsibilities, the Operations division must be the “sole and absolute owner” of the production environment. No one else can apply any change to production components without the Operations division’s authorization.
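
The ownership rule can be pictured as a gate in front of the production environment: a change is applied only if Operations has approved it first. The class, method names and change identifiers below are hypothetical and only illustrate the principle.

```python
class ProductionEnvironment:
    """Production components that change only with Operations' authorization."""

    def __init__(self):
        self.components = {}     # component name -> installed version
        self.authorized = set()  # change ids approved by Operations

    def authorize(self, change_id: str) -> None:
        """Called by the Operations division to approve a change."""
        self.authorized.add(change_id)

    def apply(self, change_id: str, component: str, version: str) -> None:
        """Apply a change; refuse it unless Operations approved it first."""
        if change_id not in self.authorized:
            raise PermissionError(f"change {change_id} is not authorized by Operations")
        self.components[component] = version
        self.authorized.discard(change_id)  # each approval is valid for one use
```

In this sketch a call to `apply` with an unapproved change id raises `PermissionError`, and an approval is consumed once used, so the same authorization cannot cover a second change.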

Organising the People (6/8)

Operations (2/2)
• The Operations division too, in medium-to-large organizations, is often structured in smaller teams. Usually:
  • Computer Room: in charge of starting and stopping systems and applications and keeping them working properly. This team usually works 24/7
  • Storage: dealing with data maintenance and data recovery
  • NOC: or “Network Operation Centre”, dealing with the functioning of network components
  • Help Desk: responsible for communications between the users and the Data Centre. The Help Desk phone number must be the only one users dial to report malfunctions or other problems

Organising the People (7/8)

Staff (1/2)
• The “Staff” is not always present (but it certainly is in larger organizations) and consists of one or more teams with miscellaneous tasks. These tasks have two characteristics:
  • They concern the whole Data Centre (i.e. they cut across several or all of its components or functions)
  • The Data Centre Management must have full and direct visibility and control over them (which is why the Staff teams report directly to the management)
• Each Staff team is generally very lean, made up of two or three professionals highly skilled in their subject

Organising the People (8/8)

Staff (2/2)
• Some Staff teams usually (though not always) present are:
  • Security: dealing with the physical and logical security systems, user authentication and authorization, etc. When present, this team usually deals with Disaster Recovery systems and procedures as well
  • Procurement: dealing with the whole procurement life-cycle, including preparing the cost budget, negotiating with suppliers (sometimes by means of specific invitations to tender), drawing up and monitoring contracts, supervising payments, etc.
  • Standards and Documentation: a team responsible for setting, maintaining and documenting all the “working rules” for the functioning of the Data Centre. For example: what the responsibilities of each team in each division are, which technical architectures and tools qualify as “Standard”, what the “naming conventions” are for all the Data Centre components, etc.

Data Centres actually …

… a few numbers about environments …
• The case of a medium-to-large Italian P.A. Data Centre

… and an example of an extraordinary project …
• 2 Firms merge: Application unification and Site consolidation

Environments in a medium-to-large Italian P.A. Data Centre (1/3)

• The Site:

Environments in a medium-to-large Italian P.A. Data Centre (2/3)

• The Hardware:

Environments in a medium-to-large Italian P.A. Data Centre (3/3)

• The Environments:

614 virtual environments

                AIX    Windows Linux    Mainframe
Production      246    101              1
Test / Trial    192    49               1
Service (VM)    24     ---              ---
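
As a quick consistency check, the per-row figures above do add up to the stated total of 614 environments. The snippet reads each “---” entry as zero, which is an assumption about the slide's notation.

```python
# Figures copied from the slide, one list per row, in the order they appear.
counts = {
    "Production": [246, 101, 1],
    "Test / Trial": [192, 49, 1],
    "Service (VM)": [24, 0, 0],  # "---" in the slide is read as zero
}

row_totals = {name: sum(values) for name, values in counts.items()}
grand_total = sum(row_totals.values())

print(row_totals)   # {'Production': 348, 'Test / Trial': 242, 'Service (VM)': 24}
print(grand_total)  # 614
```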

2 Firms merge: Application unification and Site consolidation – (1/3)

Application unification:
• From 2 different Application Systems to a unified one
• “Application System” unification means “Application Software” + “System Software” unification
• Usually the unified system is x% of the Firm-A system + y% of the Firm-B system + z% brand-new

Site consolidation:
• From 2 different Sites to a unified one
• Usually the unified site is either the Firm-A or the Firm-B site; a third, brand-new site is very infrequent

2 Firms merge: Application unification and Site consolidation – (2/3)

Strategies (PRO/CON): (A) appl. unif. and then site cons. VS (B) site cons. and then appl. unif.

(A) Appl. unif., then site cons.
PRO
• The Unified Site sizing is exactly equal to the sum of the starting Sites as they are at the end of the Appl. Unification
CON
• The Consolidation savings can be achieved only after the Appl. Unification, losing this benefit for many months

(B) Site cons., then appl. unif.
PRO
• The Consolidation savings can be achieved a few months after the start-up
• The Appl. Unification process is easier if carried out in one single Site
• People integration is promoted immediately and the whole process will run faster
CON
• The Unified Site sizing must somewhat exceed the sum of the starting Sites as they are before the Appl. Unification

2 Firms merge: Application unification and Site consolidation – (3/3)

Consider:
• HW & SW equipment in peripheral branches (Application Unification usually requires mass upgrades or replacements, which are very time-consuming processes)
• People training, both in the Data Centres and in peripheral branches: the latter may require many months
• Conversion of one of the original sites into a Disaster Recovery site or a development/test site (or both)