6 service operation

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice Service Operation

Transcript of 6 service operation

Page 1: 6 service operation

© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice

Service Operation

Page 2: 6 service operation

Service Operation

Key Concepts
• Event
• Service Request
• Self Help

Functions
• Service Desk
• Technical Management
• IT Operations Management
• Applications Management

Processes
• Event Management
• Incident Management
• Request Fulfillment
• Problem Management
• Access Management
• Operations Management

Page 3: 6 service operation

Processes

• Event Management is the process that monitors all events that occur through the IT infrastructure to allow for normal operation and also to detect and escalate exception conditions.
• Incident Management concentrates on restoring the service to users as quickly as possible, in order to minimize business impact.
• Request Fulfillment involves the management of customer or user requests that are not generated as an incident from an unexpected service delay or disruption.
• Access Management is the process of granting authorized users the right to use a service, while restricting access to non-authorized users.

Page 4: 6 service operation

Operations Management

• In addition, there are several other processes that will be executed or supported during Service Operation, but which are driven during other phases of the Service Management Lifecycle. The operational aspects of these processes will be discussed in the final part of this chapter and include:
  − Change Management, a major process which should be closely linked to Configuration Management and Release Management. These topics are primarily covered in the Service Transition publication.
  − Capacity and Availability Management, the operational aspects of which are covered in this publication, but which are covered in more detail in the Service Design publication.
  − Financial Management, which is covered in the Service Strategy publication.
  − Knowledge Management, which is covered in the Service Transition publication.
  − IT Service Continuity, which is covered in the Service Design publication.
  − Service Reporting and Measurement, which are covered in the Continual Service Improvement publication.

Page 5: 6 service operation


Event Management

Page 6: 6 service operation

Event Management

• An event can be defined as any detectable or discernible occurrence that has significance for the management of the IT Infrastructure or the delivery of IT service and evaluation of the impact a deviation might cause to the services.
• Events are typically notifications created by an IT service, Configuration Item (CI) or monitoring tool.
• Effective Service Operation is dependent on knowing the status of the infrastructure and detecting any deviation from normal or expected operation. This is provided by good monitoring and control systems, which are based on two types of tools:
  − active monitoring tools that poll key CIs to determine their status and availability. Any exceptions will generate an alert that needs to be communicated to the appropriate tool or team for action
  − passive monitoring tools that detect and correlate operational alerts or communications generated by CIs.
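The difference between the two tool types can be sketched in Python. This is only an illustration under stated assumptions: `active_poll`, `PassiveCollector`, the CI names and the "two or more messages" correlation rule are hypothetical and do not come from ITIL or any monitoring product. The status check is passed in as a plain function so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Alert:
    ci: str        # name of the Configuration Item
    message: str   # what was observed


def active_poll(cis: List[str], check: Callable[[str], bool]) -> List[Alert]:
    """Active monitoring: ask each key CI for its status; alert on exceptions."""
    return [Alert(ci, "no response to poll") for ci in cis if not check(ci)]


@dataclass
class PassiveCollector:
    """Passive monitoring: receive messages that CIs send on their own
    initiative and correlate them per CI."""
    received: Dict[str, List[str]] = field(default_factory=dict)

    def on_message(self, ci: str, message: str) -> None:
        self.received.setdefault(ci, []).append(message)

    def correlated(self, min_count: int = 2) -> List[Alert]:
        # Assumed rule: a CI that reported min_count or more times gets one alert.
        return [
            Alert(ci, f"{len(msgs)} related messages")
            for ci, msgs in self.received.items()
            if len(msgs) >= min_count
        ]


# Hypothetical status table standing in for real poll responses.
status = {"switch-01": True, "server-07": False}
polled = active_poll(list(status), lambda ci: status[ci])

collector = PassiveCollector()
collector.on_message("router-02", "link flap")
collector.on_message("router-02", "link flap")
collector.on_message("server-03", "disk 80% full")
passive = collector.correlated()
```

In this sketch the active poller finds the problem by asking, while the passive collector only learns about CIs that report something themselves.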

Page 7: 6 service operation

Event Management

• The ability to detect events, make sense of them and determine the appropriate control action is provided by Event Management. Event Management is therefore the basis for Operational Monitoring and Control.
• In addition, if these events are programmed to communicate operational information as well as warnings and exceptions, they can be used as a basis for automating many routine Operations Management activities, for example executing scripts on remote devices, or submitting jobs for processing, or even dynamically balancing the demand for a service across multiple devices to enhance performance.
• Event Management therefore provides the entry point for the execution of many Service Operation processes and activities.
• In addition, it provides a way of comparing actual performance and behavior against design standards and SLAs. As such, Event Management also provides a basis for Service Assurance and Reporting, and for Service Improvement. This is covered in detail in the Continual Service Improvement publication.

Page 8: 6 service operation

Event Management - Objectives

• Detect Events, make sense of them, and determine the appropriate control action
• Event Management is the basis for Operational Monitoring and Control

Page 9: 6 service operation

Scope

• Event Management can be applied to any aspect of Service Management that needs to be controlled and which can be automated.
• These include:
  − Configuration Items:
    • Some CIs will be included because they need to stay in a constant state (e.g. a switch on a network needs to stay on and Event Management tools confirm this by monitoring responses to ‘pings’).
    • Some CIs will be included because their status needs to change frequently and Event Management can be used to automate this and update the CMS (e.g. the updating of a file server).
  − Environmental conditions (e.g. fire and smoke detection)
  − Software license monitoring for usage to ensure optimum/legal license utilization and allocation
  − Security (e.g. intrusion detection)
  − Normal activity (e.g. tracking the use of an application or the performance of a server).

Page 10: 6 service operation

Event Management – Basic Concepts

• Event
  An alert or notification created by any IT Service, Configuration Item or monitoring tool. For example, a batch job has completed. Events typically require IT Operations personnel to take actions, and often lead to Incidents being logged.
• Event Management
  The Process responsible for managing Events throughout their Lifecycle.

Page 11: 6 service operation

Value to business

• Event Management’s value to the business is generally indirect; however, it is possible to determine the basis for its value as follows:
  − Event Management provides mechanisms for early detection of incidents. In many cases it is possible for the incident to be detected and assigned to the appropriate group for action before any actual service outage occurs.
  − Event Management makes it possible for some types of automated activity to be monitored by exception – thus removing the need for expensive and resource-intensive real-time monitoring, while reducing downtime.
• When integrated into other Service Management processes (such as Availability or Capacity Management), Event Management can signal status changes or exceptions that allow the appropriate person or team to perform early response, thus improving the performance of the process. This, in turn, will allow the business to benefit from more effective and more efficient Service Management overall.
• Event Management provides a basis for automated operations, thus increasing efficiencies and allowing expensive human resources to be used for more innovative work, such as designing new or improved functionality or defining new ways in which the business can exploit technology for increased competitive advantage.

Page 12: 6 service operation

Policies, principles and basic concepts

• There are many different types of events:
• Events that signify regular operation
  − Notification that a scheduled workload has completed
  − An e-mail has reached its intended recipient
• Events that signify an exception
  − A user attempts to log on to an application with the incorrect password
  − A device’s CPU is above the acceptable utilization rate
• Events that signify unusual, but not exceptional, operation
  − A server’s memory utilization reaches within 5% of its highest acceptable performance level

Page 13: 6 service operation

Example of Event Categories

• Informational: This refers to an event that does not require any action and does not represent an exception. Informational events are typically stored in the system or service log files and kept for a predetermined period. They are typically used to check on the status of a device or service, or to confirm the successful completion of an activity. Examples of informational events include:
  − A user logs onto an application
  − A job in the batch queue completes successfully
  − A device has come online
  − A transaction is completed successfully.
• Warning: A warning is an event that is generated when a service or device is approaching a threshold. Warnings are intended to notify the appropriate person, process or tool so that the situation can be checked and the appropriate action taken to prevent an exception. Warnings are not typically raised for a device failure. Examples of warnings are:
  − Memory utilization on a server is currently at 65% and increasing. If it reaches 75%, response times will be unacceptably long and the OLA for that department will be breached.
  − The collision rate on a network has increased by 15% over the past hour.
• Exception: An exception means that a service or device is currently operating abnormally (however that has been defined). Typically, this means that an OLA and SLA have been breached and the business is being impacted. Exceptions could represent a total failure, impaired functionality or degraded performance. Examples of exceptions include:
  − A server is down
  − Response time of a standard transaction across the network has slowed to more than 15 seconds
  − A segment of the network is not responding to routine requests.
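The three categories can be illustrated with a small threshold check in Python. This is a sketch, not a prescribed method: the `categorize` function is hypothetical, and using 65% as the warning threshold and 75% as the exception threshold is an assumption loosely based on the memory utilization example above.

```python
from enum import Enum


class Category(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


def categorize(value: float, warning_at: float, exception_at: float) -> Category:
    """Classify a measured value against a warning and an exception threshold.

    Both thresholds are treated as inclusive (an assumption of this sketch).
    """
    if value >= exception_at:
        return Category.EXCEPTION
    if value >= warning_at:
        return Category.WARNING
    return Category.INFORMATIONAL


# Memory utilization in percent: warn from 65, treat 75 and above as an exception.
readings = [40.0, 65.0, 74.9, 75.0]
result = [categorize(r, warning_at=65.0, exception_at=75.0).value for r in readings]
```

Real thresholds would come from the relevant OLA or SLA rather than being hard-coded.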

Page 14: 6 service operation

Event Management – Logging and Filtering

[Diagram: an Event passes through a Filter and is classified as Exception, Information or Warning.]

Page 15: 6 service operation

Event Management – Managing Exceptions

[Diagram: an Exception reaches the decision "Incident / Problem / Change?". An Incident goes to Incident Management, a Problem to Problem Management, and an RFC to Change Management.]

Page 16: 6 service operation

Event Management – Information & Warnings

[Diagram labels: Information → Log; Warning → Auto Response or Alert (Human Intervention); decision "Incident / Problem / Change?" with outcomes Incident, Problem and RFC.]
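One possible reading of the three diagrams is a simple routing function, sketched here in Python. The event fields (`category`, `text`, `auto_fix`, `kind`) and the returned strings are hypothetical names chosen for illustration; the slides do not define a data format.

```python
from typing import Dict, List

log: List[str] = []


def handle(event: Dict[str, str]) -> str:
    """Return the action chosen for an already-categorized event."""
    category = event["category"]
    if category == "information":
        log.append(event["text"])  # informational events are only logged
        return "logged"
    if category == "warning":
        # Hypothetical flag: warnings with a scripted fix get an auto response,
        # the rest are raised as alerts for human intervention.
        return "auto response" if event.get("auto_fix") else "alert"
    if category == "exception":
        # Hypothetical 'kind' field answers the "Incident / Problem / Change?" decision.
        routes = {
            "incident": "Incident Management",
            "problem": "Problem Management",
            "change": "Change Management (RFC)",
        }
        return routes[event.get("kind", "incident")]
    raise ValueError(f"unknown category: {category}")
```

For example, `handle({"category": "exception", "text": "server down"})` is routed to Incident Management because this sketch defaults an exception to an incident when no `kind` is given.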

Page 17: 6 service operation

Event Management - Roles

• Event Management roles are filled by people in the following functions:
  − Service Desk
  − Technical Management
  − Application Management
  − IT Operations Management

Page 18: 6 service operation


Incident Management

Page 19: 6 service operation

Purpose/Goals/Objectives

• The primary goal of the Incident Management process is to restore normal service operation as quickly as possible and minimize the adverse impact on business operations, thus ensuring that the best possible levels of service quality and availability are maintained. ‘Normal service operation’ is defined here as service operation within SLA limits.

Page 20: 6 service operation

Incident Management - Scope

• Managing any disruption or potential disruption to live IT services
• Incidents are identified
  − Directly by users through the Service Desk
  − Through an interface from Event Management to Incident Management tools
• Reported and/or logged by technical staff

Page 21: 6 service operation

Incident Management – Business value

• The value of Incident Management includes:
  − Quicker incident resolution leads to higher availability of service
    • The ability to detect and resolve incidents, which results in lower downtime to the business, which in turn means higher availability of the service. This means that the business is able to exploit the functionality of the service as designed.
  − Aligning IT activity to business priorities
    • The ability to align IT activity to real-time business priorities. This is because Incident Management includes the capability to identify business priorities and dynamically allocate resources as necessary.
  − Identifying potential improvements leads to improved quality
    • The ability to identify potential improvements to services. This happens as a result of understanding what constitutes an incident and also from being in contact with the activities of business operational staff.
  − The Service Desk can, during its handling of incidents, identify additional service or training requirements found in IT or the business.

Page 22: 6 service operation

Policies, principles and basic concepts

• An Incident
  − An unplanned interruption or reduction in the quality of an IT Service
  − Any event which could affect an IT Service in the future is also an Incident
• Timescales
• Incident Models
• Major Incidents

Page 23: 6 service operation

Incident Management - Activities

• Identification
• Logging
• Categorization
• Prioritization
• Initial diagnosis
• Escalation
• Investigation and diagnosis
• Resolution and recovery
• Closure

Page 24: 6 service operation

Incident Identification

INCIDENT IDENTIFICATION
  − Work cannot begin on dealing with an incident until it is known that an incident has occurred.
  − It is usually unacceptable, from a business perspective, to wait until a user is impacted and contacts the Service Desk.
  − Ideally, incidents should be resolved before they have an impact on users!

INCIDENT LOGGING
  − All incidents must be fully logged and date/time stamped, regardless of whether they are raised through a Service Desk telephone call or whether automatically detected via an event alert.
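The logging rule above (every incident is recorded and date/time stamped, whatever its source) can be sketched as a small Python record type. The class name `IncidentRecord`, its fields and the sequential numbering are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count

_ids = count(1)  # simple in-memory sequence; a real tool would persist this


@dataclass
class IncidentRecord:
    summary: str
    source: str  # e.g. "service desk call" or "event alert"
    incident_id: int = field(default_factory=lambda: next(_ids))
    # Stamp every record at creation time, in UTC, regardless of its source.
    logged_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


by_phone = IncidentRecord("User cannot print", source="service desk call")
by_event = IncidentRecord("Server not responding to ping", source="event alert")
```

Because the timestamp is set by the record itself, a caller cannot forget to supply it, which is one way to enforce the "all incidents" requirement.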

Page 25: 6 service operation

Incident Management – Interfaces

• Problem Management
• Service Asset and Configuration Management (SACM)
• Change Management
• Capacity Management
• Availability Management
• Service Level Management

Page 26: 6 service operation

Incident Management – Key Metrics

• Total number of incidents (as a control measure)
• Breakdown of incidents at each stage (for example, logged, WIP, closed, etc.)
• Size of incident backlog
• Mean elapsed time to resolution
• % resolved by the Service Desk (first-line fix)
• % handled within agreed response time
• % resolved within agreed Service Level Agreement target
• No. and % of Major Incidents
• No. and % of incidents correctly assigned
• Average cost of incident handling
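Three of these metrics can be computed from closed incident records as in the following Python sketch. The `Incident` tuple, its field names and the sample data are hypothetical; they only show the arithmetic.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple


class Incident(NamedTuple):
    opened: datetime
    resolved: datetime
    resolved_by_service_desk: bool
    target: timedelta  # agreed resolution target (hypothetical field)


def mean_elapsed(incidents: List[Incident]) -> timedelta:
    """Mean elapsed time to resolution."""
    total = sum((i.resolved - i.opened for i in incidents), timedelta())
    return total / len(incidents)


def percent(flags: List[bool]) -> float:
    """Share of True values, as a percentage."""
    return 100.0 * sum(flags) / len(flags)


t0 = datetime(2008, 1, 7, 9, 0)
sample = [
    Incident(t0, t0 + timedelta(hours=1), True, timedelta(hours=4)),
    Incident(t0, t0 + timedelta(hours=3), True, timedelta(hours=4)),
    Incident(t0, t0 + timedelta(hours=8), False, timedelta(hours=4)),
    Incident(t0, t0 + timedelta(hours=4), False, timedelta(hours=4)),
]

mean_time = mean_elapsed(sample)                                             # 4 hours
first_line_fix = percent([i.resolved_by_service_desk for i in sample])       # 50.0
within_target = percent([i.resolved - i.opened <= i.target for i in sample]) # 75.0
```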

Page 27: 6 service operation

Incident Management – Roles

• Incident Manager
  − May be performed by Service Desk Supervisor
• Super Users
  − Usually Service Desk Analysts
• Second-Line Support
• Third-Line Support (Technical Management, IT Operations, Applications Management, Third-party suppliers)

Page 28: 6 service operation

Incident Management – Challenges

• Ability to detect incidents as quickly as possible (dependency on Event Management)
• Ensuring all incidents are logged
• Ensuring previous history is available (Incidents, Problems, Known Errors, Changes)
• Integration with Configuration Management System, Service Level Management, and Known Error Database (CMS, SLM, KEDB)

Page 29: 6 service operation


Request Fulfillment

Page 30: 6 service operation

Request Fulfillment

• The term ‘Service Request’ is used as a generic description for many varying types of demands that are placed upon the IT Department by the users.
• Many of these are actually small changes – low risk, frequently occurring, low cost, etc.
  − a request to change a password
  − a request to install an additional software application onto a particular workstation
  − a request to relocate some items of desktop equipment
  − a question requesting information
• Their scale and frequent, low-risk nature means that they are better handled by a separate process, rather than being allowed to congest and obstruct the normal Incident and Change Management processes.

Page 31: 6 service operation

Purpose/Goals/Objectives

• Request Fulfillment is the process of dealing with Service Requests from the users.
• The objectives of the Request Fulfillment process include:
  − To provide a channel for users to request and receive standard services for which a pre-defined approval and qualification process exists
  − To provide information to users and customers about the availability of services and the procedure for obtaining them
  − To source and deliver the components of requested standard services (e.g. licences and software media)
  − To assist with general information, complaints or comments.

Page 32: 6 service operation

Scope

• The process needed to fulfill a request will vary depending upon exactly what is being requested – but can usually be broken down into a set of activities that have to be performed.
• Some organizations will be comfortable letting Service Requests be handled through their Incident Management processes (and tools) – with Service Requests being handled as a particular type of ‘incident’ (using a high-level categorization system to identify those ‘incidents’ that are in fact Service Requests).
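Where Service Requests share a tool with incidents, the high-level categorization mentioned above can be as simple as the following Python sketch. The category names and the `split_tickets` helper are hypothetical; real category schemes are specific to each organization.

```python
from typing import Dict, List, Tuple

# Hypothetical top-level categories that mark a ticket as a Service Request.
SERVICE_REQUEST_CATEGORIES = {
    "password reset",
    "software install",
    "equipment move",
    "information",
}

Ticket = Dict[str, str]


def split_tickets(tickets: List[Ticket]) -> Tuple[List[Ticket], List[Ticket]]:
    """Separate Service Requests from genuine incidents by top-level category."""
    requests = [t for t in tickets if t["category"] in SERVICE_REQUEST_CATEGORIES]
    incidents = [t for t in tickets if t["category"] not in SERVICE_REQUEST_CATEGORIES]
    return requests, incidents


queue = [
    {"id": "T1", "category": "password reset"},
    {"id": "T2", "category": "service outage"},
    {"id": "T3", "category": "software install"},
]
requests, incidents = split_tickets(queue)
```

Keeping the two lists apart lets request volumes be reported separately even though both ticket types live in one tool.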

Page 33: 6 service operation

Value to business

• The value of Request Fulfillment is to provide quick and effective access to standard services which business staff can use to improve their productivity or the quality of business services and products.
• Request Fulfillment effectively reduces the bureaucracy involved in requesting and receiving access to existing or new services, thus also reducing the cost of providing these services.
• Centralizing fulfillment also increases the level of control over these services. This in turn can help reduce costs through centralized negotiation with suppliers, and can also help to reduce the cost of support.

Page 34: 6 service operation

Policies, principles and basic concepts

• Service Request
  − A request from a User for information or advice, or for a Standard Change. For example:
    • To reset a password, or to provide standard IT Services for a new User.
• Request Model
• Self-Help

Page 35: 6 service operation

Request Fulfillment – Roles

• Not usually dedicated staff
• Service Desk staff
• Incident Management staff
• Service Operations teams

Page 36: 6 service operation


Problem Management

Page 37: 6 service operation

Problem Management

• ITIL defines a ‘problem’ as the unknown cause of one or more incidents.

Page 38: 6 service operation

Purpose/Goals/Objectives

• Problem Management is the process responsible for managing the lifecycle of all problems.
• The primary objectives of Problem Management are:
  − To prevent problems and resulting incidents from happening
  − To eliminate recurring incidents
  − To minimize the impact of incidents that cannot be prevented

Page 39: 6 service operation

Scope

• Problem Management includes the activities required to diagnose the root cause of incidents and to determine the resolution to those problems.
  − It is also responsible for ensuring that the resolution is implemented through the appropriate control procedures, especially Change Management and Release Management.
• Problem Management will also maintain information about problems and the appropriate workarounds and resolutions, so that the organization is able to reduce the number and impact of incidents over time.
  − In this respect, Problem Management has a strong interface with Knowledge Management, and tools such as the Known Error Database will be used for both.
• Although Incident and Problem Management are separate processes, they are closely related and will typically use the same tools, and may use similar categorization, impact and priority coding systems. This will ensure effective communication when dealing with related incidents and problems.

Page 40: 6 service operation

Value to business

• Problem Management works together with Incident Management and Change Management to ensure that IT service availability and quality are increased.
• When incidents are resolved, information about the resolution is recorded. Over time, this information is used to speed up the resolution time and identify permanent solutions, reducing the number and resolution time of incidents.
• This results in less downtime and less disruption to business-critical systems.
• Additional value is derived from the following:
  − Higher availability of IT services
  − Higher productivity of business and IT staff
  − Reduced expenditure on workarounds or fixes that do not work
  − Reduction in cost of effort in fire-fighting or resolving repeat incidents.

Page 41: 6 service operation

Policies, principles and basic concepts

• Problem
  − The unknown cause of one or more incidents
• Problem Models
• Workaround
• Known Error
• Known Error Database

Page 42: 6 service operation

Two major processes in Problem Management

Problem Management consists of two major processes:
  − Reactive Problem Management
    • Resolution of underlying cause(s)
    • Covered in Service Operation
  − Pro-active Problem Management
    • Prevention of future problems
    • Generally undertaken as part of CSI

Page 43: 6 service operation

Problem Management – Roles

• Problem Manager
• Supported by technical groups
  − Technical Management
  − IT Operations
  − Applications Management
  − Third-party suppliers

Page 44: 6 service operation


Access Management

Page 45: 6 service operation

Access Management

• Access Management is the process of granting authorized users the right to use a service, while preventing access to non-authorized users.
• It has also been referred to as Rights Management or Identity Management in different organizations.

Page 46: 6 service operation

Purpose/Goals/Objectives

• Granting authorized users the right to use a service
  − Access Management provides the right for users to be able to use a service or group of services.
• Preventing access by non-authorized users
  − It is therefore the execution of policies and actions defined in Security and Availability Management.

Page 47: 6 service operation

Scope

• Access Management is effectively the execution of both Availability and Information Security Management, in that it enables the organization to manage the confidentiality, availability and integrity of the organization’s data and intellectual property.
• Access Management ensures that users are given the right to use a service, but it does not ensure that this access is available at all agreed times – this is provided by Availability Management.
• Access Management is a process that is executed by all Technical and Application Management functions and is usually not a separate function. However, there is likely to be a single control point of coordination, usually in IT Operations Management or on the Service Desk.
• Access Management can be initiated by a Service Request through the Service Desk.

Page 48: 6 service operation

Value to business

• Access Management provides the following value:
  − Controlled access to services ensures that the organization is able to maintain more effectively the confidentiality of its information
  − Employees have the right level of access to execute their jobs effectively
  − There is less likelihood of errors being made in data entry or in the use of a critical service by an unskilled user (e.g. production control systems)
  − The ability to audit use of services and to trace the abuse of services
  − The ability to revoke access rights more easily when needed – an important security consideration
  − May be needed for regulatory compliance (e.g. SOX, HIPAA, COBIT).

Page 49: 6 service operation

Policies, principles and basic concepts

• Access refers to the level and extent of a service’s functionality or data that a user is entitled to use.
• Identity refers to the information about a user that distinguishes them as an individual and which verifies their status within the organization. By definition, the Identity of a user is unique to that user.
• Rights (also called privileges) refer to the actual settings whereby a user is provided access to a service or group of services. Typical rights, or levels of access, include read, write, execute, change, delete.
• Services or service groups. Most users do not use only one service, and users performing a similar set of activities will use a similar set of services. Instead of providing access to each service for each user separately, it is more efficient to be able to grant each user – or group of users – access to the whole set of services that they are entitled to use at the same time.
• Directory Services refers to a specific type of tool that is used to manage access and rights.
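The relationship between identity, rights and service groups can be sketched as a small lookup in Python. The group names, services and the `is_allowed` helper are hypothetical and stand in for whatever a real directory service would provide.

```python
from typing import Dict, Set

# Rights granted per group and service (names are illustrative only).
GROUP_RIGHTS: Dict[str, Dict[str, Set[str]]] = {
    "finance-staff": {"ledger": {"read", "write"}, "reports": {"read"}},
    "auditors": {"ledger": {"read"}},
}

# Identity -> groups; each identity is unique to one user.
USER_GROUPS: Dict[str, Set[str]] = {
    "alice": {"finance-staff"},
    "bob": {"auditors"},
}


def is_allowed(user: str, service: str, right: str) -> bool:
    """True if any of the user's groups grants `right` on `service`."""
    return any(
        right in GROUP_RIGHTS.get(group, {}).get(service, set())
        for group in USER_GROUPS.get(user, set())
    )
```

With this data, `is_allowed("alice", "ledger", "write")` is true while `is_allowed("bob", "ledger", "write")` is false. Granting rights to groups rather than to individuals is the efficiency the "service groups" bullet describes: adding a user to `finance-staff` grants the whole set at once.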

Page 50: 6 service operation

Activities

• Requesting access
• Verification
• Providing rights
• Monitoring identity status
• Logging and tracking access
• Removing or restricting rights

Page 51: 6 service operation

Access Management – Roles

• Not usually dedicated staff
• Access Management is an execution of Availability Management and Information Security Management
• Service Desk staff
• Technical Management staff
• Application Management staff
• IT Operations staff

Page 52: 6 service operation


Operations Management

Page 53: 6 service operation

Operational Activities – Other Processes

• There are operational activities in other processes, as follows:
  − Change Management
  − Configuration Management
  − Release Management
  − Capacity Management
  − Availability Management
  − Knowledge Management
  − Financial Management
  − IT Service Continuity Management

Page 54: 6 service operation


Functions

Page 55: 6 service operation

Service Operation Functions

• Service Desk
• Application Management
• Technical Management
• IT Operations Management

Page 56: 6 service operation

Service Desk

• Primary point of contact
• Deals with all user issues (incidents, requests, standard changes)
• Coordinates actions across the IT organization to meet user requirements
• Different options (Local, Centralized, Virtual, Follow-the-Sun, specialized groups)

Page 57: 6 service operation

Types of Service Desk

• Local Service Desk
  − Designed to support local business needs
  − Support is usually in the same location as the business it is supporting
• Central Service Desk
  − Designed to support multiple locations
  − The desk is in a central location where the business is distributed
  − Could provide support to local desks
• Virtual Service Desk
  − Location of SD analysts is invisible to the customers
  − This may include some element of ‘home working’
  − Common processes and procedures should exist

Page 58: 6 service operation


Local Service Desk

Page 59: 6 service operation


Centralized Service Desk

Page 60: 6 service operation


Virtual Service Desk

Page 61: 6 service operation

Service Desk objectives

• Logging and categorizing Incidents, Service Requests and some categories of change
• First line investigation and diagnosis
• Escalation
• Communication with Users and IT Staff
• Closing calls
• Customer satisfaction
• Update the CMS if so agreed

Page 62: 6 service operation

Service Desk staffing

• Correct number and qualifications at any given time considering:
  − Customer expectations and business requirements
  − Number of users to support, their language and skills
  − Coverage period, out-of-hours, time zones/locations, travel time
  − Processes and procedures in place
• Minimum qualifications
  − Interpersonal skills
  − Business understanding
  − IT understanding
  − Skill sets
• Customer and Technical emphasis, Expert

Page 63: 6 service operation

Service Desk metrics

• Periodic evaluations of health, maturity, efficiency, effectiveness and any opportunity to improve
• Realistic and carefully chosen – the total number of calls is not in itself good or bad
• Some examples:
  − First-line resolution rate
  − Average time to resolve and/or escalate an incident
  − Total costs for the period divided by total call duration minutes
  − The number of calls broken down by time of day and day of week, combined with the average call-time
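The example metrics reduce to simple arithmetic over call records, as this Python sketch shows. The `Call` tuple, the sample calls and the cost figure of 200.0 are made up for illustration.

```python
from collections import Counter
from typing import List, NamedTuple


class Call(NamedTuple):
    weekday: str
    hour: int
    minutes: float
    resolved_first_line: bool


calls: List[Call] = [
    Call("Mon", 9, 6.0, True),
    Call("Mon", 9, 4.0, False),
    Call("Tue", 14, 10.0, True),
    Call("Tue", 14, 20.0, True),
]
total_cost = 200.0  # hypothetical cost of running the desk for the period

total_minutes = sum(c.minutes for c in calls)

# First-line resolution rate, as a percentage of all calls.
first_line_rate = 100.0 * sum(c.resolved_first_line for c in calls) / len(calls)
# Total costs for the period divided by total call duration minutes.
cost_per_minute = total_cost / total_minutes
# Number of calls by day of week and hour of day, plus the average call time.
calls_by_slot = Counter((c.weekday, c.hour) for c in calls)
avg_call_time = total_minutes / len(calls)
```

As the slide notes, none of these numbers is good or bad on its own; they become useful when tracked over time against staffing levels and targets.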

Page 64: 6 service operation

Applications Management

• Manages Applications throughout their Lifecycle
• Performed by any department, group or team managing and supporting operational Applications
• Role in the design, testing and improvement of Applications that form part of IT Services
• Involved in development projects, but not usually the same as the Application Development teams
• Provides resources throughout the lifecycle
• Guidance to IT Operations Management

Page 65: 6 service operation

Applications Management – Objectives

• Well designed, resilient, cost effective applications
• Ensuring availability of functionality
• Maintain operational applications
• Support during application failures

Page 66: 6 service operation

Applications Management – Roles

• Application Manager / Team leaders
• Applications Analyst / Architect
• Note: Application Management teams are usually aligned to the applications they manage

Page 67: 6 service operation

Technical Management

• The groups, departments or teams that provide technical expertise and overall management of the IT Infrastructure
  − Custodians of technical knowledge and expertise related to managing the IT Infrastructure
  − Provide the actual resources to support the IT Service Management Lifecycle
  − Perform many of the common activities already outlined
  − Execute most ITSM processes

Page 68: 6 service operation

Technical Management Organization

• Technical teams are usually aligned to the technology they manage
• Can include operational activities
• Examples:
  − Mainframe management
  − Server Management
  − Internet / Web Management
  − Network Management
  − Database Administration

Page 69: 6 service operation

Technical Management – Objectives

• Design of resilient, cost-effective infrastructure configuration
• Maintenance of the infrastructure
• Support during technical failures

Page 70: 6 service operation

Technical Management – Roles

• Technical Managers
• Team Leaders
• Technical Analysts / Architects
• Technical Operator

Page 71: 6 service operation

IT Operations Management

• The department, group or team of people responsible for performing the organization’s day-to-day operational activities, such as:
  − Console Management
  − Job Scheduling
  − Backup and Restore
  − Print and Output management
  − Performance of maintenance activities
  − Facilities Management
  − Operations Bridge
  − Network Operations Center
  − Monitoring the infrastructure, applications and services

Page 72: 6 service operation

IT Operations Management - Objectives

• Maintaining the “status quo” to achieve infrastructure stability
• Identify opportunities to improve operational performance and save costs
• Initial diagnosis and resolution of operational Incidents

Page 73: 6 service operation

IT Operations Management – Roles

• IT Operations Manager
• Shift Leaders
• IT Operations Analysts
• IT Operators