CFGM & Critical INCM - Interlink

Configuration Management & Critical Incident Management interlink

Agenda

• Introduction
• Critical Incident Management Process in a “NUT SHELL”
• KPE's, RtOP & EON
• The "Critical" Information Flow & Key CI Attributes
• Bridging the “GAP”
• Q&A

Introduction

• INCM has high visibility in an Organization
• Critical to Business Continuity and Emergency Ops Plan
• Service Availability is the “Key”
• The Underlying “CMDB” is a crucial factor
• Enable Communication and Decision Making Capabilities

Critical Incident Management Process in a “NUT SHELL”

What is a Critical Incident?

• An Incident causing a complete interruption or extreme degradation of service delivered to a client’s KPE, impacting the environment or business operation.

What is an Incident? Definition from ITIL V3:

• An unplanned interruption to an IT Service or a reduction in the Quality of an IT Service. Failure of a Configuration Item that has not yet impacted Service is also an Incident. For example, failure of one disk from a mirror set.

Critical Incident Management Process Overview

Strategic Incident Management Process

Flowchart steps by phase (decision steps end in “= ?”):

1.0 SIM Process Initiation
• 1.1 Svc Call / OVO Alert; SDM / Customer call
• 1.2 SIM Incident = ?
• 1.3 Standard Incident Management Process
• 1.4 Service is restored = ?
• 1.5 Incident Closure
• 1.6 SIM needed = ?
• 1.7 SIM Process Initiation

2.0 SIM Process Incident Handling
• 2.1 Initial SIM Communication
• 2.2 KPE affected = ?
• 2.3 SRT / War room Establishment
• 2.4 Action Plan Creation & Execution
• 2.5 SIM Update Communication (SMS/E-mail/Exec summary)
• 2.6 Service Restored = ?
• 2.7 Escalation Process
• 2.8 Final SIM Communication (SMS/E-mail)
• 2.9 EMEA RtOP Crisis Process

3.0 SIM Incident Report Activities
• 3.1 Is SIM IR needed = ?
• 3.2 PRM Handover Needed = ?
• 3.3 SIM IR Document Draft
• 3.4 SIM IR Review Meeting
• 3.5 SIM IR Document Distribution
• 3.6 Problem Management
• 3.7 Incident Closure

Critical Incident Management Time Line

• 00:00 Ticket Creation
• 00:05 Inform SIM
• 00:10 SIM calls DL
• 00:15 ADM & Tech Teams Informed
• 00:30 Business Impact Confirmed and escalate to L2
• 00:45 Confirm path to resolution or start SIM Process
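The timeline above can be turned into concrete deadlines for a given ticket. The following is a minimal sketch, not part of the original process documentation: the function names and the idea of tracking completed actions in a set are assumptions made for illustration, while the minute offsets and action names are copied from the timeline.

```python
from datetime import datetime, timedelta

# Checkpoints copied from the timeline: (minutes after ticket creation, action).
CHECKPOINTS = [
    (0, "Ticket Creation"),
    (5, "Inform SIM"),
    (10, "SIM calls DL"),
    (15, "ADM & Tech Teams Informed"),
    (30, "Business Impact Confirmed and escalate to L2"),
    (45, "Confirm path to resolution or start SIM Process"),
]


def checkpoint_deadlines(ticket_created):
    """Return (deadline, action) pairs for a ticket created at `ticket_created`."""
    return [
        (ticket_created + timedelta(minutes=minutes), action)
        for minutes, action in CHECKPOINTS
    ]


def overdue_checkpoints(ticket_created, now, completed):
    """Return actions whose deadline has passed and that are not in `completed`.

    `completed` is a hypothetical set of action names already carried out.
    """
    return [
        action
        for deadline, action in checkpoint_deadlines(ticket_created)
        if deadline <= now and action not in completed
    ]


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 9, 0)
    for deadline, action in checkpoint_deadlines(created):
        print(deadline.strftime("%H:%M"), action)
```

A check such as `overdue_checkpoints(created, now, completed)` could be run periodically during the first 45 minutes to show which steps are lagging.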

The Key Success Factors

• How quickly the Service Restoration Team is set up determines the time frame of Service Restoration.
• Getting the required information (Org Details, CI Details & Relationships, Technical Escalation Matrix and Current Impact) is a deciding factor.
• Know-how on the affected KPE enables SIM to trigger RtOP and manage the Incident efficiently end to end.

KPE's, RtOP & EON

What is a KPE?

A Key Production Environment (KPE) is a service, physically represented / supported by one or more IT components, whose loss or impairment will seriously impact the business of one or more (external or internal) customers and/or their customers.

Also referred to as a Vital Business Function (VBF)

• A Function of a Business Process that is critical to the success of the Business.

An outage or serious reduction of its functionality will result in a Priority 1 Incident.

Documentation is to be stored within CIS (Contract Information System).

Linkage should be made within ESL (Enterprise System List).

KPE’s & Supporting CI Layer

What is an RtOP?

• RtOP stands for Response to Operational Problems

• The RtOP procedure is attached to the Incident Management process for outages which have a significant business impact on the client

Purpose :

• The Response to Operational Problems (RtOP) procedure was developed to provide a solution to ensure timely communication of all HP P1 incidents to HP Enterprise Services leadership.

• RtOP is a corporate standard as referred to in the SRA (Standard Reference Architecture). The RtOP procedure is required for all Priority 1 incidents where a Mission Critical Environment (referred to as a Key Production Environment) is impacted or at risk. This procedure is the notification to HP Enterprise Services leadership.

Scope :

• An RtOP is invoked when an Incident causes a complete interruption of service delivery to the affected customer service entity / key production environment(s) or business operation. Those affected cannot utilize one or more predefined key services until service delivery is restored. There is no immediate workaround.

Note: This is normally when the client's IT Director, CIO or CEO has been made aware of the issue due to its criticality to the business, so there is a possibility of a client escalation to HP Management.

RtOP Types :

• RtOP: Critical Outage – HP Responsibility or 3rd-party Vendor (HP owns the support contract)

• VRtOP: Critical Outage – Client Responsibility or Client 3rd-party Vendor (Client owns the support contract)

• IRtOP: Risk of Critical Outage, Contractual P1 not meeting RtOP P1 definition or Non Operational Issue

• α-RtOP: Critical Outage – Multiple Clients (Shared Services), Long Running Outage, Brand Image jeopardized, Client Dissatisfaction …
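The four types above can be read as a small decision rule. The sketch below is illustrative only: the `Outage` fields, the `classify_rtop` function and, in particular, the order in which the checks are applied (α-RtOP before RtOP/VRtOP) are assumptions, since the slide lists the types but does not state a precedence.

```python
from dataclasses import dataclass
from enum import Enum


class RtopType(Enum):
    RTOP = "RtOP"
    VRTOP = "VRtOP"
    IRTOP = "IRtOP"
    ALPHA_RTOP = "α-RtOP"


@dataclass
class Outage:
    # All field names are hypothetical, chosen to mirror the slide wording.
    critical_outage: bool            # a critical outage is actually in progress
    hp_owns_support_contract: bool   # False means the client owns the contract
    aggravated: bool = False         # multiple clients, long running, brand image, dissatisfaction


def classify_rtop(outage):
    """Map an outage to one of the RtOP types listed on the slide."""
    if not outage.critical_outage:
        # Risk of outage, contractual P1 below the RtOP P1 definition,
        # or a non-operational issue.
        return RtopType.IRTOP
    if outage.aggravated:
        # Assumption: aggravating factors take precedence over ownership.
        return RtopType.ALPHA_RTOP
    if outage.hp_owns_support_contract:
        return RtopType.RTOP
    return RtopType.VRTOP


if __name__ == "__main__":
    print(classify_rtop(Outage(critical_outage=True, hp_owns_support_contract=False)).value)
```

Keeping the rule in one function makes the precedence explicit, so it can be adjusted if the actual RtOP procedure orders the checks differently.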

RtOP Vs Critical Incident Management

• Critical Incident Management is the “Super Set” and RtOP is its “Sub Set”
• Not all Account P1s classify as RtOPs
• RtOP communication involves an Executive Audience
• The RtOP process is triggered for KPE Outages only

RtOP Procedure Flow

The "Critical" Information Flow & Key CI Attributes

CMDB Interface in Sev 1 Process Flow

Critical Incident Phases


❶ Incident Detection and ❷ Classification and Prioritization

• Events/End User
• Issue Description
• Initial Priority
• Services Affected
• Incident Number
• Site & Locations affected
• Customer Contact
• Sev 1 Criteria
• Impact Analysis
• Coverage
• Initial Assignment Group
• Priority Justification
• Capabilities Involved

❸ Investigation and Diagnosis and ❹ Recovery, Resolution & Closure

• IR
• Notification & Communication
• Resilient and Data Recovery
• Special Handling Instructions
• H/W-S/W contracts
• Downtime Contacts
• Technical Escalation
• Resolution Confirmation
• Problem Management Triggered
• PIR Initiation
• Services Hosted
• Environment
• Hands & Eyes
• KPE check and Linked CI’s
• CI Location
• Recent Changes
• Resilient Technology
• CI Usage Description
• Contractual Information

Key users of the “CMDB”

• Service Desk
• AST’s
• Accounts Community
• APM’s
• Global Crisis Managers
• Client Capability leads

CMDB Interface in Sev 1 Process Flow: Critical Incident Phases and CFGM Interlink

Bridging the “GAP”

The Missing “Links”

• Incomplete KPE Linkages
• Downtime Contacts & System Usage
• “Void” Business Criticality Fields
• DRP solutions not available for Business Critical Systems
• Obsolete & MTP Systems linked to KPE’s
• Critical Changes not Captured in CMDB
• Hardware / Software Contact
• Hands & Eye Information and DC location details for DC / Onsite access
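Several of the gaps listed above can be detected mechanically on a per-CI basis. The following is a minimal sketch under stated assumptions: the CI is represented as a plain dictionary, and every field name (`downtime_contacts`, `business_criticality`, `drp_solution`, `kpe_ids`, `status`, `hands_and_eyes`, `dc_location`) as well as the status and criticality values are hypothetical, not taken from any real CMDB schema.

```python
# Hypothetical status values for systems that should not back a KPE.
FLAGGED_STATUSES = {"obsolete", "mtp"}


def find_gaps(ci):
    """Return a list of gap descriptions for one CI record (a dict)."""
    gaps = []

    if not ci.get("downtime_contacts"):
        gaps.append("downtime contacts missing")

    criticality = ci.get("business_criticality")
    if not criticality:
        gaps.append("business criticality is void")
    elif criticality == "critical" and not ci.get("drp_solution"):
        gaps.append("no DRP solution for business critical system")

    if ci.get("kpe_ids") and ci.get("status") in FLAGGED_STATUSES:
        gaps.append("obsolete or MTP system linked to a KPE")

    if not ci.get("hands_and_eyes") or not ci.get("dc_location"):
        gaps.append("hands & eyes or DC location details missing")

    return gaps


if __name__ == "__main__":
    example_ci = {
        "hostname": "srv01",
        "kpe_ids": ["KPE-1"],
        "status": "obsolete",
        "business_criticality": "critical",
        "drp_solution": None,
        "downtime_contacts": ["ops@example.com"],
        "hands_and_eyes": "DC team",
        "dc_location": "Hall 2",
    }
    for gap in find_gaps(example_ci):
        print(example_ci["hostname"], "-", gap)
```

Running such a check across all CIs linked to KPEs would give Configuration Management a worklist before the next critical incident exposes the gap.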

This will “HELP”

• Periodic KPE audits with regard to KPE/Hostname Linkages
• Accommodate Attributes essential to Incident Management in CMDB Audits
• “Talk” with Change Management on recent critical changes to be updated in CMDB
• “Talk” with Incident Management on RtOP’s and check for any missing KPE linkages or Invalid KPE’s
• Feedback to AE/ADE’s on Invalid KPE’s found during KPE audit
• Feedback to Availability Management on Failed KPE Resilience
• Interface with Problem Management to resolve CI Discrepancies Identified
• Lastly – A CI Relationship diagram will indeed “Help”
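The cross-check between RtOPs and KPE linkages mentioned above can be sketched as a simple audit. This is an illustration under assumptions: RtOP records are modelled as dictionaries with hypothetical `id`, `kpe` and `hostname` keys, and the KPE register is modelled as a mapping from KPE name to the set of hostnames linked to it.

```python
def audit_rtops(rtops, kpe_links):
    """Compare past RtOPs against the KPE/hostname linkages.

    rtops:     list of dicts with hypothetical keys 'id', 'kpe', 'hostname'
    kpe_links: dict mapping KPE name -> set of linked hostnames
    Returns a list of (rtop_id, finding) tuples.
    """
    findings = []
    for rtop in rtops:
        kpe = rtop.get("kpe")
        if not kpe:
            findings.append((rtop["id"], "missing KPE linkage"))
        elif kpe not in kpe_links:
            findings.append((rtop["id"], "invalid KPE"))
        elif rtop.get("hostname") not in kpe_links[kpe]:
            findings.append((rtop["id"], "hostname not linked to KPE"))
    return findings


if __name__ == "__main__":
    kpe_links = {"Billing": {"srv01", "srv02"}}
    rtops = [
        {"id": "R1", "kpe": "Billing", "hostname": "srv01"},
        {"id": "R2", "kpe": "Payroll", "hostname": "srv09"},
        {"id": "R3", "kpe": None, "hostname": "srv02"},
        {"id": "R4", "kpe": "Billing", "hostname": "srv07"},
    ]
    for rtop_id, finding in audit_rtops(rtops, kpe_links):
        print(rtop_id, finding)
```

The findings map directly onto the feedback loops in the list: invalid KPEs go back to the AE/ADE’s, and missing linkages go into the next KPE audit.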
