CFGM & Critical INCM - Interlink

Configuration Management & Critical Incident Management interlink

Transcript of CFGM & Critical INCM - Interlink

Page 1: CFGM & Critical INCM - Interlink

Configuration Management & Critical Incident Management interlink

Page 2: CFGM & Critical INCM - Interlink

Agenda

• Introduction
• Critical Incident Management Process in a “NUTSHELL”
• KPEs, RtOP & EON
• The "Critical" Information Flow & Key CI Attributes
• Bridging the “GAP”
• Q&A

Page 3: CFGM & Critical INCM - Interlink

Introduction

• INCM has high visibility in an Organization
• Critical to Business Continuity and Emergency Ops Plan
• Service Availability is the “Key”
• The Underlying “CMDB” is a crucial factor
• Enable Communication and Decision Making Capabilities

Page 4: CFGM & Critical INCM - Interlink

Critical Incident Management Process in a “NUTSHELL”

Page 5: CFGM & Critical INCM - Interlink

What is a Critical Incident?

• An Incident causing a complete interruption or extreme degradation of service delivered to a client’s KPE, impacting the environment or business operation.

What is an Incident? Definition from ITIL V3:

• An unplanned interruption to an IT Service or a reduction in the quality of an IT Service. Failure of a Configuration Item that has not yet impacted service is also an Incident, for example, failure of one disk from a mirror set.

Page 6: CFGM & Critical INCM - Interlink

Critical Incident Management Process Overview

Strategic Incident Management Process

[Flowchart with three swimlanes. Steps ending in "= ?" are decision points; their YES/NO branch labels are not reproduced here.]

1.0 SIM Process Initiation

1.1 Svc Call / OVO Alert; SDM / Customer call
1.2 SIM Incident = ?
1.3 Standard Incident Management Process
1.4 Service is restored = ?
1.5 Incident Closure
1.6 SIM needed = ?
1.7 SIM Process Initiation

2.0 SIM Process Incident Handling

2.1 Initial SIM Communication
2.2 KPE affected = ?
2.3 SRT / War room Establishment
2.4 Action Plan Creation & Execution
2.5 SIM Update Communication (SMS/E-mail/Exec summary)
2.6 Service Restored = ?
2.7 Escalation Process
2.8 Final SIM Communication (SMS/E-mail)
2.9 EMEA RtOP Crisis Process

3.0 SIM Incident Report Activities

3.1 Is SIM IR needed = ?
3.2 PRM Handover Needed = ?
3.3 SIM IR Document Draft
3.4 SIM IR Review Meeting
3.5 SIM IR Document Distribution
3.6 Problem Management
3.7 Incident Closure

Page 7: CFGM & Critical INCM - Interlink

Critical Incident Management Time Line

Timeline from 00:00 to 00:45:

00:00 Ticket creation
00:05 Inform SIM
00:10 SIM calls DL
00:15 ADM & Tech Teams informed
00:30 Business impact confirmed and escalated to L2
00:45 Confirm path to resolution or start SIM Process

The Key Success Factors

• Setting up the Service Restoration Team with minimal delay determines the time frame of service restoration.
• Getting the required information is a deciding factor: org details, CI details & relationships, the technical escalation matrix and the current impact.
• Know-how on the affected KPE enables SIM to trigger RtOP and efficiently manage the incident end to end.
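
The timeline on this slide can be read as a set of checkpoints measured from ticket creation. The following is a minimal Python sketch, assuming the offsets are hours:minutes (so 00:45 means 45 minutes) and that each offset acts as a deadline. The names `MILESTONES` and `overdue_milestones` are illustrative and do not come from any HP tool.

```python
from datetime import datetime, timedelta

# Minute offsets from ticket creation, taken from the timeline slide.
MILESTONES = [
    (0, "Ticket creation"),
    (5, "Inform SIM"),
    (10, "SIM calls DL"),
    (15, "ADM & tech teams informed"),
    (30, "Business impact confirmed and escalated to L2"),
    (45, "Path to resolution confirmed or SIM process started"),
]


def overdue_milestones(created, now, completed):
    """Return milestones whose offset has been reached but are not completed."""
    elapsed = now - created
    return [
        name
        for minutes, name in MILESTONES
        if timedelta(minutes=minutes) <= elapsed and name not in completed
    ]


# Hypothetical incident: 20 minutes in, only the first two steps are done.
created = datetime(2012, 1, 1, 9, 0)
done = {"Ticket creation", "Inform SIM"}
late = overdue_milestones(created, created + timedelta(minutes=20), done)
print(late)
```

In this hypothetical case the two steps due at 00:10 and 00:15 are reported, which is the kind of delay the first success factor warns about.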

Page 8: CFGM & Critical INCM - Interlink

KPEs, RtOP & EON

Page 9: CFGM & Critical INCM - Interlink

What is a KPE?

A Key Production Environment (KPE) is a service, physically represented / supported by one or more IT components, whose loss or impairment will seriously impact the business of one or more (external or internal) customers and/or their customers.

Also referred to as a Vital Business Function (VBF)

• A function of a Business Process that is critical to the success of the Business.

An outage or serious reduction of its functionality will result in a Priority 1 Incident.

Documentation is to be stored within CIS (Contract Information System).

Linkage should be made within ESL (Enterprise System List).

Page 10: CFGM & Critical INCM - Interlink

KPEs & Supporting CI Layer
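
The relationship between a KPE and its supporting CI layer can be modelled as a simple mapping. This is a minimal Python sketch under stated assumptions: the `KPE` class, the `affected_kpes` function and all service and host names are hypothetical and are not taken from CIS or ESL.

```python
from dataclasses import dataclass, field


@dataclass
class KPE:
    """A Key Production Environment and the CIs that support it."""
    name: str
    supporting_cis: set = field(default_factory=set)


def affected_kpes(kpes, failed_ci):
    """Return the names of KPEs that list the failed CI as a supporting component."""
    return sorted(k.name for k in kpes if failed_ci in k.supporting_cis)


# Hypothetical KPEs and hostnames, for illustration only.
kpes = [
    KPE("Order Processing", {"app-srv-01", "db-srv-01"}),
    KPE("Payroll", {"db-srv-01", "batch-srv-02"}),
    KPE("Intranet", {"web-srv-03"}),
]
print(affected_kpes(kpes, "db-srv-01"))
```

A lookup like this only works if the KPE-to-CI linkage is complete, which is why the linkage itself matters during a Priority 1 Incident.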

Page 11: CFGM & Critical INCM - Interlink

What is an RtOP?

• RtOP stands for Response to Operational Problems

• The RtOP procedure sits under the Incident Management process and applies to outages that have a significant business impact on the client

Purpose:

• The Response to Operational Problems (RtOP) procedure was developed to provide a solution to ensure timely communication of all HP P1 incidents to HP Enterprise Services leadership.

• RtOP is a corporate standard as referred to in the SRA (Standard Reference Architecture). The RtOP procedure is required for all Priority 1 incidents where a Mission Critical Environment (referred to as a Key Production Environment) is impacted or at risk. This procedure is the notification to HP Enterprise Services leadership.

Page 12: CFGM & Critical INCM - Interlink

Scope:

• An RtOP is invoked when an Incident causes a complete interruption of service delivery to the affected customer service entity / key production environment(s) or business operation. Those affected cannot utilize one or more predefined key services until service delivery is restored. There is no immediate workaround.

Note: This is normally when the client's IT Director, CIO or CEO has been made aware of the issue because of its criticality to the business, so there is a possibility of a client escalation to HP Management.

Page 13: CFGM & Critical INCM - Interlink

RtOP Types:

• RtOP: Critical Outage – HP responsibility or 3rd-party vendor (HP owns the support contract)

• VRtOP: Critical Outage – Client responsibility or client's 3rd-party vendor (client owns the support contract)

• IRtOP: Risk of Critical Outage, contractual P1 not meeting the RtOP P1 definition, or non-operational issue

• α-RtOP: Critical Outage – Multiple clients (Shared Services), long-running outage, brand image jeopardized, client dissatisfaction …
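
The four types can be expressed as a small decision function. This is only a sketch: the function name `rtop_type`, its parameters, and the order in which the conditions are checked are assumptions made for illustration; the slide does not define a precedence between the types.

```python
def rtop_type(critical_outage, hp_owns_support, aggravated=False):
    """Pick an RtOP type from three yes/no facts about the incident.

    critical_outage: a critical outage is in progress (not just a risk).
    hp_owns_support: HP, or a vendor under an HP-owned contract, is responsible.
    aggravated: multiple clients, long-running outage, brand image at risk
                or client dissatisfaction (the alpha conditions on the slide).
    """
    if not critical_outage:
        # Risk of outage, contractual P1 below the RtOP definition,
        # or a non-operational issue.
        return "IRtOP"
    if aggravated:
        # Assumption: the alpha conditions take precedence over ownership.
        return "α-RtOP"
    return "RtOP" if hp_owns_support else "VRtOP"


print(rtop_type(critical_outage=True, hp_owns_support=False))
```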

Page 14: CFGM & Critical INCM - Interlink

RtOP vs Critical Incident Management

• Critical Incident Management is the “Superset” and RtOP is its “Subset”
• Not all Account P1s classify as RtOPs
• RtOP communication involves an executive audience
• The RtOP process is triggered for KPE outages only

Page 15: CFGM & Critical INCM - Interlink

RtOP Procedure Flow

Page 16: CFGM & Critical INCM - Interlink

The "Critical" Information Flow & Key CI Attributes

Page 17: CFGM & Critical INCM - Interlink

CMDB Interface in Sev 1 Process Flow

Critical Incident Phases

Page 18: CFGM & Critical INCM - Interlink

CMDB Interface in Sev 1 Process Flow

Critical Incident Phases

❶ Incident Detection
❷ Classification and Prioritization
❸ Investigation and Diagnosis
❹ Recovery, Resolution & Closure

[Diagram: information items attached to the four phases. The grouping of items per phase is not reproduced here.]

• Events / End User
• Issue Description
• Initial Priority
• Services Affected
• Incident Number
• Site & Locations Affected
• Customer Contact
• Sev 1 Criteria
• Impact Analysis
• Coverage
• Initial Assignment Group
• Priority Justification
• Capabilities Involved
• IR
• Notification & Communication
• Resilient and Data Recovery
• Special Handling Instructions
• H/W-S/W Contracts
• Downtime Contacts
• Technical Escalation
• Resolution Confirmation
• Problem Management Triggered
• PIR Initiation
• Services Hosted
• Environment
• Hands & Eyes
• KPE Check and Linked CIs
• CI Location
• Recent Changes
• Resilient Technology
• CI Usage Description
• Contractual Information
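
Several of the items on this slide are CI attributes that have to be filled in before a Sev 1 incident occurs. A minimal Python sketch of a completeness check follows. The field names are hypothetical snake_case versions of the slide items, and the choice of which ones to treat as required is an assumption, not a documented CMDB schema.

```python
# Hypothetical attribute names derived from the slide; not a real CMDB schema.
REQUIRED_CI_ATTRIBUTES = [
    "ci_location",
    "ci_usage_description",
    "services_hosted",
    "environment",
    "downtime_contacts",
    "technical_escalation",
    "hands_and_eyes",
]


def missing_attributes(ci_record):
    """Return required attributes that are absent, None or blank in the record."""
    return [
        attr
        for attr in REQUIRED_CI_ATTRIBUTES
        if not str(ci_record.get(attr) or "").strip()
    ]


# Hypothetical CI record with three gaps.
ci = {
    "ci_location": "DC-1, Rack 12",
    "ci_usage_description": "Order database",
    "services_hosted": "Order Processing",
    "environment": "Production",
    "downtime_contacts": "",
    "technical_escalation": None,
}
print(missing_attributes(ci))
```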

Page 19: CFGM & Critical INCM - Interlink

CMDB Interface in Sev 1 Process Flow

Key users of the “CMDB”

• Service Desk
• ASTs
• Accounts Community
• APMs
• Global Crisis Managers
• Client Capability leads

Page 20: CFGM & Critical INCM - Interlink

CMDB Interface in Sev 1 Process Flow

Critical Incident Phases CFGM Interlink

Page 21: CFGM & Critical INCM - Interlink

Bridging the “GAP”

Page 22: CFGM & Critical INCM - Interlink

Bridging the “GAP”

The Missing “Links”

• Incomplete KPE linkages
• Downtime Contacts & System Usage
• “Void” Business Criticality fields
• DRP solutions not available for business-critical systems
• Obsolete & MTP systems linked to KPEs
• Critical changes not captured in the CMDB
• Hardware / Software Contact
• Hands & Eyes information and DC location details for DC / onsite access
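
Some of these gaps can be detected with simple cross-field rules on a CI record. The sketch below is illustrative only: the function `gap_findings`, the field names and the status values are assumptions, and it covers three of the gaps listed on this slide rather than all of them.

```python
def gap_findings(ci):
    """Return a list of gap descriptions for one CI record (a plain dict)."""
    findings = []
    # An obsolete or MTP system should not be linked to a KPE.
    if ci.get("kpe") and ci.get("status") in {"obsolete", "MTP"}:
        findings.append("obsolete or MTP system linked to a KPE")
    # The business criticality field must not be left void.
    if not ci.get("business_criticality"):
        findings.append("business criticality field is void")
    # A business-critical system needs a DRP solution.
    if ci.get("business_criticality") == "critical" and not ci.get("drp_solution"):
        findings.append("no DRP solution for a business-critical system")
    return findings


# Hypothetical records, for illustration only.
db_server = {
    "name": "db-srv-01",
    "kpe": "Order Processing",
    "status": "obsolete",
    "business_criticality": "critical",
    "drp_solution": None,
}
web_server = {"name": "web-srv-03", "kpe": None, "status": "production",
              "business_criticality": ""}

print(gap_findings(db_server))
print(gap_findings(web_server))
```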

Page 23: CFGM & Critical INCM - Interlink

Bridging the “GAP”

This will “HELP”

• Periodic KPE audits with regard to KPE/hostname linkages
• Accommodate attributes essential to Incident Management in CMDB audits
• “Talk” with Change Management on recent critical changes to be updated in the CMDB
• “Talk” with Incident Management on RtOPs and check for any missing KPE linkages or invalid KPEs
• Feedback to AE/ADEs on invalid KPEs found during the KPE audit
• Feedback to Availability Management on failed KPE resilience
• Interface with Problem Management to resolve the CI discrepancies identified
• Lastly, a CI relationship diagram will indeed “help”
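
Because the RtOP process is triggered for KPE outages only, an RtOP raised against a CI that has no KPE linkage in the CMDB points to a missing link. The Python sketch below shows one way such a cross-check could look. The function name, the incident IDs and the hostnames are hypothetical, and the data would in practice come from the incident and configuration tools in use.

```python
def unlinked_rtop_cis(rtop_incidents, kpe_links):
    """Return (incident_id, ci_name) pairs where the CI has no KPE linkage.

    rtop_incidents: list of (incident_id, ci_name) tuples.
    kpe_links: dict mapping ci_name to the KPE it is linked to.
    """
    return [(inc, ci) for inc, ci in rtop_incidents if not kpe_links.get(ci)]


# Hypothetical audit input.
rtops = [("IM001", "db-srv-01"), ("IM002", "app-srv-09")]
links = {"db-srv-01": "Order Processing"}
print(unlinked_rtop_cis(rtops, links))
```

Each pair returned is a candidate for the feedback loops listed on this slide: either the KPE linkage is missing or the RtOP was raised against something that is not a KPE.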

Page 24: CFGM & Critical INCM - Interlink

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Q & A

Page 25: CFGM & Critical INCM - Interlink


Thank you