CFGM & Critical INCM - Interlink
Uploaded by mattyjohnk
Configuration Management & Critical Incident Management Interlink
Agenda
• Introduction
• Critical Incident Management Process in a “Nutshell”
• KPEs, RtOP & EON
• The “Critical” Information Flow & Key CI Attributes
• Bridging the “GAP”
• Q&A
Introduction
• INCM has high visibility in an organization.
• It is critical to business continuity and the Emergency Ops Plan.
• Service availability is the “key”.
• The underlying “CMDB” is a crucial factor.
• It enables communication and decision-making capabilities.
Critical Incident Management Process in a “Nutshell”
What is a Critical Incident?
• An incident that causes a complete interruption or extreme degradation of the service delivered to a client’s KPE, impacting the environment or business operation.
What is an Incident? (Definition from ITIL V3)
• An unplanned interruption to an IT service or a reduction in the quality of an IT service. Failure of a Configuration Item that has not yet impacted service is also an incident; for example, failure of one disk in a mirror set.
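The two definitions differ in scope. Every critical incident is an incident, but only a complete interruption or extreme degradation of a client’s KPE is critical. The minimal sketch below illustrates that distinction. It assumes a hypothetical `Impact` scale and `Event` record; neither is defined in the source.

```python
from dataclasses import dataclass
from enum import Enum


class Impact(Enum):
    """Hypothetical impact levels, used only for this illustration."""
    NONE = "none"                                    # CI failed, service not (yet) affected
    DEGRADED = "degraded"                            # reduced quality of an IT service
    EXTREME_DEGRADATION = "extreme_degradation"
    COMPLETE_INTERRUPTION = "complete_interruption"


@dataclass
class Event:
    ci_failed: bool   # e.g. one disk of a mirror set has failed
    impact: Impact    # observed effect on the service
    on_kpe: bool      # does the affected service belong to a client's KPE?


def classify(event: Event) -> str:
    """Return 'critical incident', 'incident', or 'no incident'."""
    severe = event.impact in (Impact.EXTREME_DEGRADATION, Impact.COMPLETE_INTERRUPTION)
    if severe and event.on_kpe:
        return "critical incident"
    # ITIL V3: any unplanned interruption or quality reduction is an incident,
    # and so is a CI failure that has not yet impacted the service.
    if event.impact is not Impact.NONE or event.ci_failed:
        return "incident"
    return "no incident"
```

For the mirror-set example, `classify(Event(ci_failed=True, impact=Impact.NONE, on_kpe=True))` gives `"incident"`, not `"critical incident"`.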
Critical Incident Management Process Overview: Strategic Incident Management (SIM) Process
[Figure: SIM process flowchart with YES/NO decision points, organized in three swim lanes]
1.0 SIM Process Initiation
• 1.1 Svc Call / OVO Alert; SDM / Customer call
• 1.2 SIM Incident = ?
• 1.3 Standard Incident Management Process
• 1.4 Service is restored = ?
• 1.5 Incident Closure
• 1.6 SIM needed = ?
• 1.7 SIM Process Initiation
2.0 SIM Process Incident Handling
• 2.1 Initial SIM Communication
• 2.2 KPE affected = ?
• 2.3 SRT / War room Establishment
• 2.4 Action Plan Creation & Execution
• 2.5 SIM Update Communication (SMS / E-mail / Exec summary)
• 2.6 Service Restored = ?
• 2.7 Escalation Process
• 2.8 Final SIM Communication (SMS / E-mail)
• 2.9 EMEA RtOP Crisis Process
3.0 SIM Incident Report Activities
• 3.1 Is SIM IR needed = ?
• 3.2 PRM Handover Needed = ?
• 3.3 SIM IR Document Draft
• 3.4 SIM IR Review Meeting
• 3.5 SIM IR Document Distribution
• 3.6 Problem Management
• 3.7 Incident Closure
Critical Incident Management Timeline (00:00 to 00:45)
• 00:00 Ticket creation
• 00:05 Inform SIM
• 00:10 SIM calls DL
• 00:15 ADM & tech teams informed
• 00:30 Business impact confirmed and escalated to L2
• 00:45 Confirm path to resolution or start SIM process
The Key Success Factors
• Setting up the Service Restoration Team with minimal delay determines the time frame for service restoration.
• Getting the required information (org details, CI details & relationships, technical escalation matrix, and current impact) is a deciding factor.
• Know-how on the affected KPE helps SIM trigger RtOP and manage the incident efficiently end to end.
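The timeline can be turned into target times for a specific ticket, which makes missed milestones easy to spot. The following is a minimal sketch. It assumes the offsets are minutes measured from ticket creation; `milestone_schedule` and `overdue` are hypothetical helper names, not part of any HP tool.

```python
from datetime import datetime, timedelta

# Timeline milestones, as minutes after ticket creation.
MILESTONES = [
    (0, "Ticket creation"),
    (5, "Inform SIM"),
    (10, "SIM calls DL"),
    (15, "ADM & tech teams informed"),
    (30, "Business impact confirmed and escalated to L2"),
    (45, "Confirm path to resolution or start SIM process"),
]


def milestone_schedule(ticket_created: datetime) -> list[tuple[datetime, str]]:
    """Return the target wall-clock time for each milestone."""
    return [(ticket_created + timedelta(minutes=m), label) for m, label in MILESTONES]


def overdue(ticket_created: datetime, now: datetime, done: set[str]) -> list[str]:
    """List milestones whose target time has passed but are not yet marked done."""
    return [label for due, label in milestone_schedule(ticket_created)
            if due <= now and label not in done]
```

Take a ticket created at 09:00 that is checked at 09:12, with only the first two steps completed. The 00:10 milestone is then flagged: `overdue(...)` returns `["SIM calls DL"]`.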
KPEs, RtOP & EON
What is KPE ?
A Key Production Environment (KPE) is a service, physically represented / supported by one or more IT components, whose loss or impairment will seriously impact the business of one or more (external or internal) customers and/or their customers.
Also referred to as a Vital Business Function (VBF):
• A function of a business process that is critical to the success of the business.
An outage or serious reduction of its functionality will result in a Priority 1 incident.
Documentation is to be stored within the CIS (Contract Information System); linkage should be made within the ESL (Enterprise System List).
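A KPE is a service backed by one or more CIs, so a failed CI matters as much as the KPEs it supports. The sketch below makes that linkage explicit and applies the P1 rule above. The `KPE` record, helper names and hostnames are hypothetical and are not the CIS or ESL data model.

```python
from dataclasses import dataclass, field


@dataclass
class KPE:
    """A Key Production Environment and the CIs that physically support it."""
    name: str
    customer: str
    supporting_cis: set[str] = field(default_factory=set)  # e.g. hostnames linked in the ESL


def affected_kpes(kpes: list[KPE], failed_ci: str) -> list[str]:
    """Names of the KPEs that the failed CI supports."""
    return [k.name for k in kpes if failed_ci in k.supporting_cis]


def is_p1(kpes: list[KPE], failed_ci: str, outage_or_serious_reduction: bool) -> bool:
    """Apply the slide's rule: a KPE outage or serious reduction is a Priority 1 incident."""
    return outage_or_serious_reduction and bool(affected_kpes(kpes, failed_ci))
```

The check is only as good as the KPE-to-CI linkage. A CI missing from `supporting_cis` can never raise a P1, which is why incomplete KPE linkages matter.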
KPEs & Supporting CI Layer
What is an RtOP?
• RtOP stands for Response to Operational Problems.
• The RtOP procedure underpins the Incident Management process for outages that have a significant business impact on the client.
Purpose:
• The Response to Operational Problems (RtOP) procedure was developed to ensure timely communication of all HP P1 incidents to HP Enterprise Services leadership.
• RtOP is a corporate standard, as referenced in the SRA (Standard Reference Architecture). The RtOP procedure is required for all Priority 1 incidents where a Mission Critical Environment (referred to as a Key Production Environment) is impacted or at risk. This procedure is the notification mechanism to HP Enterprise Services leadership.
Scope:
• An RtOP is invoked when an incident causes a complete interruption of service delivery to the affected customer service entity / key production environment(s) or business operation. Those affected cannot use one or more predefined key services until service delivery is restored, and there is no immediate workaround.
Note: This is normally when the client's IT Director, CIO or CEO has been made aware of the issue because of its business criticality, and a client escalation to HP Management is therefore possible.
RtOP Types:
• RtOP: Critical outage; HP responsibility or a third-party vendor (HP owns the support contract)
• VRtOP: Critical outage; client responsibility or the client's third-party vendor (client owns the support contract)
• IRtOP: Risk of critical outage, a contractual P1 not meeting the RtOP P1 definition, or a non-operational issue
• α-RtOP: Critical outage affecting multiple clients (shared services), long-running outage, brand image jeopardized, client dissatisfaction …
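The four RtOP types can be read as a small decision table: is there a critical outage or only a risk, and who owns the support contract? A minimal Python sketch follows. The flag names are hypothetical. The precedence is also an assumption: α-RtOP triggers are checked before the HP/client ownership split. The α-RtOP trigger list on the slide is open-ended, so only the named triggers are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutageFacts:
    critical_outage: bool             # complete interruption of a KPE, no workaround
    risk_of_critical_outage: bool     # KPE at risk, not (yet) down
    hp_owns_support_contract: bool    # HP, or a 3rd party under an HP contract, is responsible
    contractual_p1_only: bool = False     # contractual P1 not meeting the RtOP P1 definition
    non_operational_issue: bool = False
    multiple_clients: bool = False        # shared-services outage
    long_running: bool = False
    brand_image_at_risk: bool = False
    client_dissatisfaction: bool = False


def rtop_type(f: OutageFacts) -> Optional[str]:
    """Pick an RtOP type; None means no RtOP notification applies."""
    if f.critical_outage:
        # Assumption: alpha triggers take precedence over the ownership split.
        if (f.multiple_clients or f.long_running
                or f.brand_image_at_risk or f.client_dissatisfaction):
            return "α-RtOP"
        return "RtOP" if f.hp_owns_support_contract else "VRtOP"
    if f.risk_of_critical_outage or f.contractual_p1_only or f.non_operational_issue:
        return "IRtOP"
    return None
```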
RtOP vs. Critical Incident Management
• Critical Incident Management is the “superset” and RtOP is its “subset”.
• Not all account P1s classify as RtOPs.
• RtOP communication involves an executive audience.
• The RtOP process is triggered for KPE outages only.
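The superset/subset relationship amounts to a filter over account P1s using the RtOP scope conditions. Those conditions are: a KPE is impacted, the interruption is complete, and there is no immediate workaround. A minimal sketch follows; the `P1Incident` fields and ticket references are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class P1Incident:
    ref: str                     # hypothetical ticket reference
    kpe_impacted: bool           # a Key Production Environment is affected
    complete_interruption: bool  # one or more predefined key services are unusable
    workaround_available: bool   # an immediate workaround exists


def rtop_required(inc: P1Incident) -> bool:
    """RtOP scope: complete interruption of a KPE with no immediate workaround."""
    return inc.kpe_impacted and inc.complete_interruption and not inc.workaround_available


def rtop_subset(p1s: list[P1Incident]) -> list[str]:
    """Filter account P1s (the superset) down to those needing an RtOP (the subset)."""
    return [i.ref for i in p1s if rtop_required(i)]
```

A P1 on a non-KPE system, or one with a workaround, stays in Critical Incident Management but drops out of the RtOP subset.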
RtOP Procedure Flow
The "Critical" Information Flow & Key CI Attributes
CMDB Interface in Sev 1 Process Flow
Critical Incident Phases
❶ Incident Detection
❷ Classification and Prioritization
❸ Investigation and Diagnosis
❹ Recovery, Resolution, & Closure
Information captured in phases ❶ and ❷:
• Events / End User
• Issue Description
• Initial Priority
• Services Affected
• Incident Number
• Site & Locations Affected
• Customer Contact
• Sev 1 Criteria
• Impact Analysis
• Coverage
• Initial Assignment Group
• Priority Justification
• Capabilities Involved
Information and CI attributes used in phases ❸ and ❹:
• IR (Incident Report)
• Notification & Communication
• Resilience and Data Recovery
• Special Handling Instructions
• H/W-S/W Contracts
• Downtime Contacts
• Technical Escalation
• Resolution Confirmation
• Problem Management Triggered
• PIR Initiation
• Services Hosted
• Environment
• Hands & Eyes
• KPE Check and Linked CIs
• CI Location
• Recent Changes
• Resilient Technology
• CI Usage Description
• Contractual Information
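Several of the attributes above, such as downtime contacts, Hands & Eyes and CI location, are only useful if they are populated before a Sev 1 occurs. The minimal sketch below checks a CI record for them. It assumes the CI is available as a plain dictionary; the snake_case field keys are hypothetical names derived from the slide labels, not real CMDB field names.

```python
# Hypothetical field keys derived from the slide's CI attribute labels.
SEV1_CI_ATTRIBUTES = (
    "services_hosted", "environment", "hands_and_eyes", "kpe_links", "ci_location",
    "recent_changes", "resilient_technology", "ci_usage_description",
    "contractual_information", "hw_sw_contracts", "downtime_contacts",
    "technical_escalation", "special_handling_instructions",
)


def missing_sev1_attributes(ci: dict) -> list[str]:
    """Return the Sev 1 attributes that are absent, None, or an empty string.

    An empty list counts as populated, so a CI with no recent changes
    (recent_changes=[]) is not reported as a gap.
    """
    return [a for a in SEV1_CI_ATTRIBUTES if ci.get(a) in (None, "")]
```

The design choice is in the membership test. Treating only missing keys, `None` and `""` as gaps avoids flagging legitimately empty lists.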
Key users of the “CMDB”
• Service Desk
• ASTs
• Accounts Community
• APMs
• Global Crisis Managers
• Client Capability leads
Critical Incident Phases: CFGM Interlink
Bridging the “GAP”
Bridging the “GAP”: The Missing “Links”
• Incomplete KPE linkages
• Downtime contacts & system usage
• “Void” business criticality fields
• DRP solutions not available for business-critical systems
• Obsolete & MTP systems linked to KPEs
• Critical changes not captured in the CMDB
• Hardware / software contract details
• Hands & Eyes information and DC location details for DC / onsite access
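Three of these gaps can be detected mechanically from CI records: void business criticality fields, business-critical systems without a DRP, and obsolete or MTP systems still linked to KPEs. A minimal sketch follows. The `CIRecord` shape, the status strings and the "critical" criticality value are hypothetical; a real CMDB export will differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CIRecord:
    hostname: str
    status: str                          # hypothetical values, e.g. "live", "obsolete", "MTP"
    business_criticality: Optional[str]  # None models a "void" field
    drp_available: bool
    linked_kpes: tuple[str, ...] = ()


def find_gaps(cis: list[CIRecord]) -> list[tuple[str, str]]:
    """Return (hostname, gap description) pairs for the detectable gaps."""
    gaps = []
    for ci in cis:
        if ci.business_criticality is None:
            gaps.append((ci.hostname, "void business criticality"))
        elif ci.business_criticality == "critical" and not ci.drp_available:
            gaps.append((ci.hostname, "no DRP for business-critical system"))
        if ci.status in ("obsolete", "MTP") and ci.linked_kpes:
            gaps.append((ci.hostname, f"{ci.status} system linked to KPE"))
    return gaps
```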
Bridging the “GAP”: This Will “HELP”
• Periodic KPE audits with regard to KPE/hostname linkages
• Accommodate attributes essential to Incident Management in CMDB audits
• “Talk” with Change Management about recent critical changes to be updated in the CMDB
• “Talk” with Incident Management about RtOPs and check for any missing KPE linkages or invalid KPEs
• Feedback to AEs/ADEs on invalid KPEs found during the KPE audit
• Feedback to Availability Management on failed KPE resilience
• Interface with Problem Management to resolve identified CI discrepancies
• Lastly, a CI relationship diagram will indeed “help”
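Two of these actions can be sketched in a few lines. The first cross-checks the KPE and hostname pairs recorded in past RtOPs against the CMDB’s KPE linkages. The second emits a CI relationship diagram as Graphviz DOT text. Both rest on assumptions. The data shapes are hypothetical: a dictionary of KPE to hostnames, and a list of (KPE, hostname) pairs. An “invalid KPE” is interpreted here as one that appears in an RtOP but is unknown to the CMDB.

```python
def audit_kpe_links(cmdb_links: dict[str, set[str]],
                    rtop_records: list[tuple[str, str]]) -> dict[str, list]:
    """Compare RtOP (KPE, hostname) pairs against CMDB KPE -> hostname links.

    cmdb_links:   KPE name -> hostnames linked to it in the CMDB
    rtop_records: (KPE name, hostname) pairs taken from past RtOPs
    """
    # KPEs named in RtOPs that the CMDB does not know at all.
    invalid_kpes = sorted({kpe for kpe, _ in rtop_records if kpe not in cmdb_links})
    # Known KPEs whose affected host is not linked to them in the CMDB.
    missing_links = sorted({(kpe, host) for kpe, host in rtop_records
                            if kpe in cmdb_links and host not in cmdb_links[kpe]})
    return {"invalid_kpes": invalid_kpes, "missing_links": missing_links}


def to_dot(cmdb_links: dict[str, set[str]]) -> str:
    """Render KPE -> CI links as a Graphviz DOT digraph for a relationship diagram."""
    lines = ["digraph ci_relationships {"]
    for kpe in sorted(cmdb_links):
        for host in sorted(cmdb_links[kpe]):
            lines.append(f'  "{kpe}" -> "{host}";')
    lines.append("}")
    return "\n".join(lines)
```

Suppose the CMDB links `{"Payments": {"srv01", "srv02"}}` and the RtOP history includes `("Payments", "srv09")` and `("Billing", "srv05")`. The audit then reports `Billing` as an invalid KPE and `("Payments", "srv09")` as a missing linkage.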
© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
Q & A
Thank you