Create a Right-Sized Disaster Recovery Plan


Transcript of Create a Right-Sized Disaster Recovery Plan

Page 1: Create a Right-Sized Disaster Recovery Plan

Create a Right-Sized Disaster Recovery Plan

Close the gap between your DR capabilities and service continuity requirements.

Info-Tech Research Group, Inc. is a global leader in providing IT research and advice. Info-Tech’s products and services combine actionable insight and relevant advice with ready-to-use tools and templates that cover the full spectrum of IT concerns.

© 1997-2015 Info-Tech Research Group Inc.

Page 2: Create a Right-Sized Disaster Recovery Plan

ANALYST PERSPECTIVE

A Disaster Recovery Plan Is No Longer Just an Insurance Policy

An effective DRP addresses common outages such as hardware and software failures, as well as regional events, to provide day-to-day service continuity. It’s not just insurance you might never cash in.

Customers are also demanding evidence of an effective DRP, so organizations without a DRP risk business impact not only from extended outages but also from lost sales.

If you are fortunate enough to have executive buy-in, whether it’s due to customer pressure or concern over potential downtime, you still have the challenge of limited time to dedicate to DR planning. Organizations need a practical, but structured approach that enables IT leaders to create a DRP without it becoming their full-time job.

Frank Trovato, Senior Manager, Infrastructure
Info-Tech Research Group

Page 3: Create a Right-Sized Disaster Recovery Plan

Our understanding of the problem

This Research Is Designed For:

• Senior IT management responsible for executing disaster recovery.
• Organizations seeking to formalize, optimize, or validate an existing disaster recovery plan (DRP).
• Business continuity management (BCM) professionals leading DRP development.

This Research Will Help You:

• Create a disaster recovery plan that is aligned with business requirements.
• Prioritize technology enhancements based on DR requirements and risk-impact analysis.
• Identify and address process gaps that impact DR capability and day-to-day service continuity.

This Research Will Also Assist:

• Executives seeking to understand the time and resource commitment required for disaster recovery planning.
• Members of BCM and crisis management teams who need to understand the elements of both the business continuity plan (BCP) and DRP.

This Research Will Help Them:

• Scope the time and effort required to develop a DRP.
• Ensure alignment across business continuity, disaster recovery, and crisis management plans.

Page 4: Create a Right-Sized Disaster Recovery Plan

Executive summary

Situation

• Any time a natural disaster or major IT outage occurs, it increases executive awareness and internal pressure to create a DRP.
• Similarly, industry and government-driven regulations are placing more focus on business continuity – and therefore DRP by extension.
• Customers are also demanding that organizations provide evidence that they have a workable DRP before agreeing to do business.

Complication

• Traditional DRP templates are onerous and result in a lengthy, dense plan that might satisfy auditors, but is not effective in a crisis.
• Similarly, the myth that a DRP is only for major disasters and should be risk-based leaves organizations vulnerable to more common incidents.
• The increased use of cloud vendors and co-lo/managed service providers means you may depend on vendors to meet recovery timeline objectives.

Resolution

• Create an effective DRP by following a structured process to discover current capabilities and define business requirements for continuity, not by completing a one-size-fits-all traditional DRP template. This includes:
◦ Defining appropriate objectives for maximum downtime and data loss based on business impact.
◦ Creating a DR project roadmap to close the gaps between your current DR capabilities and recovery objectives.
◦ Documenting an incident response plan based on a tabletop planning walkthrough that captures all of the steps from event detection to data center recovery.

Info-Tech Insight

1. DR is about service continuity – that means accounting for minor and major events.
2. Remember Murphy’s Law. Failure happens, so focus on improving overall resiliency and recovery, rather than basing DR on risk probability analysis.
3. Cost-effective DR and service continuity starts with identifying what is truly mission critical so you can focus resources accordingly. Not all systems require fast failover capability.

Page 5: Create a Right-Sized Disaster Recovery Plan

A disaster recovery plan is part of an overall business continuity plan

A disaster recovery plan (DRP) consists of a set of procedures and supporting information that enables an organization to restore its IT services (e.g. applications and infrastructure) as part of an overall business continuity plan (BCP), as described below. Use this blueprint to implement a structured methodology to create your DRP.

Overall Business Continuity Plan

IT Disaster Recovery Plan

A plan to restore IT services (e.g. applications and infrastructure) following a disruption. This includes:

• Identifying critical applications and dependencies.
• Defining an appropriate (desired) recovery timeline based on a business impact analysis.
• Creating a step-by-step incident response plan.

BCP for Each Business Unit

A set of plans to resume business processes for each business unit.

Info-Tech’s Develop a Business Continuity Plan blueprint provides a methodology for creating business unit BCPs as part of an overall BCP for the organization.

Crisis Management Plan

A set of processes to manage a wide range of crises, from health and safety incidents to business disruptions to reputational damage. This includes emergency response plans, crisis communication plans, and the steps to invoke BC/DR plans when applicable.

Info-Tech’s Implement Crisis Management Best Practices blueprint provides a structured approach to develop a crisis management process.

Note: For disaster recovery planning, we use applications where possible as a starting point to keep the focus on business-facing IT services (as opposed to the underlying infrastructure), and then identify required infrastructure as an application dependency (e.g. the servers, databases, and network infrastructure required to support the application). Additional business-facing systems that we will use as a starting point will include broader systems such as a corporate website.
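
One way to picture the application-first approach described in the note is as a dependency map. The sketch below is only an illustration: every application and component name is hypothetical, and the map itself is an assumption rather than anything taken from the blueprint's tools. It starts from business-facing applications, collects the infrastructure each one needs, and lists components in an order where each one comes after the things it depends on.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each key depends on the components in its set.
# All names are illustrative assumptions, not taken from the blueprint.
DEPENDENCIES = {
    "erp": {"erp-db", "app-server"},
    "corporate-website": {"web-server", "core-network"},
    "erp-db": {"storage", "core-network"},
    "app-server": {"core-network"},
    "web-server": {"core-network"},
    "storage": set(),
    "core-network": set(),
}


def required_components(app, deps=DEPENDENCIES):
    """Return every direct and indirect dependency of one application."""
    seen = set()
    stack = [app]
    while stack:
        for dep in deps.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def recovery_order(deps=DEPENDENCIES):
    """List components so that each appears after everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(sorted(required_components("erp")))
    print(recovery_order())
```

Starting from the application keeps the focus on the business-facing service; the infrastructure list falls out of the map rather than driving it.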

Page 6: Create a Right-Sized Disaster Recovery Plan

An effective DRP is critical in reducing recovery time and the cost of downtime

If you don’t have an effective disaster recovery plan when failure occurs, expect to face extended downtime and exponentially rising costs due to confusion and lack of documented processes.

The impact of downtime increases significantly over time, as illustrated for lost revenue in the graph to the left. An up-to-date and tested DRP will significantly increase the consistency of your ability to recover and is critical to minimizing downtime and business impact.

If you do not have an existing DRP, your organization is gambling on being able to define and implement a recovery strategy during a time of crisis. At the very least, this means extended downtime – potentially weeks or months – and substantial business impact.

[Chart: Delay in recovery causes exponential revenue loss. Axis: Potential Lost Revenue. Adapted from: Rothstein, Philip Jan. Disaster Recovery Testing: Exercising Your Contingency Plan (2007 Edition).]

Cost of Downtime for the Fortune 1000

• Cost of unplanned apps downtime per year: $1.25B to $2.5B
• Cost of critical apps failure per hour: $500,000 to $1M
• Cost of infrastructure failure per hour: $100,000
• 35% reported to have recovered within 12 hours.
• 17% of infrastructure failures took more than 24 hours to recover.
• 13% of application failures took more than 24 hours to recover.

Source: Elliot, Stephen. DevOps and the Cost of Downtime: Fortune 1000 Best Practice Metrics Quantified. IDC, 2015.

Info-Tech Insight

The cost of downtime is rising across the board, and not just for organizations that traditionally depend on IT (e.g. e-commerce).

Downtime cost increase since 2010:

• Hospitality: 129% increase
• Transportation: 108% increase
• Media organizations: 104% increase

Page 7: Create a Right-Sized Disaster Recovery Plan

Myth #1: DRPs need to focus on major events such as natural disasters and other highly destructive incidents such as fire and flood.

Reality: The most common threats to service continuity are hardware and software failures, network outages, and power outages.

Forty-five percent of service interruptions that went beyond maximum downtime guidelines set by the business were caused by software and hardware issues.

Causes of Unacceptable Downtime (Source: Info-Tech Research Group; N=87)

• Software Failure: 24%
• Isolated Hardware Failure: 21%
• Software and hardware failures combined: 45% total
• External Network Failure: 19%
• Power Outage: 18%
• Building is Inaccessible (e.g. due to a local hazard): 5%
• Natural Disaster; Equipment Damage (e.g. due to fire, roof collapse): 12% total (chart values 5% and 7%)

Only 12% of incidents were caused by major destructive events.

Does this mean I don’t need to worry about natural disasters? No. It means DR planning needs to focus on overall service continuity, not just major disasters. If you ignore the more common, but less dramatic causes of service interruptions, you will suffer the proverbial “death from a thousand cuts.”

Page 8: Create a Right-Sized Disaster Recovery Plan

Myth #2: Effective DRPs start with identifying and evaluating potential risks.

Reality: DR is not about mitigating risks; it’s about ensuring service continuity.

The common “by-the-book” approach is to identify risks, assess probability, and then build a plan to mitigate those risks.

Here’s why the risk approach is ineffective:

• Unless you can foresee the future, odds are that you won’t think of every incident that might occur. If you think of 20 risks, it will be the 21st that gets you.

• If you take risk assessment to an extreme level to try to guard against that unforeseen 21st risk, you can quickly get into unrealistic and cartoonish scenarios and much more costly solutions.

• The traditional risk-assessment process for DR planning is time consuming, often has little immediate value, and delays more effective actions (e.g. process and technology enhancements).

We know failure happens regardless of your risk profile, so strive for overall resiliency that will enable you to recover regardless of the specific risk or incident.

In this blueprint, the business impact analysis (BIA) is the primary driver in your recovery strategy. A high-level risk analysis is used as a secondary driver (e.g. identify single points of failure for critical systems that should have redundancy based on business impact).

Page 9: Create a Right-Sized Disaster Recovery Plan

Myth #3: DRPs are a separate entity from normal day-to-day operations.

Reality: Again, the goal of DR is to maintain service continuity and that starts with day-to-day service management.

If a tornado takes out your data center, it’s an obvious DR scenario. Where processes often break down is in less obvious DR scenarios (e.g. hardware/software issues) when it’s not clear when to move from service management procedures to DR procedures.

Extending service management processes to account for disaster scenarios helps you ensure more timely and appropriate responses and meet recovery timeline requirements.

Organizations that account for disasters in their service management processes (e.g. severity definitions, escalation rules) are much more successful at meeting RTO and RPO requirements.
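
As a rough illustration of what a disaster-aware severity definition and escalation rule could look like, the sketch below moves an incident from normal service management to DR procedures once the projected outage would exceed the service's RTO. The field names, the three outcome labels, and the 50% early-warning threshold are all assumptions made for this example; they do not come from the research.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    service: str
    rto_hours: float             # recovery time objective for the service
    hours_down: float            # time elapsed since the outage began
    est_hours_to_restore: float  # current estimate for fixing it in place


def classify(incident: Incident) -> str:
    """Return 'normal', 'escalate', or 'invoke-dr' for an open incident.

    Assumed rule: invoke DR when the projected total outage exceeds the RTO;
    escalate when it exceeds half the RTO (hypothetical threshold).
    """
    projected = incident.hours_down + incident.est_hours_to_restore
    if projected > incident.rto_hours:
        return "invoke-dr"
    if projected > 0.5 * incident.rto_hours:
        return "escalate"
    return "normal"


if __name__ == "__main__":
    print(classify(Incident("erp", rto_hours=14, hours_down=2, est_hours_to_restore=20)))
```

The point of writing the rule down is that a hardware or software incident gets a clear trigger for switching to DR procedures, instead of relying on someone deciding mid-outage.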

[Chart: Extent That Service Management Processes Account for Disasters. Vertical axis: Success Meeting RTO and RPO (Low to High). Processes shown: Escalation Procedures, Incident Models, Severity Definitions, Incident Classifications. Comparison: Not Integrated versus DRP Integrated. Source: Info-Tech Research Group; N=92]

Page 10: Create a Right-Sized Disaster Recovery Plan

Myth #4: I use a co-lo so I don’t have to worry about DR. That’s my vendor’s responsibility.

Reality: You can’t assume your co-lo’s DR capability meets your needs or that DR services are part of your agreement. The same is true for cloud vendors.

Using a co-lo can provide several improvements to your DR and service continuity capability. For example, the co-lo is more likely to have the following than an in-house (on-premise) data center:

• Redundant telecommunication lines and network infrastructure.

• Redundant power feeds and standby power.

• Multiple locations that could provide you with a DR site.

However, it’s your responsibility to ensure the vendor meets your DR requirements and that you have an agreement in place for a disaster recovery scenario. Considerations include:

• Does your agreement include the use of the vendor’s alternate sites in a DR scenario?

• What is the vendor’s RTO for failing over to an alternate site?

• What is the cost of leveraging the vendor’s DR services?

Evaluating a co-lo as a primary or DR site needs to be an extensive and thorough process to ensure your requirements are met, typically a minimum of three months of planning and due diligence. See Info-Tech’s blueprint, Develop a Co-location Strategy.

Page 11: Create a Right-Sized Disaster Recovery Plan

Myth #5: A DRP must be detailed enough that anyone can execute the recovery.

Reality: DR is not like an airplane disaster movie. You aren’t going to ask a business user to execute a system recovery, just like you wouldn’t really want a passenger with no flying experience to land a plane.

Keeping in mind your audience – knowledgeable IT staff – you can take a more visual and concise approach to documentation, which ultimately makes it more usable, easier to maintain, and therefore more effective as shown in the chart.

Note that DR success scores are based on:

• Meeting Recovery Time Objectives (RTOs).
• Meeting Recovery Point Objectives (RPOs).
• IT staff’s confidence in their ability to meet RTOs/RPOs.

[Chart: DR Success (Low to High) for documentation that is primarily flowcharts, checklists, and diagrams versus a traditional manual. Source: Info-Tech Research Group; N=95]

Choose flowcharts over process guides, checklists over procedures, and diagrams over descriptions.

Without question, 120-page DRPs are not effective. I mean, auditors love them because of the detail, but give me a 10-page DRP with contact lists, process flows, diagrams, and recovery checklists that are easy to follow.

– Bernard Jones, MBCI, CBCP, CORP, Manager Disaster Recovery/BCP, ActiveHealth Management

Page 12: Create a Right-Sized Disaster Recovery Plan

Summary of Info-Tech’s approach to DRP

Traditional Approach: Start with extensive risk and probability analysis.

Challenge: You can’t predict every event that can occur, and this delays work on your actual recovery procedures.

Info-Tech’s Approach: Focus on how to recover regardless of the incident. We know failure will happen, so focus on building out your ability to fail over to a DR environment so you are protected regardless of what causes your primary site to fail.

Traditional Approach: Build a plan for major events such as natural disasters.

Challenge: When looking at the causes of unacceptable downtime, major destructive events only account for 12% of incidents while software/hardware issues account for 45%. The vast majority of incidents are isolated local events.

Info-Tech’s Approach: An effective DRP improves day-to-day service continuity, and is not just for major events. Leverage DR planning to also address the more common incidents (e.g. power/network outage or hardware failure) as well as major events. It has to be a plan you can use, not one that just sits on a shelf.

Traditional Approach: Create a DRP manual that provides step-by-step instructions that anyone could follow.

Challenge: The result is lengthy, dense manuals that are difficult to maintain and not very usable in a crisis. The usability of DR documents has a direct impact on DR success.

Info-Tech’s Approach: Create concise documentation aimed at your DR team. Use flowcharts, checklists, and diagrams – they are quicker to create, more usable in a crisis, and easier to maintain. Remember your audience – you aren’t going to ask a business user to recover your ERP, so you can afford to be concise.

DR planning is not your full-time job, so it can’t be a resource- and time-intensive process. You need a practical approach that creates a more-concise and effective DRP.

Page 13: Create a Right-Sized Disaster Recovery Plan

An effective DRP relies on identifying and providing appropriate DR capability for each application

CASE STUDY

Industry: Manufacturing

Situation

• A global manufacturer with annual sales of over $1 billion was looking to improve its DR capabilities.
• Info-Tech Research Group conducted an achievable RTO and RPO analysis and identified metrics for email and ERP.
◦ Email: Achievable RTO: near zero. Achievable RPO: near zero.
◦ ERP: Achievable RTO: 14 hours. Achievable RPO: 24 hours.

Solution

• As part of a current state assessment, the firm went through a BIA.
• The BIA indicated the following downtime impacts for email and ERP.
◦ Email: Financial Impact: $100,000/24 hours. Goodwill Impact: 8.5/16.
◦ ERP: Financial Impact: $1,350,000/24 hours. Goodwill Impact: 12.5/16.

Results

• Despite the importance placed on email, downtime has a relatively low impact on the business.
◦ No revenue impact.
◦ Productivity impact was only a disruption on the normal routine and not necessarily on key business processes.
• Downtime for ERP had a tremendous business impact. However, it was not given appropriate DR capabilities.
• Following Info-Tech’s workshop, it was clear that the firm needed to reprioritize its applications and provide additional support for the more critical applications.
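
The mismatch in this case study can be expressed as a simple check. The sketch below uses the impact and recovery figures reported above; the comparison rule (flag any application that has higher financial and goodwill impact than another yet recovers more slowly) and the choice to model "near zero" as 0.0 hours are assumptions made for illustration, not Info-Tech's method.

```python
# Figures are from the case study; "near zero" is modelled as 0.0 (assumption).
APPS = {
    "email": {"financial_per_24h": 100_000, "goodwill": 8.5,
              "achievable_rto_hours": 0.0},
    "erp": {"financial_per_24h": 1_350_000, "goodwill": 12.5,
            "achievable_rto_hours": 14.0},
}


def misaligned_pairs(apps):
    """Return (higher_impact_app, lower_impact_app) pairs where the
    higher-impact application currently takes longer to recover."""
    pairs = []
    for a, fa in apps.items():
        for b, fb in apps.items():
            higher_impact = (
                fa["financial_per_24h"] > fb["financial_per_24h"]
                and fa["goodwill"] > fb["goodwill"]
            )
            slower = fa["achievable_rto_hours"] > fb["achievable_rto_hours"]
            if higher_impact and slower:
                pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    for high, low in misaligned_pairs(APPS):
        print(f"{high} has higher impact than {low} but recovers more slowly")
```

With the case study data, the only flagged pair is ERP against email, which matches the conclusion that ERP needed stronger DR capability.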

Page 14: Create a Right-Sized Disaster Recovery Plan

Info-Tech offers various levels of support to best suit your needs

Consulting

“Our team does not have the time or the knowledge to take this project on. We need assistance through the entirety of this project.”

Guided Implementation

“Our team knows that we need to fix a process, but we need assistance to determine where to focus. Some check-ins along the way would help keep us on track.”

DIY Toolkit

“Our team has already made this critical project a priority, and we have the time and capability, but some guidance along the way would be helpful.”

Workshop

“We need to hit the ground running and get this project kicked off immediately. Our team has the ability to take this over once we get a framework and strategy in place.”

Diagnostics and consistent frameworks used throughout all four options

Page 15: Create a Right-Sized Disaster Recovery Plan

Measured value for Guided Implementations (GIs)

Engaging in GIs doesn’t just offer valuable project advice; it also results in significant cost savings.

GI: Measured Value

Phase 1: Define parameters for your DRP

• Time, value, and resources saved by leveraging Info-Tech’s methodology to define the scope of your DRP project.

• For example, 2 FTEs * 5 days * $80,000/year = $3,200

Phase 2: Determine the desired recovery timeline

• Time, value, and resources saved by using Info-Tech’s tools and templates to establish and document recovery objectives.

• For example, 2 FTEs * 5 days * $80,000/year = $3,200

Phase 3: Determine the current recovery timeline and DR gaps

• Time, value, and resources saved by following Info-Tech’s tools and methodology to document recovery timelines and incident response plans.

• For example, 2 FTEs * 5 days * $80,000/year = $3,200

Phase 4: Create a project roadmap to close DR gaps

• Time, value, and resources saved by following Info-Tech’s best-practice guidance and templates to establish an effective project roadmap to close DR gaps.

• For example, 2 FTEs * 4 days * $80,000/year = $2,560

Total savings $12,160
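
The per-phase examples above follow one formula: FTEs × days × daily rate. The sketch below reproduces that arithmetic. The 250-working-day year is an inference from the figures shown ($80,000 per year works out to $320 per FTE-day in the examples); it is not stated in the source, and the function name is hypothetical.

```python
WORKING_DAYS_PER_YEAR = 250  # assumption inferred from $80,000/year -> $320/day


def phase_savings(ftes, days, annual_salary=80_000,
                  working_days=WORKING_DAYS_PER_YEAR):
    """Estimated savings for one phase: FTEs x days x daily salary rate."""
    daily_rate = annual_salary / working_days
    return ftes * days * daily_rate


if __name__ == "__main__":
    # (FTEs, days) per phase, as given in the examples above.
    for phase, (ftes, days) in enumerate([(2, 5), (2, 5), (2, 5), (2, 4)], start=1):
        print(f"Phase {phase}: ${phase_savings(ftes, days):,.0f}")
```

Substituting your own salary and staffing figures gives a quick estimate for your organization.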

Page 16: Create a Right-Sized Disaster Recovery Plan

Create a Right-Sized Disaster Recovery Plan – project overview

1. Define parameters for your DRP

Best-Practice Toolkit:
1.1 Create a DRP pilot project charter
1.2 Measure current DRP maturity
1.3 Identify key applications and dependencies

Guided Implementations:
• Leverage Info-Tech’s DRP Project Charter Template to clarify expectations.
• Determine current DRP gaps through a DRP maturity scorecard.
• Document key applications and dependencies.

Onsite Workshop: Module 1: Identify key applications, dependencies, and DR challenges.

Phase 1 Results: Complete a DRP maturity scorecard and identify key applications and dependencies.

2. Determine the desired recovery timeline

Best-Practice Toolkit:
2.1 Define an impact scoring scale
2.2 Estimate impact of downtime and assign criticality
2.3 Identify desired recovery timeline

Guided Implementations:
• Define an objective scoring scale to indicate different levels of impact.
• Define the criticality of each application.
• Identify desired RTOs and RPOs based on business impact.

Onsite Workshop: Module 2: Determine the desired recovery timeline.

Phase 2 Results: Create an objective scoring scale and determine the desired recovery timeline.

3. Determine the current recovery timeline and DR gaps

Best-Practice Toolkit:
3.1 Identify current capabilities via tabletop planning
3.2 Determine the RTO/RPO gaps
3.3 Identify additional risks

Guided Implementations:
• Conduct a tabletop planning exercise based on current capabilities.
• Analyze RTO and RPO gaps.
• Conduct a high-level risk assessment to identify additional vulnerabilities.

Onsite Workshop: Module 3: Document the current recovery timeline and DR gaps.

Phase 3 Results: Document the current incident response plan and determine RTO/RPO gaps.

4. Create a project roadmap to close DR gaps

Best-Practice Toolkit:
4.1 Create a project roadmap to close recovery gaps
4.2 Create incident response plans
4.3 Summarize DRP results and review the DRP roadmap

Guided Implementations:
• Prioritize each project and establish a project roadmap.
• Conduct a tabletop planning exercise to define the desired state.
• Complete the DRP and review the DRP roadmap.

Onsite Workshop: Modules 4 and 5:
• Create a project roadmap to close DR gaps.
• Establish a framework for DRP documentation.

Phase 4 Results:
• Summarize results and projects to close DR gaps.
• A practical approach to documenting your DRP.
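
Step 3.2 in the overview (determine the RTO/RPO gaps) comes down to comparing the desired recovery timeline from Phase 2 with the achievable timeline found in Phase 3. A minimal sketch follows, assuming both are recorded in hours per application; the function name and all sample values are hypothetical and chosen only to show the calculation.

```python
def recovery_gaps(desired, achievable):
    """Return, per application, the hours by which the achievable recovery
    time exceeds the desired one (0.0 when the objective is already met)."""
    return {
        app: max(0.0, achievable[app] - target)
        for app, target in desired.items()
    }


if __name__ == "__main__":
    # Hypothetical RTO targets and current capabilities, in hours.
    desired_rto = {"erp": 4.0, "email": 8.0}
    achievable_rto = {"erp": 14.0, "email": 0.5}
    for app, gap in recovery_gaps(desired_rto, achievable_rto).items():
        status = f"{gap:g} hour gap" if gap else "objective met"
        print(f"{app}: {status}")
```

The same comparison applies to RPOs; applications with a non-zero gap are the candidates for projects on the Phase 4 roadmap.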

Page 17: Create a Right-Sized Disaster Recovery Plan

Contact your account representative or email [email protected] for more information.

Workshop overview

Workshop Prep

Activities: Create a DRP pilot project charter
• Create a DRP pilot team and define roles and responsibilities.
• Establish parameters for the project, including objectives, deliverables, and scope.

Deliverables:
1. DRP Project Charter Template
2. DRP Workbook

Workshop Day 1

Activities: Define parameters for your DRP
1.1 Assess current DR maturity.
1.2 Determine critical business operations.
1.3 Identify key applications and dependencies.

Deliverables:
1. DRP Business Impact Analysis Tool
2. DRP Maturity Scorecard

Workshop Day 2

Activities: Determine the desired recovery timeline
2.1 Define an objective scoring scale to indicate different levels of impact.
2.2 Estimate the impact of downtime.
2.3 Determine desired RTO/RPO targets for applications based on business impact.

Deliverables:
1. DRP Business Impact Analysis Tool
2. DRP Vendor Evaluation Questionnaire and Tool

Workshop Day 3

Activities: Determine the current recovery timeline and DR gaps
3.1 Conduct a tabletop exercise to determine current recovery procedures.
3.2 Identify gaps between current and desired capabilities.
3.3 Determine what projects are required to close the gap between current and desired DR capability.

Deliverables:
1. DRP Business Impact Analysis Tool
2. Incident Response Flowchart – Current State
3. DRP Project Roadmap Tool

Workshop Day 4

Activities: Create a project roadmap to close DR gaps
4.1 Use tabletop planning to determine the desired-state response plan.
4.2 Outline a strategy for using flowcharts, checklists, and a summary document to complete your DRP.
4.3 Summarize the workshop results, including current potential downtime and action items to close gaps.

Deliverables:
1. Incident Response Flowchart – Desired State
2. Executive Communication Deck
3. DRP templates and how to complete them

Page 18: Create a Right-Sized Disaster Recovery Plan

Use these icons to help direct you as you navigate this research

This icon denotes a slide where a supporting Info-Tech tool or template will help you perform the activity or step associated with the slide. Refer to the supporting tool or template to get the best results and proceed to the next step of the project.

This icon denotes a slide with an associated activity. The activity can be performed either as part of your project or with the support of Info-Tech team members, who will come onsite to facilitate a workshop for your organization.

Use these icons to help guide you through each step of the blueprint and direct you to content related to the recommended activities.