Disaster Recovery Best Practices: Testing tips & maximizing your...

E-Guide: Disaster Recovery Best Practices: Testing tips & maximizing your DR budget. Disaster recovery tests are essential to execute and demonstrate, but you have to be cautious and take the correct steps when testing your DR plans. Otherwise, your plan might fail you when a disaster strikes. This expert E-Guide can help minimize that risk by discussing DR testing tips. It also outlines how to make the most of your DR budget.


Page 1: Disaster Recovery Best Practices: Testing tips & maximizing your …docs.media.bitpipe.com/io_10x/io_101875/item_457810... · 2011-09-21 · SearchDisasterRecovery.com E-Guide Disaster

Table of Contents

Disaster recovery plan testing primer: Test to fail

Making the most out of your disaster recovery budgets

Resources from Iron Mountain

Disaster recovery plan testing primer: Test to fail

According to many standards institutions and organizations that focus on disaster recovery (DR) and business continuity (BC), disaster recovery plan testing will often result in the continued success and operation of a business, even in times of disaster.

For example:

An organization's business continuity and incident management arrangements cannot be considered reliable until exercised and unless their currency is maintained. -- BS 25999 (British Standards Institution [BSI])

Business continuity plans should be tested and updated regularly to ensure that they are up to date and effective. -- ISO 27002 (International Organization for Standardization)

The entity shall evaluate program plans, procedures, and capabilities through periodic reviews, testing, and exercises. -- NFPA 1600 (Standard for Disaster/Emergency Management and Business Continuity)

So if everyone agrees that testing of business continuity/disaster recovery plans is a genuine, certified good thing, then there's nothing to argue about here, right? I, however, have reason to doubt the claimed success of disaster recovery testing. I've seen too many examples of DR plans that were tested routinely over extended periods of time but still failed when needed.

Some of the problems with the assurance that DR testing provides have to do with definitions. A quick glance at the statements extracted from the best-known business continuity management standards shows that the words test, exercise, review and rehearse are used in an overlapping manner, if not interchangeably. Some definitions include "testing equipment" and "exercising people," but these terms can be confusing. Moreover, many disaster recovery tests may not be carried out correctly because of human error. People run equipment, and they often make unexpected mistakes under pressure. And who today can get meaningful work done without the necessary equipment?

Demonstrations, not disaster recovery plan tests

I have seen entirely too many companies sign up with their commercial recovery service for their allotted 48 hours of test time and force a small coterie of specialists through two days of hell so they could return home and announce that everything went well again this year. They were not testing; they were demonstrating. They were showing that a limited team of well-trained individuals can perform tasks very much like their routine jobs at a distant location that has become familiar to them over time.

Now, there is some value to a demonstration. It allows management to reassure regulators that they are doing what is expected of them, and it makes auditors happy. But it does not validate that a set of procedures would be effective if carried out without key personnel, without advance planning and under the pressure of an actual emergency. To use a sports analogy, this sort of "testing" is practice, admittedly a necessity for success at game time. But it is not at all the same thing as playing for keeps.

Finding defects in disaster recovery plan testing

Successful tests do not prove that a disaster recovery plan will succeed, but failed tests do prove that the plan will fail. And that is what makes testing so important.

Business continuity plans and disaster recovery plans are engineered products constructed by fallible human beings. Like all engineered products, they have defects, many of which go unnoticed for a very long time until a certain set of circumstances aligns to expose the flaw. If a disaster recovery plan is going to fail, it will most likely do so during a disaster. Therefore, if a test detects a defect under relatively ideal conditions, the plan can be improved before it is ever needed.

A disaster recovery plan is never a finished document, and it is probably inaccurate because of the constant erosion caused by changes to the business, technology, personnel, etc. Maintenance of a DR plan is necessary but sometimes insufficient if flaws exist in the original plan. Because of that, maintenance changes themselves need to be tested to find defects introduced by the fixes, and that cycle can go on continuously. Some recovery processes are incredibly complex, such as those for an ERP system, a non-standard file system or a multi-site integrated application. Changes made to repair a flaw in one of these processes are likely to introduce others.

Independence in testing

Tests are conducted routinely, but often by only one person, for whom disaster recovery testing has most likely become part of the job description over time. Because this person creates the DR tests, often only he or she understands the mental shorthand that is written into the plan. And by making the plan easy to carry out alone, he or she has unwittingly introduced the very source of failure: when the plan is needed, there is no assurance that this person will still be employed, will not be on vacation and will not have been injured in the event that caused the plan to be needed.

To make sure your DR test doesn't fail, be sure to take these items into consideration:

When a disaster recovery plan is newly created, it is legitimate to demonstrate it. There will be enough kinks to iron out that there is no need to complicate the testing process further. But thereafter, develop test scenarios that are intended to simulate the chaotic reality of a disaster (e.g., a key person is not available, a vital backup tape cannot be read, a software patch has not been applied to the recovery version of the operating system, etc.).

Have someone other than those who are conducting the test construct the scenario. If you know where the punches are coming from, it is easier to duck. It is just human nature to make the test easier to pass by formulating an easily solved case.

An independent person or group should referee every disaster recovery plan test. It is easy to declare victory when the testers are the only ones present, but much more difficult if there is a gimlet-eyed auditor present. However, the observant eyes don't necessarily need to belong to auditors; anyone independent will do, such as consultants, vendors or technical personnel from other divisions on a mutual basis.

To the degree that testing indicates something other than total success, any shortcomings noted should be considered as defects in the disaster recovery program as well as its resulting plans. Once defects are recognized and categorized, determine their causes and resolve the problems. Implement preventive and detective controls to identify and track defect recurrence and diminution (or growth). All findings should be communicated to management.

Resolution of defects must be reflected in the testing that identified them. The same test should be re-performed with the resolutions in place to determine if they are effective in eliminating the defects. This may require several iterations of testing, so waiting a year for the next test is insufficient. Be sure to document the results of the re-testing, as well as to develop and implement testing methods to identify possible defect recurrence.
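
The testing items above can be sketched in a few lines of Python. This is only an illustration, not a tool described in this guide: the complication list reuses the examples given above, while the function `build_scenario`, the `Defect` record and the sample defect are hypothetical names and data invented for the sketch.

```python
import random
from dataclasses import dataclass, field

# Complications taken from the examples in the text above.
COMPLICATIONS = [
    "key person is not available",
    "vital backup tape cannot be read",
    "software patch not applied to the recovery operating system",
]

def build_scenario(rng, max_complications=2):
    """Pick complications for one test. Intended to be run by someone
    who is NOT on the team performing the test (hypothetical helper)."""
    count = rng.randint(1, max_complications)
    return rng.sample(COMPLICATIONS, count)

@dataclass
class Defect:
    """One shortcoming found during a test, with its retest history."""
    description: str
    cause: str = "undetermined"
    retest_results: list = field(default_factory=list)  # True = retest passed

    def record_retest(self, passed):
        self.retest_results.append(passed)

    @property
    def resolved(self):
        # Resolved only if at least one retest exists and the latest one passed.
        return bool(self.retest_results) and self.retest_results[-1]

scenario = build_scenario(random.Random(2011))
defect = Defect("restore stalled on unreadable tape", cause="tapes never verified")
defect.record_retest(False)  # first fix did not eliminate the defect
defect.record_retest(True)   # second iteration of the same test passed
print(scenario)
print(defect.resolved)
```

Keeping the retest history on the defect itself makes it visible when a fix needed several iterations, which is the reason the text advises against waiting a year for the next test.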

Overall, disaster recovery tests are essential to execute and demonstrate, but be cautious and take the correct steps when testing your DR plans. Otherwise, your plan might fail you when a disaster strikes.

Making the most out of your disaster recovery budgets

By Garry Kranz

Figuring out how much to spend on disaster recovery (DR) is always difficult for organizations, but shrinking IT budgets make the problem even more acute. Despite these challenges, for some organizations, not even a lousy economy is an excuse to cut back on disaster recovery investments.

"Our capital budget is probably half of what it was last year, but we don't scrimp on DR spending. We'll defer a system upgrade before we defer the capital needed to maintain our DR capability," said Harry F. Lukens, CIO of Lehigh Valley Hospital (LVH) and Health Network in Allentown, Pa.

The 700-bed hospital system uses a series of "hot boxes" at a secondary data center in nearby Bethlehem. Formerly a testing and data center of IBM Corp., the facility was purchased as part of an acquisition of another hospital about 10 years ago.

The off-site data backup servers enable 14 different critical computing systems -- including those for operating rooms, medical/surgical, and labor/delivery -- to continue functioning in the event of an outage. In addition, LVH has configured individual backup servers for about 40 other major systems housed at its primary data center in Allentown.

Lukens estimated that LVH spends about $540,000 annually on disaster recovery, including capital costs of $300,000 to upgrade or replace servers. Operating expenses, including testing and a salary for a disaster recovery coordinator, are about $200,000.

The disaster recovery plan is managed mostly by the hospital's IT department. The lone exception: recovery for two Tandem mainframe computers is outsourced to DR services provider SunGard of Wayne, Pa. The outsourcing "insurance" costs LVH about $3,500 per month, Lukens said.
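
The capital and operating figures quoted above add up to $500,000, not $540,000. One plausible reading, which is an assumption here and is not stated in the article, is that the remainder is the SunGard contract. A quick check of the arithmetic under that assumption:

```python
# Figures quoted by Lukens in the text (US dollars).
capital = 300_000            # server upgrades and replacements, per year
operating = 200_000          # testing plus the DR coordinator's salary, per year
outsourcing_monthly = 3_500  # SunGard contract for the two Tandem mainframes

# Assumption: the outsourcing fee is included in the quoted annual total.
outsourcing = outsourcing_monthly * 12
total = capital + operating + outsourcing

print(outsourcing)  # 42000
print(total)        # 542000
```

Under that assumption the total comes to $542,000, which is consistent with the "about $540,000" estimate.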

How much disaster recovery spending is too much?

"Supporting disk-to-tape backup, in which you need to recover within a few days, is going to cost you less than having a dedicated disk-to-disk infrastructure that lets you recover in a matter of hours," said John Morency, a research director with Stamford, Conn.-based Gartner Inc.

Citing Gartner's research during the past several years, Morency said small to midsized businesses (SMBs) devote anywhere from 0.8% to 2.8% of their IT budgets to disaster recovery tools, training and services. The amount of disaster recovery spending is affected by an organization's infrastructure, configuration and management needs.

"The key point is to align your DR investments to ensure you have a reasonable balance between risk mitigation and affordability," Morency said.
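
To turn the percentage range quoted above into dollar figures, a small helper is enough. This is a rough sketch: the function name `dr_budget_range` and the $5 million IT budget are hypothetical, and only the 0.8% and 2.8% bounds come from the text.

```python
def dr_budget_range(it_budget, low_pct=0.8, high_pct=2.8):
    """Return (low, high) annual DR spending for the quoted SMB range."""
    return it_budget * low_pct / 100, it_budget * high_pct / 100

# Hypothetical example: an SMB with a $5 million annual IT budget.
low, high = dr_budget_range(5_000_000)
print(round(low), round(high))  # 40000 140000
```

Where an organization lands inside that range depends on the infrastructure, configuration and management needs mentioned above.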

Business impact analysis and risk in DR planning

Experts say it's difficult to forecast costs unless you identify the threats, their probability and their financial impact on your business. That process is known as a business impact analysis (BIA). It provides information that helps to pinpoint which business processes and applications are at risk and, more importantly, how quickly they need to be restored.

"The point at which you need to recover your data has a huge impact on costs and how to budget. In general, the longer you extend your recovery time, the lower your cost of recovery is going to be," said Larry Arker, a risk-management consultant with Jefferson Wells in Milwaukee.
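
The reasoning behind a business impact analysis can be sketched as probability times impact, followed by a choice of protection based on how quickly an application must be restored. Everything concrete below is an assumption made for illustration: the threats, their probabilities and dollar impacts, the `Threat` record, and the 24-hour cutoff are not figures from the article.

```python
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    annual_probability: float  # chance of occurring in a given year
    impact: float              # estimated loss in dollars if it occurs

def expected_annual_loss(threats):
    """Sum of probability x impact across all identified threats."""
    return sum(t.annual_probability * t.impact for t in threats)

def recovery_tier(rto_hours, cutoff_hours=24):
    """Pick a protection tier from the required recovery time.
    The 24-hour cutoff is a hypothetical threshold."""
    return "disk-to-disk" if rto_hours <= cutoff_hours else "tape backup"

# Hypothetical threats and estimates.
threats = [
    Threat("data center power loss", 0.10, 200_000),
    Threat("regional flood", 0.01, 1_500_000),
]

print(round(expected_annual_loss(threats)))  # 35000
print(recovery_tier(8))                      # disk-to-disk
print(recovery_tier(72))                     # tape backup
```

The expected loss gives a rough ceiling on what is worth spending, while the tier function reflects the point made above that longer recovery times allow cheaper protection.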

Trying to anticipate and prevent any inconvenience at all is "exactly the wrong approach," said Richard Jones, vice president for data center strategies at Burton Group, a consulting firm in Midvale, Utah.

Several years ago, a manufacturing company in the Ohio Basin made a crucial decision: Don't worry about every application or process. Instead, Jones said, the company determined that only two out of hundreds of business applications needed to be recovered within one day. Most applications were protected using inexpensive tape backup.

On the flip side, Jones said Wall Street firms stand to lose millions of dollars per broker for each minute a system is down. Therefore, they may dedicate 75% to 80% of IT budgets to disaster recovery.

"Understanding threats and probabilities gives you insight into how much money you risk losing, and how much you're going to have to spend to maintain the business," Jones said.

When preparing his annual disaster recovery budget, Lukens requires each of his server directors to provide an itemized list of hardware that will need to be replaced in the upcoming year. The "bottom-up-driven budget" ensures the wisest use of disaster recovery dollars, he said.

"You can't just say, 'Here's a bunch of money, go make it happen.' Because you may be spending too much or you may be spending too little," Lukens said.

Organizations make several overspending mistakes, including trying to provide total or near-total redundancy when lower-cost alternatives would suffice. Even companies that use lower-cost tape media face growing costs: rising energy prices are forcing them to pay higher rates for transporting backup tapes from their archival provider (such as Iron Mountain Inc. or Seagate Technology's i365) to testing sites. "It's not unusual to see rates of $5,000 to $6,000 per [archival company] truck roll. This number adds up fast when you have lots of tapes that are needed for applications and data restoration," Morency said.

Reexamine disaster recovery spending priorities

To prioritize spending, companies should use the slackened pace of business to decide if they are making the most of their disaster recovery budget. For example, look into whether you can reallocate costly storage or replication hardware to high-priority applications and shift other applications to less-costly tape backup.

"It's a good time for companies to scrutinize their disaster recovery plans a little better to try and squeeze more cost savings from them," Jones said.

Scale back on DR tests

Disaster recovery tests are costly and time-consuming, so it's important for an organization to know and test only what is necessary. Email, enterprise resource planning systems, supply-chain networks, payment and payroll, intranets/extranets, and customer-facing websites are typical applications that most organizations will want to test routinely.

"If an organization doesn't do this type of analysis, then the implicit expectation is going to be that IT can recover everything, which is totally unrealistic" in most cases, Morency said.

Collocation and disaster recovery

Until the economy rebounds, few companies are willing to incur the huge capital cost associated with building new data centers. That includes postponing expansions of existing data centers to accommodate new applications. Meanwhile, companies are opting to use collocation or hosting providers such as Hewlett-Packard (HP) Co., IBM Corp. and SunGard as a "tactical cost-saving step" to support backup and recovery, Morency said. Some companies are taking a blended approach, divvying up disaster recovery dollars to expand their own DR architecture while outsourcing secondary and tertiary data tiers to outside providers.

Also, as those disaster recovery contracts come up for annual renewal, an organization may be able to reduce costs by reconfiguring its environment or reducing the number of hot sites needed.

If you have a smaller IT staff that's being asked to do even more, make sure they have the appropriate level of training. Because of budget cuts, Lehigh Valley Hospital has 7% fewer IT staff this year, but disaster recovery requirements aren't slackening. "To make sure we're covering all our DR stuff, we're having to cross-train people (on different servers) now more than we ever did in the past," Lukens said.

Resources from Iron Mountain

Compliant Media Management: Best Practices Guide

Guide to Improving Your Tape Storage Practices

Offsite Tape Vaulting Brochure: Secure Media Management

About Iron Mountain

Iron Mountain is a world leader in information management services, assisting more than 140,000 organizations in 39 countries on five continents with storing, protecting and managing their information.

Publicly traded under NYSE symbol IRM, Iron Mountain is an S&P 500 company and a member of the Fortune 1000 (currently ranked: 643). Organizations in every major industry and of all sizes, including more than 97% of the Fortune 1000, rely on Iron Mountain as their information management partner.

We're proud that our customers have put their trust in us. We safely store some of the world's most valuable historical artifacts, cultural treasures, business documents and medical records. To properly protect and render this information, Iron Mountain employs almost 20,000 professionals and boasts an unrivaled infrastructure that includes more than 1,000 facilities, 10 data centers and 3,500 vehicles.