E-Guide
Disaster Recovery Best Practices:
Testing tips & maximizing your DR
budget
Overall, disaster recovery tests are essential to execute and
demonstrate, but you have to be cautious and take the correct steps
to test your DR plans. Otherwise, your plan might fail you in any given
disaster recovery situation. This expert E-Guide can help minimize the
risk of your plan failing by discussing DR testing tips. It also
outlines how to make the most of your DR budget.
Sponsored By:
SearchDisasterRecovery.com E-Guide
Disaster Recovery Best Practices: Testing tips & maximizing your DR budget
Table of Contents
Disaster recovery plan testing primer: Test to fail
Making the most out of your disaster recovery budgets
Resources from Iron Mountain
Disaster recovery plan testing primer: Test to fail
According to many standards institutions and organizations that focus on disaster recovery
(DR) and business continuity (BC), disaster recovery plan testing will often result in the
continued success and operations of a business, even in times of a disaster.
For example:
An organization's business continuity and incident management arrangements cannot be
considered reliable until exercised and unless their currency is maintained. -- BS 25999
(British Standards Institution [BSI])
Business continuity plans should be tested and updated regularly to ensure that they are up
to date and effective. -- ISO 27002 (International Organization for Standardization)
The entity shall evaluate program plans, procedures, and capabilities through periodic
reviews, testing, and exercises. -- NFPA 1600 (Standard for Disaster/ Emergency
Management and Business Continuity)
So if everyone agrees that testing of business continuity/disaster recovery plans is a
genuine, certified good thing, then there's nothing to argue about here, right? I, however,
have reason to disagree with the claimed success of disaster recovery testing. I've seen too
many examples of DR plans that have been tested routinely over extended periods of time,
but still fail when needed.
Some of the problems that arise with the assurance of DR testing have to do with
definitions. A quick glance at the statements extracted from the best-known business
continuity management standards shows that the words test, exercise, review and rehearse
are used in an overlapping manner, if not interchangeably. Some definitions include "testing
equipment" and "exercising people," but these terms can be confusing, and moreover, a lot
of the disaster recovery tests may not be carried out correctly due to human error. People
run equipment and often make unnecessary and unexpected mistakes under pressure. And
who today can get meaningful work done without the necessary equipment?
Demonstrations, not disaster recovery plan tests
I have seen entirely too many companies sign up at their commercial recovery service for
their allotted 48 hours of test time and force a small coterie of specialists through two days
of hell so they could return home and announce that everything went well again this year.
They were not testing; they were demonstrating. They were showing that a limited team of
well-trained individuals can perform tasks very much like their routine jobs at a distant
location that has become familiar to them over time.
Now, there is some value to a demonstration. It allows management to reassure regulators
that they are doing what is expected of them, and it makes auditors happy. But it does not
validate that a set of procedures would be effective if carried out without key personnel,
without advance planning and under the pressure of an actual emergency. To use a sports
analogy, this sort of "testing" is practice, admittedly a necessity for success at game time.
But it is not at all the same thing as playing for keeps.
Finding defects in disaster recovery plan testing
Successful tests do not prove that a disaster recovery plan will succeed, but failed tests do
prove that the plan will fail. And that is what makes testing so important.
Business continuity plans and disaster recovery plans are engineered products constructed
by fallible human beings. Like all engineered products, they have defects, many of which go
unnoticed for a very long time until a certain set of circumstances aligns to expose
the flaw. If a disaster recovery plan is going to fail, it will most likely fail
during a disaster. Therefore, if a test detects a defect under relatively ideal conditions, it
enables enhancements to be made before the plan is ever needed.
A disaster recovery plan is never a finished document, and at any given moment it is probably
inaccurate because of the constant erosion caused by changes to the business, technology and
personnel. Maintenance of a DR plan is necessary, but it is not sufficient if the original plan
is flawed. Maintenance changes themselves need to be tested to find
defects introduced by the fixes, and that cycle can go on continuously. Some recovery
processes are incredibly complex, such as those for an ERP system, a non-standard file system or a
multi-site integrated application. Changes to repair a flaw in one of these processes are likely
to introduce others.
Independence in testing
Tests are conducted routinely, but often by only one person, for whom disaster recovery
testing has over time become part of the job description. Because this person creates the
DR tests, often only he or she understands the mental shorthand that is written into the plan.
And by making the plan easy to carry out alone, he or she has introduced the very source of
failure: when the plan is needed, there is no assurance that this person will still be employed,
not on vacation and not injured in the event that caused the plan to be needed.
To make sure your DR test doesn't fail, be sure to take these items into consideration:
When a disaster recovery plan is newly created, it is legitimate to
demonstrate it. There will be enough kinks to iron out that there is no additional
need to complicate the testing process. But thereafter, develop test scenarios that
are intended to simulate the chaotic reality of a disaster (e.g., a key person is not
available, a vital backup tape cannot be read, or a software patch has not been applied
to the recovery version of the operating system).
Have someone other than those who are conducting the test construct the
scenario. If you know where the punches are coming from it is easier to duck. It is
just human nature to make the test easier to pass by formulating an easily soluble
case.
An independent person or group should referee every disaster recovery plan
test. It is easy to declare victory when the testers are the only ones present, but
much more difficult if there is a gimlet-eyed auditor present. However, the observant
eyes don't necessarily need to belong to auditors; anyone independent will do, such
as consultants, vendors or technical personnel from other divisions on a mutual
basis.
To the degree that testing indicates something other than total success, any
shortcomings noted should be considered as defects in the disaster recovery
program as well as its resulting plans. Once defects are recognized and
categorized, resolve any problems and determine their causes. Implement
preventive and detective controls to identify and track defect recurrence and
diminution (or growth). All findings should be communicated to management.
Resolution of defects must be reflected in the testing that identified them.
The same test should be re-performed with the resolutions in place to determine if
they are effective in eliminating the defects. This may require several iterations of
testing, so waiting a year for the next test is insufficient. Be sure to document the
results of the re-testing, as well as to develop and implement testing methods to
identify possible defect recurrence.
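The advice above about building chaotic test scenarios can be sketched in a few lines of Python. This is only an illustration: the list of failure conditions is taken from the examples in the text, while the function name `build_scenario` and its parameters are hypothetical. In keeping with the point about independence, someone other than the testers would run it and keep the result sealed until the test begins.

```python
import random

# Failure conditions drawn from the examples in the text; a real list
# would be longer and specific to the organization (assumption).
FAILURE_CONDITIONS = [
    "a key person is not available",
    "a vital backup tape cannot be read",
    "a software patch has not been applied to the recovery operating system",
]

def build_scenario(conditions, how_many, seed=None):
    """Pick `how_many` distinct failure conditions at random.

    Passing a seed makes the draw repeatable, so the same scenario can be
    re-run after defects have been fixed.
    """
    rng = random.Random(seed)
    return rng.sample(conditions, how_many)

scenario = build_scenario(FAILURE_CONDITIONS, how_many=2)
for condition in scenario:
    print("Inject:", condition)
```

Recording the seed used for a test gives the referee a simple way to repeat exactly the same scenario when the resolutions are re-tested.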
Overall, disaster recovery tests are essential to execute and demonstrate, but be cautious
and take the correct steps to test your DR plans. Otherwise, your plan might fail you in any
given disaster recovery situation.
Making the most out of your disaster recovery budgets
By Garry Kranz
Figuring out how much to spend on disaster recovery (DR) is always difficult for
organizations, but shrinking IT budgets make the problem even more acute. Despite these
challenges, for some organizations, not even a lousy economy is an excuse to cut back on
disaster recovery investments.
"Our capital budget is probably half of what it was last year, but we don't scrimp on DR
spending. We'll defer a system upgrade before we defer the capital needed to maintain our
DR capability," said Harry F. Lukens, CIO of Lehigh Valley Hospital (LVH) and Health
Network in Allentown, Pa.
The 700-bed hospital system uses a series of "hot boxes" at a secondary data center in
nearby Bethlehem. Formerly a testing and data center of IBM Corp., the facility was
purchased as part of an acquisition of another hospital about 10 years ago.
The off-site data backup servers enable 14 different critical computing systems -- including
those for operating rooms, medical/surgical, and labor/delivery -- to continue functioning in
the event of an outage. In addition, LVH has configured individual backup servers for about
40 other major systems housed at its primary data center in Allentown.
Lukens estimated that LVH spends about $540,000 annually on disaster recovery, including
capital costs of $300,000 to upgrade or replace servers. Operating expenses, including
testing and a salary for a disaster recovery coordinator, are about $200,000.
The disaster recovery plan is managed mostly by the hospital's IT department. The lone
exception: two Tandem mainframe computers are outsourced to DR services provider
SunGard of Wayne, Pa. The outsourcing "insurance" costs LVH about $3,500 per month,
Lukens said.
How much disaster recovery spending is too much?
"Supporting disk-to-tape backup, in which you need to recover within a few days, is going
to cost you less than having a dedicated disk-to-disk infrastructure that lets you recover in a
matter of hours," said John Morency, a research director with Stamford, Conn.-based
Gartner Inc.
Citing Gartner's research during the past several years, Morency said small- to midsized
businesses (SMBs) devote anywhere from 0.8% to 2.8% of their IT budgets to
disaster recovery tools, training and services. The amount of disaster recovery spending is
affected by an organization's infrastructure, configuration and management needs.
"The key point is to align your DR investments to ensure you have a reasonable balance
between risk mitigation and affordability," Morency said.
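To see what the range Morency cites means in dollars, the arithmetic can be written out as a short Python sketch. The 0.8% and 2.8% shares come from the text. The $2 million IT budget and the function name `dr_budget_range` are made up for illustration.

```python
def dr_budget_range(it_budget, low_share=0.008, high_share=0.028):
    """Return the (low, high) DR spend implied by the 0.8%-2.8% range."""
    return it_budget * low_share, it_budget * high_share

# Hypothetical SMB with a $2 million annual IT budget (assumption).
low, high = dr_budget_range(2_000_000)
print(f"DR spending range: ${low:,.0f} to ${high:,.0f}")
# prints "DR spending range: $16,000 to $56,000"
```

The range is a benchmark rather than a target. Where an organization lands within it, or outside it, depends on the infrastructure, configuration and management needs mentioned above.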
Business impact analysis and risk in DR planning
Experts say it's difficult to forecast costs unless you identify the threats, their probability
and their financial impact on your business. That process is known as a business impact
analysis (BIA). It provides information that helps to pinpoint which business processes and
applications are at risk and, more importantly, how quickly they need to be restored.
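One common way to combine threats, probabilities and financial impact is an expected annual loss: multiply each threat's estimated yearly frequency by its estimated cost and add the results. The sketch below shows that calculation in Python. The source does not prescribe this method, and every threat, frequency and dollar figure here is invented for illustration.

```python
# (threat, estimated events per year, estimated loss per event in dollars)
# All values are hypothetical placeholders, not data from the article.
THREATS = [
    ("data center power outage", 0.5, 40_000),
    ("storage array failure", 0.2, 150_000),
    ("regional flood", 0.02, 2_000_000),
]

def expected_annual_loss(threats):
    """Sum of (yearly frequency x loss per event) across all threats."""
    return sum(rate * loss for _name, rate, loss in threats)

total = expected_annual_loss(THREATS)
print(f"Expected annual loss: ${total:,.0f}")
# prints "Expected annual loss: $90,000"
```

A figure like this gives a rough ceiling for sensible DR spending on those threats: paying far more than the expected loss to prevent it is hard to justify.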
"The point at which you need to recover your data has a huge impact on costs and how to
budget. In general, the longer you extend your recovery time, the lower your cost of
recovery is going to be," said Larry Arker, a risk-management consultant with Jefferson
Wells in Milwaukee.
Trying to anticipate and prevent any inconvenience at all is "exactly the wrong approach,"
said Richard Jones, vice president for data center strategies at Burton Group, a consulting
firm in Midvale, Utah.
Several years ago, a manufacturing company in the Ohio Basin made a crucial decision:
Don't worry about every application or process. Instead, Jones said the company
determined that only two out of hundreds of business applications needed to be recovered
within one day. Most applications were protected using inexpensive tape backup.
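The manufacturer's decision amounts to sorting applications by how quickly they must be recovered. A minimal Python sketch of that sorting follows. The one-day cutoff comes from the example above, and the pairing of fast recovery with disk-to-disk protection follows Morency's earlier comparison. The application names, recovery times and the function name `choose_protection` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Application:
    name: str
    recovery_hours: float  # how quickly the business needs it back

def choose_protection(app, fast_cutoff_hours=24):
    """Reserve costly disk-to-disk protection for apps needed within a day."""
    if app.recovery_hours <= fast_cutoff_hours:
        return "disk-to-disk"
    return "tape backup"

# Hypothetical application inventory (assumption).
inventory = [
    Application("order entry", 4),
    Application("plant scheduling", 24),
    Application("HR archive", 120),
]

for app in inventory:
    print(f"{app.name}: {choose_protection(app)}")
```

In practice the recovery times would come out of the business impact analysis, not from IT's own estimates.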
On the flip side, Jones said Wall Street firms stand to lose millions of dollars per broker for
each minute a system is down. Therefore, they may dedicate 75% to 80% of IT budgets to
disaster recovery.
"Understanding threats and probabilities gives you insight into how much money you risk
losing, and how much you're going to have to spend to maintain the business," Jones said.
When preparing his annual disaster recovery budget, Lukens requires each of his server
directors to provide an itemized list of hardware that will need to be replaced in the
upcoming year. The "bottom-up-driven budget" ensures the wisest use of disaster recovery
dollars, he said.
"You can't just say, 'Here's a bunch of money, go make it happen.' Because you may be
spending too much or you may be spending too little," Lukens said.
Organizations make several overspending mistakes, including trying to provide total or
near-total redundancy when lower-cost alternatives would suffice. Costs are also rising for
companies that use lower-cost tape media: higher energy prices are forcing them to pay more
to transport backup tapes from their archival provider (such as Iron Mountain Inc.
or Seagate Technology's i365) to testing sites. "It's not unusual to see rates of $5,000 to
$6,000 per [archival company] truck roll. This number adds up fast when you have lots of
tapes that are needed for applications and data restoration," Morency said.
Reexamine disaster recovery spending priorities
In order to prioritize spending, companies should use the slackened pace of business to
decide if they are making the most of their disaster recovery budget. For example, look into
whether or not you can reallocate costly storage or replication hardware to high-priority
applications and shift other applications to less-costly tape backup.
"It's a good time for companies to scrutinize their disaster recovery plans a little better to try
and squeeze more cost savings from it," Jones said.
Scale back on DR tests
Disaster recovery tests are costly and time-consuming, so it's important for an organization
to know and test only what is necessary. Email, enterprise resource planning systems,
supply-chain networks, payment and payroll, intranets/extranets, and customer-facing
websites are typical applications that most organizations will want to test routinely.
"If an organization doesn't do this type of analysis, then the implicit expectation is going to
be that IT can recover everything, which is totally unrealistic" in most cases, Morency said.
Collocation and disaster recovery
Until the economy rebounds, few companies are willing to incur the huge capital cost
associated with building new data centers. That includes postponing expansions of existing
data centers to accommodate new applications. Meanwhile, companies are opting to use
collocation or hosting providers such as Hewlett Packard (HP) Co., IBM Corp. and SunGard
as a "tactical cost-saving step" to support backup and recovery, Morency said. Some
companies are taking a blended approach, dividing disaster recovery dollars between
expanding their own DR architecture and outsourcing secondary and tertiary data tiers to outside
providers.
Also, as those disaster recovery contracts come up for annual renewal, an organization may
be able to reduce costs by reconfiguring its environment or reducing the number of hot sites
needed.
If you have a smaller IT staff that's being asked to do even more, make sure they have the
appropriate level of training. Because of budget cuts, Lehigh Valley Hospital has 7% fewer
IT staff this year, but disaster recovery requirements aren't slackening. "To make sure we're
covering all our DR stuff, we're having to cross-train people (on different servers) now more
than we ever did in the past," Lukens said.
Resources from Iron Mountain
Compliant Media Management: Best Practices Guide
Guide to Improving Your Tape Storage Practices
Offsite Tape Vaulting Brochure: Secure Media Management
About Iron Mountain
Iron Mountain is a world leader in information management services, assisting more than
140,000 organizations in 39 countries on five continents with storing, protecting and
managing their information.
Publicly traded under NYSE symbol IRM, Iron Mountain is an S&P 500 company and a
member of the Fortune 1000 (currently ranked: 643). Organizations in every major industry
and of all sizes—including more than 97% of the Fortune 1000—rely on Iron Mountain as
their information management partner.
We’re proud that our customers have put their trust in us. We safely store some of the
world’s most valuable historical artifacts, cultural treasures, business documents and
medical records. To properly protect and render this information, Iron Mountain employs
almost 20,000 professionals and boasts an unrivaled infrastructure that includes more than
1,000 facilities, 10 data centers and 3,500 vehicles.