
WHITE PAPER

VERITAS Bare Metal Restore: Automating Recovery from Server Failure

Sponsored by: VERITAS

Bill North, Eric Sheppard

June 2003

EXECUTIVE SUMMARY

VERITAS Bare Metal Restore (BMR) extends the functionality of the company's NetBackup product by providing users with the ability to capture and restore system operating environments simply and automatically. Restoring system operating environments can be complex and can add to the time spent bringing an application back online. Bare Metal Restore automates the recovery process for both Windows and Unix environments.

VERITAS Software commissioned IDC to survey user experiences with system failure. One hundred IT professionals directly responsible for recovery planning were interviewed. Findings from the VERITAS Server Restore Survey confirm the challenges that users face when restoring operating systems manually. All respondents had direct experience recovering from server failures. On average, 17 person-hours were spent bringing systems back into operation. In most cases, more than one IT specialist was required.

VERITAS Bare Metal Restore can re-create a server operating environment in minutes using a highly automated, two-step process. In the first step, the user initiates the recovery process for the appropriate BMR client through the BMR administrative GUI or command line. In the second step, the user boots the client either from local media or over the network. BMR then re-creates the server's operating environment and, if desired, also restores user data.

IDC believes that Bare Metal Restore represents a logical extension to VERITAS NetBackup by addressing an increasingly important need to reliably restore not just application data, but also the complex operating environments upon which applications depend.

SYSTEM RECOVERY RECONSIDERED

Traditional backup and recovery methods often focus exclusively on the protection of valuable corporate data assets. These assets typically are stored redundantly on different media and at different geographical locations to ensure that replicas of critical information are available and accessible in case primary systems fail. In the event of a disaster that disables one or more primary systems, copies of key datasets can be reloaded for processing on alternative systems. Or can they?

VERITAS SERVER RESTORE SURVEY

In late 2002, VERITAS Software commissioned IDC to investigate user experiences restoring servers following system failures. One hundred IT professionals were interviewed, all of whom were responsible for recovery planning and had direct experience bringing failed systems back online.

Global Headquarters: 5 Speen Street, Framingham, MA 01701 USA  P.508.872.8200  F.508.935.4015  www.idc.com

The sample was stratified by company size, with 23 representatives from small companies (fewer than 100 employees), 41 from midsize companies (100 to 999 employees), and 36 from large companies (more than 1,000 employees).

Microsoft Windows and versions of Unix (including Linux) were the most common server operating environments. Of the respondents, 85% reported using Windows and 20% reported using a version of Unix, so some respondents run both.

Survey participants spent approximately half an hour discussing server-recovery issues with IDC researchers. Results of the VERITAS Server Restore Survey provide an empirical foundation for IDC's reconsideration of system recovery in this paper.

OPERATING ENVIRONMENTS

Before information access can be restored following an application or infrastructure server failure, the operating environment that the server depends on must be fully recovered. The operating environment consists of the operating system, all necessary utilities and application software, and the system configuration files, user directories, security systems, and network connections. Software versions, patches, and upgrades must also be restored, adding even more variables to the operating environment. Re-creating a working server platform for application processing has become an increasingly demanding task.

Operating environments can be complex and volatile. They contain an intricate mix of relatively permanent and variable information. Reloading the operating system from source media is an example of recovering the relatively permanent portion of the operating environment. Re-creating user directories and network connections are examples of recovering more volatile, user-configurable information. The complexity grows as different versions, upgrades, service packs, and customizations are considered.

©2003 IDC #03C3563

IDC's survey asked respondents how many person-hours were spent recovering from their last server failure. The average was 17 person-hours, with 15% of respondents indicating that more than 24 person-hours were expended. Larger companies reported more time spent (an average of 23 person-hours), while smaller companies reported an average of 9 person-hours to recover.

VERITAS Server Restore Survey: Person-Hours to Recover

Re-creating an operating environment "by hand" takes time and a broad collection of skills. Network, database, operating system, and application software expertise will be needed, as will accurate, up-to-date records of system configuration. Source media for the components to be recovered must be located, loaded, and tested. The correct combination of patches and updates must be applied in the right sequence. Even when copies of enterprise data are immediately available, bringing a server to the correct operating state to restore data access can require a great deal of human intervention and may take days, not hours, to accomplish.

AUTOMATED SYSTEM RECOVERY

Automated system recovery depends on two complementary tasks: capturing and storing all necessary operating environment components and variables, and restoring or re-creating those components and variables quickly and accurately on an available server. Each server has a potentially unique configuration, and records of that configuration must be kept up to date. Although system environment variables do not form large datasets, they can and do exceed the capabilities of most ad hoc management techniques.
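The capture and restore halves described above can be pictured as two operations on one per-server record store. The following is a minimal sketch under stated assumptions: the names `EnvironmentSnapshot`, `ConfigCatalog`, `capture`, and `latest`, and the configuration fields, are hypothetical illustrations and are not part of any VERITAS product interface.

```python
import copy
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class EnvironmentSnapshot:
    """One captured record of a server's operating environment (hypothetical schema)."""
    server: str
    taken_at: datetime
    config: dict  # e.g. OS version, patch level, disk layout, TCP/IP settings


@dataclass
class ConfigCatalog:
    """Per-server snapshot history: capture() feeds it, latest() serves a restore."""
    history: dict = field(default_factory=dict)

    def capture(self, server: str, config: dict, taken_at: datetime) -> EnvironmentSnapshot:
        # Deep-copy so later edits on the live server cannot alter the stored record.
        snap = EnvironmentSnapshot(server, taken_at, copy.deepcopy(config))
        self.history.setdefault(server, []).append(snap)
        return snap

    def latest(self, server: str) -> EnvironmentSnapshot:
        # Restore side: the most recent record for the server that failed.
        snaps = self.history.get(server)
        if not snaps:
            raise KeyError(f"no environment captured for {server!r}")
        return max(snaps, key=lambda s: s.taken_at)


catalog = ConfigCatalog()
catalog.capture("web01", {"os": "Windows 2000", "ip": "10.0.0.5"}, datetime(2003, 6, 1))
catalog.capture("web01", {"os": "Windows 2000 SP4", "ip": "10.0.0.5"}, datetime(2003, 6, 8))
print(catalog.latest("web01").config["os"])  # Windows 2000 SP4
```

Keeping every snapshot, rather than overwriting one record, is what lets the restore side choose the most recent state while older states remain available.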

Manual capture of system environments has been accomplished with tools such as spreadsheets, notebooks, and installation guides. System administrators often collect bookshelves of source media for operating systems, applications, and their upgrades. Perhaps a snapshot of user directories and the authorization database has been stored on the operator's workstation. Experience shows, however, that these ad hoc methods do not reliably collect and update all necessary configuration information on a routine basis.

Capturing the server operating environment can be a regularly scheduled process that, like dataset backup, stores a snapshot of the system environment often enough to ensure that the system can be restored to its most recent operational state. Alternatively, the operating environment can be captured before episodes of potential risk. For example, storing the operating environment immediately before installing software upgrades gives operators the option of rolling the operating environment back should the upgrade process go astray.

IDC survey results found that 84% of respondents use manual recovery methods. As the figure shows, this percentage was approximately the same for small and midsize companies and slightly lower for large companies. Manual methods are the dominant way that system recovery is currently handled.

VERITAS Server Restore Survey: Use of Tools
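Capturing the environment immediately before a risky change amounts to a checkpoint that can be rolled back. The sketch below is a toy model under stated assumptions: `ServerEnvironment`, `checkpoint`, `rollback`, and `upgrade` are hypothetical names, and the `fails` flag merely simulates an upgrade that goes astray.

```python
import copy


class ServerEnvironment:
    """Toy model of one server's configuration with checkpoint and rollback (illustrative)."""

    def __init__(self, config: dict):
        self.config = config
        self.checkpoints = []  # stack of (label, saved configuration)

    def checkpoint(self, label: str) -> None:
        # Capture the whole environment immediately before a risky change.
        self.checkpoints.append((label, copy.deepcopy(self.config)))

    def rollback(self) -> str:
        # Return to the most recent checkpoint and report which one was used.
        if not self.checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        label, saved = self.checkpoints.pop()
        self.config = saved
        return label


def upgrade(env: ServerEnvironment, package: str, version: str, fails: bool) -> bool:
    """Apply a simulated upgrade; `fails` stands in for an upgrade that goes astray."""
    env.checkpoint(f"before {package} {version}")
    env.config.setdefault("packages", {})[package] = version
    if fails:
        env.rollback()
        return False
    return True


env = ServerEnvironment({"packages": {"app": "1.0"}})
print(upgrade(env, "app", "2.0", fails=True), env.config["packages"]["app"])  # False 1.0
```

The deep copy matters: a shallow copy would share the nested `packages` dictionary, so the upgrade would silently modify the saved checkpoint too.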

The capture process must accommodate today's typical datacenter, with many servers potentially running different operating systems, each uniquely configured to support different application and user requirements. In some cases, recovery may simply mean rolling back to an earlier operating environment on a single server. In more extreme cases, however, datacenter staff may face the task of re-creating tens of servers, each with its own unique operating environment.

The requirements for automated system recovery mirror the capture functions. System administrators need to be able to locate a particular server's operating environment and reinstall that environment on a replacement server. Once this is done, normal restore procedures can be used to recover all remaining files from the standard backup server. Moreover, when multiple servers fail, administrators need tools that allow a new collection of servers to be reconfigured quickly and accurately.
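The ordering constraint in the recovery requirements (environment first, remaining files second) extends naturally to many failed servers at once. This is a sketch under stated assumptions: `plan_recovery`, `Step`, the server names, and the configuration format are hypothetical and do not describe how Bare Metal Restore sequences its own work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    server: str
    action: str


def plan_recovery(failed: list[str], captured: dict[str, dict]) -> list[Step]:
    """Order the work for several failed servers (hypothetical helper, not a BMR API)."""
    missing = [s for s in failed if s not in captured]
    if missing:
        # No captured environment means a by-hand rebuild; surface it before starting.
        raise ValueError("no captured environment for: " + ", ".join(missing))
    steps = []
    for server in failed:
        # 1. Reinstall this server's own operating environment on a replacement machine.
        steps.append(Step(server, f"reinstall environment ({captured[server]['os']})"))
        # 2. Only then recover the remaining files from the standard backup server.
        steps.append(Step(server, "restore files from backup server"))
    return steps


environments = {"db01": {"os": "Solaris 9"}, "web01": {"os": "Windows 2000"}}
for step in plan_recovery(["db01", "web01"], environments):
    print(step.server, "-", step.action)
```

Checking for missing environments up front reflects the point above: a server with no captured record falls back to the slow manual path, and it is better to learn that before recovery begins.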

VERITAS BARE METAL RESTORE

VERITAS Bare Metal Restore extends the functionality of VERITAS NetBackup and provides users with the ability to capture and restore system operating environments simply and automatically. VERITAS NetBackup provides backup and recovery services for both application and system data, while VERITAS Bare Metal Restore accelerates server recovery by automating the entire system recovery process. By accelerating system recovery, Bare Metal Restore improves system availability for NetBackup users.


IDC survey results found that median expected downtimes for five scenarios range from as little as an hour following a power outage to as much as two days following the loss of 20 servers. The graph shows the range of estimates, in days, for each of the five scenarios.

VERITAS Server Restore Survey: Time to Recover (estimated downtime in days, from 0 to 5, shown as 10th, 25th, median, 75th, and 90th percentiles for five scenarios: power outage, network failure, storage system failure, complete server failure, and failure of 20 servers)

OVERVIEW

VERITAS Bare Metal Restore works together with VERITAS NetBackup DataCenter™ or VERITAS NetBackup BusinesServer™. After BMR is installed, clients are backed up to the NetBackup server as before, but an additional procedure called "bmrsaveconfig" is automatically executed before every scheduled backup to record the current state of the system's configuration, including the disk layouts and TCP/IP configuration. If changes are made to a system's configuration, they are automatically captured and recorded at the next scheduled backup without any user intervention.
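The pre-backup capture step can be illustrated as a hook that always runs before the normal backup and notices configuration changes without operator involvement. This is a minimal sketch under stated assumptions: `Client`, `save_config`, and `fingerprint` are hypothetical, and the sketch does not reproduce the interface or behavior of the real bmrsaveconfig procedure. As a design choice for illustration, it keeps a new configuration version only when the fingerprint differs, whereas the text describes recording the current state before every backup.

```python
import copy
import hashlib
import json


def fingerprint(config: dict) -> str:
    # Stable hash of a configuration; sort_keys makes key order irrelevant.
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()


class Client:
    """A backup client whose configuration is checked before every scheduled backup."""

    def __init__(self, name: str, read_config):
        self.name = name
        self.read_config = read_config  # callable returning the live configuration
        self.recorded = []              # configuration versions kept with the backups
        self._last_fingerprint = None

    def save_config(self) -> bool:
        """Hypothetical stand-in for the pre-backup capture; True if a change was recorded."""
        config = self.read_config()
        fp = fingerprint(config)
        if fp == self._last_fingerprint:
            return False
        self.recorded.append(copy.deepcopy(config))
        self._last_fingerprint = fp
        return True

    def scheduled_backup(self) -> bool:
        changed = self.save_config()  # runs first, with no operator action
        # ... the normal file backup would follow here ...
        return changed


live = {"ip": "10.0.0.5", "disks": ["C:", "D:"]}
client = Client("web01", lambda: live)
print(client.scheduled_backup(), client.scheduled_backup())  # True False
live["ip"] = "10.0.0.6"
print(client.scheduled_backup(), len(client.recorded))  # True 2
```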

In addition to the existing NetBackup server, BMR components include:

• BMR client code.

• BMR server components: the BMR main, file, and boot servers.

The BMR main server manages the clients supported by the Bare Metal Restore system, the process of preparing for client restoration, and the post-processing after the client has been restored. It makes the appropriate recovery resources available to the client and creates a customized recovery procedure. The BMR boot server uses standard protocols to network-boot the BMR client. The BMR file server provides the client with the necessary operating system programs and libraries, the Bare Metal Restore client package, the NetBackup client package, and any other software necessary to recover the system (e.g., VERITAS Volume Manager™). Collectively, these BMR file server resources are called the Shared Resource Tree (SRT).
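The division of labor among the three server roles can be summarized as a small model in which the main server assembles a per-client procedure from what the boot and file servers provide. This is an illustrative sketch under stated assumptions: `SharedResourceTree`, `prepare_restore`, the host names, and the procedure format are hypothetical; the SRT contents follow the list in the paragraph above, but the real BMR procedure is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class SharedResourceTree:
    """Software the BMR file server offers a restoring client (illustrative contents)."""
    os_files: list = field(default_factory=lambda: ["OS programs and libraries"])
    client_packages: list = field(
        default_factory=lambda: ["BMR client package", "NetBackup client package"])
    extras: list = field(default_factory=list)  # e.g. volume management software


def prepare_restore(client: str, srt: SharedResourceTree,
                    boot_server: str, file_server: str) -> list[str]:
    """Main-server role: assemble a customized recovery procedure (hypothetical format)."""
    procedure = [f"{boot_server}: network-boot {client}"]
    for item in srt.os_files + srt.client_packages + srt.extras:
        procedure.append(f"{file_server}: provide {item} to {client}")
    procedure.append(f"main server: post-process {client} after restore")
    return procedure


srt = SharedResourceTree(extras=["VERITAS Volume Manager"])
for line in prepare_restore("web01", srt, boot_server="boot01", file_server="file01"):
    print(line)
```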


IDC's survey asked respondents how many IT professionals were needed to recover from the last server failure at their organization. IDC found that as many as six different individuals, representing different technology skill sets, were required. While 35% of respondents reported that a single individual restored the server, most respondents reported that two or more technicians were required. The trend toward involving more individuals was more pronounced at larger companies.

VERITAS Server Restore Survey: Skill Sets Required

FIGURE 1

BARE METAL RESTORE "PREPARE TO RESTORE"

Source: VERITAS, 2003

DAILY OPERATIONS

NetBackup's policy engine allows system managers to schedule regular backups. Bare Metal Restore automatically captures operating environment metadata whenever a NetBackup full or incremental backup begins. In addition to regularly scheduled backups, system operators can request that Bare Metal Restore capture server metadata at any time. System operators may perform this operation, which takes just a few minutes, before undertaking an operating system upgrade, for example, or before testing a new version of application software.

RESTORE PROCESS

Figure 1 depicts the browser-based Prepare to Restore interface. Invoking this interface and specifying parameters, such as which client is to be restored or its IP networking addresses, typically is the first step toward system and data recovery with Bare Metal Restore. The entire Bare Metal Restore process is highly automated and allows NetBackup customers to completely recover their machines from normal backups, without requiring separate system backups or reinstalls. Following a severe system failure, Bare Metal Restore and NetBackup can restore the machine to the state at which it was last backed up, with minimal intervention and opportunity for human error. A few minutes after the restore is configured, Bare Metal Restore indicates that the client is ready to be restored, and an "OK" prompt is displayed (Figure 2). The operator then reboots the client, and the remainder of the recovery process is automated. When recovery is complete, the client automatically reboots, and the operating and application environments are ready to go.
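The restore sequence just described can be read as a short linear state machine in which only two transitions require the operator. The sketch below is a simplified model under stated assumptions: the state names and the `advance` and `run_restore` helpers are hypothetical labels for the steps in the text, not states reported by the product.

```python
from enum import Enum, auto


class RestoreState(Enum):
    FAILED = auto()     # client is down
    PREPARED = auto()   # operator has configured the restore ("Prepare to Restore")
    READY = auto()      # BMR reports the client is ready ("OK" prompt)
    RESTORING = auto()  # client rebooted; automated recovery in progress
    COMPLETE = auto()   # client rebooted itself into the restored environment


# Each state has exactly one successor; only two steps need the operator.
NEXT = {
    RestoreState.FAILED: RestoreState.PREPARED,     # operator: configure the restore
    RestoreState.PREPARED: RestoreState.READY,      # automatic: resources staged
    RestoreState.READY: RestoreState.RESTORING,     # operator: reboot the client
    RestoreState.RESTORING: RestoreState.COMPLETE,  # automatic: recovery, then reboot
}


def advance(state: RestoreState) -> RestoreState:
    if state not in NEXT:
        raise ValueError(f"no transition from {state.name}")
    return NEXT[state]


def run_restore() -> list[str]:
    state, history = RestoreState.FAILED, [RestoreState.FAILED.name]
    while state is not RestoreState.COMPLETE:
        state = advance(state)
        history.append(state.name)
    return history


print(" -> ".join(run_restore()))  # FAILED -> PREPARED -> READY -> RESTORING -> COMPLETE
```

Modeling the sequence this way makes the claim of "minimal intervention" concrete: of the four transitions, two are operator actions and two proceed on their own.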

FIGURE 2

THE BARE METAL RESTORE CLIENT IS READY TO BE RESTORED

Source: VERITAS, 2003

SCOPE OF OPERATING ENVIRONMENTS

VERITAS Software supports the Bare Metal Restore product in Windows and Unix environments, and most IT shops will need both versions. Storage administrators who straddle Unix and Windows environments will be pleased to note that Bare Metal Restore's graphical user interface is identical for the two environments, as Figure 3 shows. By using a common interface across environments, Bare Metal Restore provides centralized system recovery management with a single product. Operators work from a single skill set rather than needing to understand each operating environment separately.

IDC ANALYSIS: OPPORTUNITIES AND CHALLENGES

IDC believes that Bare Metal Restore represents a logical extension to NetBackup that addresses the increasingly important need to reliably restore not just an application and its data, but also the complex operating environment on which the application depends. Moreover, the ability to map and record operating environments for a heterogeneous collection of servers with a single tool makes Bare Metal Restore an important part of an enterprise business continuity plan.

The high degree of automation designed into the Bare Metal Restore product is essential to its success. Backup cannot depend on the best intentions of a disciplined system operator, and recovery should be accomplished as quickly and reliably as possible. VERITAS Bare Metal Restore has assimilated datacenter requirements well.

FIGURE 3

THE BARE METAL RESTORE INTERFACE IS THE SAME FOR UNIX AND WINDOWS

Source: VERITAS, 2003

Using NetBackup to store Bare Metal Restore's operating environment metadata along with traditional databases and files extends the functionality of the VERITAS product line. Bare Metal Restore's dependency on NetBackup does, however, make BMR an accessory rather than a standalone solution. Customers shopping more narrowly for a third-party server restore solution may not value this dependency. On the other hand, BMR's ability to leverage NetBackup enables automation and eliminates redundant use of storage media and system backup procedures, saving time and money.

The VERITAS focus on Unix and Windows rules out Bare Metal Restore for companies that need a restore product for mainframe and other non-Unix, non-Windows operating systems. To be fair, however, competing products offered by system suppliers support rapid recovery for a particular operating environment. By supporting Unix and Windows with a single product, VERITAS Bare Metal Restore will appeal to IT departments with multivendor servers in the datacenter.

CONCLUSION

Findings from the VERITAS Server Restore Survey demonstrate empirically the need for IT departments to automate the backup and restoration of server operating environments, much as file systems and databases are protected. VERITAS Bare Metal Restore is aimed squarely at reducing recovery time, labor, and required skills. IDC encourages IT professionals responsible for business continuity planning to evaluate the ability of NetBackup and Bare Metal Restore to bring an application back into production quickly and reliably.

COPYRIGHT NOTICE

External Publication of IDC Information and Data - Any IDC information that is to be used in advertising, press releases, or promotional materials requires prior written approval from the appropriate IDC Vice President or Country Manager. A draft of the proposed document should accompany any such request. IDC reserves the right to deny approval of external usage for any reason.

Copyright 2003 IDC. Reproduction without written permission is completely forbidden.


IDC is a division of IDG, one of the world's top information technology media, research and exposition companies.

Visit us on the Web at www.idc.com
To view a list of IDC offices worldwide, visit www.idc.com/offices

IDC is a registered trademark of International Data Group.