IBM Global Services
Be Prepared IT Disaster Recovery Plan
Chor Ming Chong
IBM Malaysia
Nov 3 2009
Strictly Private & Confidential
Agenda
• What is DRP
• DR Plan
• DR Organisation
• Testing & Maintenance
• Procedures and People
• End
Business Today
• Operations available whenever customers need them, guaranteed
• Relied on 24 hours a day, 7 days a week, 52 weeks a year
• No single point of failure
• Technology is key to success
What Is a DISASTER?
• A disaster is an unplanned interruption to the business or data center that renders it inoperable and inaccessible for a prolonged period, so that the contingency plan must be activated to support the business and meet customers’ needs.
… It Won’t Happen to Us ...
• Some common misconceptions about disasters:
  - It won’t happen to me.
  - We are safe where we are.
  - Chances are remote.
  - We have insurance.
  - We have vendor support.
  - Not worth the effort.
  - It’s too costly.
  - We’ll cross the bridge when we come to it.
Some Local Disasters
• Nestle House - Fire, May 1994
• Bata (M) Bhd - Fire, April 1994
• KL Plaza - Structural damage, Jan 1994
• Wisma Stephens - Explosion, June 1993
• Public Bank - Fire, Feb 1992
• KL Golden Triangle - Telecommunications failure, Aug 1992
• KL Stock Exchange - Network failure, Sep 1993
• Klang Valley - Nationwide blackout, Sept 30, 1992
• Nationwide tremors - 1993
• Port Klang - Tank explosion, June 1992
• Maybank - System problems, Jan 1988
• MAS - Tower fire, April 1992
Disaster Recovery Plan
• Is not Business Continuity Planning, but a part of it.
• A set of predefined plans, procedures and resources, identified so that the following are known BEFORE, DURING and AFTER a disaster event:
  - The action to be taken.
  - The resources to be used.
  - The procedures to be followed.
• It defines the scope and recovery time objectives to ensure that business can proceed uninterrupted in the event of a disaster.
• It enables a company to stay in business.
Why Is DRP Required?
• Reduce likelihood of disaster
• Limit the damage and losses
• Full restoration A.S.A.P.
• Ensure critical business availability
Reasons for DRP
• The complexity of the business and the integration of its services make planning essential
• Increased dependency on IT to deliver products quickly and efficiently
• Time is of the essence: the longer you stay away, the more difficult the recovery will become
• A prolonged outage will result in loss of the business’s competitive edge
• A disaster cannot be managed informally
Prerequisites to DRP
• Scope of DRP
• Level of priority
• Emergency service objective
• Resource requirements
• Acquisition strategy
• Interim processing
• Return to business
Basic DR Plan
• Identification of individuals responsible for disaster recovery activities.
• Organisation of DR teams assigned to specific recovery functions.
• Procedures to recover normal processing functions at the disaster backup center for business continuity.
• Business recovery priorities and the expected time frame for recovery.
• Identification of the resources needed for alternate processing to meet service level expectations.
• Recovery of the premises and/or assets (such as computers, factory equipment, etc.).
• Or the building of a new office, factory or computer data center.
• IMPORTANT NOTE: A plan is only good on paper until it is proven to work. TEST your plan periodically. It is the only way to know if it works!
Recovery Objectives
• To give a clearly defined course of action and provide for an orderly and timely recovery from a major interruption of computing services arising from a data center disaster
• To identify the personnel, resources and functions necessary for continued operations in the event of a disaster
• To identify the systems critical to the survival of the business and define alternative procedures for ongoing support in the event of a prolonged outage
• To identify the critical resources necessary for temporary replacement of processing services and resumption of normal computing services
• To specify the steps necessary to relocate the business to an alternate site if required
• To identify customers/clients that must be notified during the outage
• To use the established plan to resume normal processing, within predefined limits, after a disaster
• To document the procedures for safeguarding critical resources (e.g. inventory of hardware, stock, computer equipment, invoices, etc.) at an off-site storage facility
In Short…
• DRP describes:
  - The action to be taken
  - The resources to be used
  - The procedures to be followed
• BEFORE, DURING and AFTER an unlikely event that renders inoperative all or part of an organisation’s information processing and communication resources.
Assumptions
• Business Continuity Planning completed
• Business Impact Analysis
  - Define the impact and identify the critical core business functions whose loss would severely affect the company’s survival in the event of a disaster
  - Provide quantitative/qualitative facts to support the need for a BCP
  - Define your Recovery Time Objective and Recovery Point Objective
• Risk Analysis
  - Understand which risks pose the gravest threat to your business assets
  - Recommend cost-effective safeguards
• Recovery Strategies
  - None/reciprocal/cold/warm or hot
  - Build own, subscribe, or hybrid
Policy and Key Activities
• Top management commitment to the establishment and maintenance of a comprehensive, viable and practical Disaster Recovery Plan.
• State what is required for corporate survival should the data centers be subject to a disaster.
• Define the scope of the DR Plan.
• Define the priorities of applications to be recovered (BIA).
• Address the vital business functions of the company by providing alternative processing methods for the applications that are necessary for corporate survival.
• Define backup and archival strategies.
• Establish roles and responsibilities, and the priorities to be given to the teams for the recovery process.
• Perform damage assessment of the host and define return-to-normalcy criteria.
Recovery Scope
• Defines the scope and breadth of the DR Plan
  - Which locations/premises the DR Plan covers
  - Which business applications it supports
  - Where the alternative site is located
  - What the strategy is: none/reciprocal/cold/warm/hot
  - What the Recovery Time Objective is
  - What the Recovery Point Objective is
RTO
• Recovery Time Objective
  - Time taken to recover the services, from the time of disaster (declaration) to the time the services are made available to the users at the recovery site
[Timeline figure: Disaster Strikes → Declare Disaster → Resume Services at DR Site → Return to Normalcy, with the RTO marked on the timeline]
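As a rough illustration of the RTO definition above, the sketch below measures the time from disaster declaration to service resumption at the recovery site and compares it with a target. The function names, the 8-hour target and the timestamps are hypothetical and do not come from the slides.

```python
from datetime import datetime, timedelta


def recovery_time(declared_at: datetime, resumed_at: datetime) -> timedelta:
    """Elapsed time from disaster declaration to services resuming at the DR site."""
    if resumed_at < declared_at:
        raise ValueError("services cannot resume before the disaster is declared")
    return resumed_at - declared_at


def meets_rto(declared_at: datetime, resumed_at: datetime, rto: timedelta) -> bool:
    """True when the measured recovery time is within the objective."""
    return recovery_time(declared_at, resumed_at) <= rto


# Hypothetical figures, for illustration only.
declared = datetime(2009, 11, 3, 9, 0)
resumed = datetime(2009, 11, 3, 15, 30)
print(recovery_time(declared, resumed))                  # 6:30:00
print(meets_rto(declared, resumed, timedelta(hours=8)))  # True
```

The same check can be run after every DR test to see whether the measured recovery time still fits the objective.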
RPO
• Recovery Point Objective
  - Recovery of data to a predefined state in time in the event of a disaster
[Timeline figure labels: Normal Data Entry; System Goes Offline; Declare Disaster; Recover Data up to Predefined Point in Time; Resume Services at DR Site; Return to Normalcy; RPO marked on the timeline]
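To make the RPO idea concrete, here is a minimal sketch that finds the latest backup taken before the system went offline and reports how much data entry would be lost. The backup times, the outage time and the 24-hour objective are hypothetical, as are the function names.

```python
from datetime import datetime, timedelta


def latest_recovery_point(backups: list[datetime], offline_at: datetime) -> datetime:
    """Most recent backup taken at or before the moment the system went offline."""
    usable = [b for b in backups if b <= offline_at]
    if not usable:
        raise ValueError("no backup exists before the outage")
    return max(usable)


def data_loss_window(backups: list[datetime], offline_at: datetime) -> timedelta:
    """Span of data entry that cannot be recovered from the backups."""
    return offline_at - latest_recovery_point(backups, offline_at)


# Hypothetical nightly backups and outage time.
backups = [datetime(2009, 11, 1, 23, 0), datetime(2009, 11, 2, 23, 0)]
offline = datetime(2009, 11, 3, 10, 0)
loss = data_loss_window(backups, offline)
print(loss)                         # 11:00:00
print(loss <= timedelta(hours=24))  # True: within an assumed 24-hour RPO
```

A tighter RPO implies more frequent backups or replication, since the loss window can never be shorter than the gap since the last recoverable copy.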
Recovery Facilities
• Own or Subscribed
• Own
  - Maintenance
  - Upgrades
  - Testing
  - Availability
  - Network
• Subscribed
  - Company background
  - Facilities: accessibility, reliability, resiliency, availability, security, environmental, network infrastructure
  - Other support facilities: work area floor space, cold site, storage requirements
  - Value-add support: subject matter experts
Command Center
• Provides centralised and coordinated management and control of all recovery activities and communications during a disaster recovery situation.
• Two command centers can be designated:
  - Command Center 1 for the Recovery Management Team. The EMT, Admin, Audit and the DRC will be stationed here.
  - Command Center 2 for the operations and technical teams that restore the computer and network operations at the recovery backup site.
• Important:
  - Location identified and made known to all.
  - Available at all times.
  - Equipped with the necessary items to operate 24 hours, round the clock.
  - Communications: faxes, dedicated phone lines (predefined incoming/outgoing/hotline), video conferencing, TV and radio access for news updates, internet and intranet access, etc.
Offsite Storage
• It is essential for the success of any recovery plan to have a backup copy of your critical business data and documentation stored in an offsite facility.
  - Documentation of offsite data
  - Schedule of backup
  - Archival frequency
• In the event of a disaster, all the data can be retrieved from the archives at the offsite location for the recovery of the affected data center.
• Important: offsite storage must be:
  - Protected
  - Accessible at all times
Immediate Response Steps
• Notification
  - Alert team leaders, members & DR site
  - Conduct initial debriefing
• Perform damage assessment
• To declare or not to declare
• Set up command centers
• Assemble all team members
• Execute DR Plan
• Set up command centers
  - Management Command Center: Emergency Management Team / Site Restoration Team / Audit Team / DR Coordinator / DR Manager
  - Recovery Team Command Center: DR Technical/Operations Teams
• Recovery at DR site:
  - Establish recovery center online systems & network communications
• Recovery at damaged data center:
  - Perform detailed damage assessment, including salvaging, etc.
• Resume normal online processing at recovery center
• Support online users
• Batch and report distribution
• Staffing
• Logistics & supplies
• Damage assessment
  - Repair or rebuild new
• Salvaging data and equipment
• Acquisition of hardware, software, etc.
• Re-establishing network communications
• Confirm new host data center is ready
  - Environmental resources
  - Hardware/software equipment
  - Network communications
  - Logistics and supplies
• Migration back to new host
  - Shut down online system
  - Recover online system at new host
  - Resume normal processing
• Recovery cycle complete
[Recovery timeline figure. Phase labels: Event; Alert; Assessment; Notification; Declaration/Mobilization; Travel To Recovery Site; Restore/Resume Systems; Online Functional Verification; Services Resumed; Return Home; Disaster Plan. Time axis: 0, 4-6, 7(1), 9(2), 10(1), 11 .... est. Equipment labels: Server, Local Bridge, WS, WS.]
Typical Escalation Process
• Problem detected
• Alert Condition 1 (up to 15 minutes): Shift Supervisor attends to problem
• Alert Condition 2 (15 minutes to 2 hours): Computer Operations Supervisor informed
• Alert Condition 3 (2 to 4 hours): notify DR Coordinator/Manager
• Alert Condition 4 (4 hours or more): alert Emergency Management Team
• More than 13 hours: declare disaster and activate DRM teams
• Immediate disaster (e.g. total power failure, hardware destroyed, fire, etc.): declare disaster
• At each decision point, "Problem resolved?":
  - YES: record the event and resume normal processing
  - NO: escalate to the next alert condition
• Side notes in the flowchart:
  - Other alerts: 1. Operations Support 2. Applications Support 3. Network Support 4. Vendors
  - Online impacted: inform users, S&M, B&O
  - On standby: 1. Alert DRC 2. Alert all recovery teams
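The escalation chart above is essentially a lookup from outage duration to the party that must be alerted. A minimal sketch follows. The time thresholds are taken from the chart, but the function name, the return strings and the handling of exact boundary values (for example, whether exactly 2 hours counts as Condition 2 or 3) are assumptions made for this example.

```python
from datetime import timedelta

# Upper bounds for Alert Conditions 1-3, taken from the escalation chart.
ESCALATION_STEPS = [
    (timedelta(minutes=15), "Alert Condition 1: Shift Supervisor attends to problem"),
    (timedelta(hours=2), "Alert Condition 2: Computer Operations Supervisor informed"),
    (timedelta(hours=4), "Alert Condition 3: notify DR Coordinator/Manager"),
]
CONDITION_4 = "Alert Condition 4: alert Emergency Management Team"
DECLARE = "Declare disaster: activate DRM teams"
DECLARE_AFTER = timedelta(hours=13)


def escalation_level(outage: timedelta, immediate_disaster: bool = False) -> str:
    """Map the duration of an unresolved problem to the action in the chart."""
    # Total power failure, destroyed hardware, fire, etc. skip the ladder.
    if immediate_disaster or outage > DECLARE_AFTER:
        return DECLARE
    for upper_bound, action in ESCALATION_STEPS:
        if outage < upper_bound:  # boundary handling is an assumption
            return action
    return CONDITION_4


print(escalation_level(timedelta(minutes=10)))
print(escalation_level(timedelta(hours=3)))
print(escalation_level(timedelta(hours=14)))
```

Keeping the thresholds in one table makes it easy to adjust them when the escalation policy is reviewed.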
DR Teams - Overview
• Responsible for managing and executing all activities defined in the DR Plan.
• An identification of the individuals who would be responsible for disaster recovery activities.
• An outline of duties for the management and recovery teams in a disaster.
• Structured processes involving various areas of recovery.
• Achieve recovery in a timely and orderly manner.
Disaster Management Team Structure
[Organisation chart. Other boxes: Chairman; Audit; BCP]
• Disaster Management Team
  - Chairman: CIO
  - Members: IT Senior Management Team, Business Application Senior Management Team, other Senior Management Team (HR/Finance/Accounts, etc.) and the Disaster Recovery Manager and Coordinators
• Damage Assessment Team
  - Members: facilities, network, systems, servers and storage, operations, security, applications and vendors
• Site Restoration Team
  - Members: facilities, network, systems, servers and storage, operations, security, applications and vendors
• Technical Recovery Team
  - Members: facilities, network, systems, servers and storage, operations, security, applications and vendors
DR Management Team
• Chairman: CIO or Head of the IT Division/Department.
• Reports to the company Chairman and Steering Committee.
• Responsible for high-level business recovery operations.
• Main tasks:
  - Review impact of disaster on the business.
  - Declaration of disaster.
  - Issue directive to activate DR organisation.
  - Provide a channel for key decision(s) during the recovery operations.
• Consists of members from senior management, selected corporate co-ordinators and end-user management.
DR Management Team
• DR Manager: responsible for providing overall direction for EDP recovery operations.
• Main tasks:
  - Damage assessment of the computer facilities and environment.
  - Activation of the various DR teams.
  - Set up command centers.
  - Liaise with and provide advice to senior management on recovery progress.
  - Co-ordinate actions of the various recovery teams.
• Consists of the following members:
  - DR Coordinators.
  - Logistics Support Team Manager.
  - Systems Team Manager.
  - Applications Team Manager.
  - Network Team Manager.
DR Management Team Structure
• Logistics Support Team
  - Responsible for serving as expeditors and suppliers of resources to other disaster recovery team members.
• Main tasks:
  - All DR logistics issues, including transportation and lodging.
  - Financial matters (e.g. cash advances).
  - Security.
  - Public relations.
  - Insurance and legal assistance.
• Members:
  - Representative from Personnel.
  - Representative from Security.
  - Representative from Public Relations.
  - Representative from Legal.
  - Representative from Head Office Administration.
Applications Solutions
• Application Support
  - Responsible for coordinating and providing support between all user locations and the computer backup site.
• Main tasks:
  - Advise branches and HO departments on the use of interim manual procedures.
  - Update users on recovery progress.
  - Assist users to resume normal operations when the system is available.
• Members:
  - Representative for SAP applications.
  - Representative for non-SAP applications.
  - User liaison.
Systems Services
• Systems Services Team
  - Responsible for the technical recovery of the computer and data communications facilities
• Main tasks:
  - Restoration of critical online systems in backup recovery center
  - Resuming normal processing functions at backup recovery center
• Members:
  - Systems
  - Operations
  - Security
  - Database Administrator
  - Vendors
Network
• Network Team
  - Responsible for data communication recovery between users, the production data center and the backup site
• Main tasks:
  - Restoration of network connectivity for critical online systems to the backup recovery center
  - Resuming network access to the backup recovery center
• Members:
  - WAN
  - LAN
  - Firewall
  - Telcos
  - Vendors
Site Restoration Team
• Site Restoration Team: responsible for all damage assessment and for ascertaining the extent of damage to operations affecting all computing and data communication facilities. Also in charge of rebuilding a new computer center under the business resumption plan, including the acquisition strategy.
• Main tasks:
  - Damage assessment and impact analysis of the extent of damage.
  - Inventory and acquisition of new computer hardware and network data communication equipment for replacement or repairs.
  - Restoration of the damaged computer center or building of a new computer center.
• Members:
  - Building Management/Property.
  - Global Services.
  - Global Network.
  - Vendors.
[DR organisation chart; text recovered from the figure]
• Activation boxes: Disaster strikes; call Disaster Recovery Coordinator; call Emergency Management Team; declare disaster; inform DRC; set up primary & secondary command center
• Management boxes: D.R. Manager; Disaster Recovery Management Team; DR Coordinator; offsite storage
• Member boxes:
  - 1. Systems & Methods 2. HO Finance 3. HO Credit 4. Branch & Operations 5. IS Services
  - Security; Operations; Systems; Application; Network; Database; Global Services & Global Network
  - Logistics Support Team: 1. Security 2. Admin 3. HO Admin 4. Legal 5. Public Relations 6. Branch & Operations 7. Personnel/HRD
  - Site Restoration Team: Global Services; Vendors; Building Mgmt.; Property; Global Network
• Task boxes:
  - 1. Coordinate and provide support between user locations and backup site. 2. Advise users/branches on interim manual procedures. 3. Monitor and update recovery process. 4. Help users resume operations after initial interruption.
  - 1. Restore computer operations and resume normal processing. 2. Restore network communications services. 3. Support online services at backup centre.
  - Expediters and suppliers of resources to recovery teams for financial matters, transportation, security, insurance, legal, public relations and other logistics issues.
  - 1. Perform damage assessment and salvaging process. 2. Initiate rebuilding process of host or decide on new site. 3. Acquisition of property, computer hardware and telecommunications. 4. Handle insurance claims.
Technical Recovery Plan Lay-out
• Every DR team will have a plan unit
• Appointed team leader and alternate team leader
• Team leader’s responsibility is to ensure that:
  - The plan is current and workable (maintenance and testing)
  - All team members are trained and familiar with the plan
• One copy onsite for updates and maintenance
• One copy kept offsite at recovery center for retrieval
• Plan must include the following:
  - Preventive procedures
  - Recovery procedures
  - Interim procedures
  - Resumption procedures
As Good As the Next…
• Exercise your DR Plans periodically
  - Fix and publish the date and time of the test
  - Get management endorsement and commitment for the test dates
• Levels of testing
  - Level 1: System Testing
  - Level 2: Application Testing
  - Level 3: Network Testing
  - Level 4: Full/Integrated Testing
• Post-review management report
  - Provide a report after every test
  - The report should show:
    - Expected test results compared with actual results
    - Lessons learnt
    - Improvement plans
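The "expected versus actual" comparison in the post-test report can be produced mechanically. The sketch below is one possible shape for it; the `DrillStep` class, the step names and the timings are all hypothetical and only illustrate the comparison.

```python
from dataclasses import dataclass


@dataclass
class DrillStep:
    """One recovery step exercised during a DR test (hypothetical structure)."""
    name: str
    expected_minutes: int
    actual_minutes: int

    @property
    def within_target(self) -> bool:
        return self.actual_minutes <= self.expected_minutes


def summarise(steps: list[DrillStep]) -> list[str]:
    """Return one report line per step, flagging steps that overran."""
    lines = []
    for step in steps:
        status = "OK" if step.within_target else "OVER"
        lines.append(
            f"{step.name}: expected {step.expected_minutes} min, "
            f"actual {step.actual_minutes} min [{status}]"
        )
    return lines


# Made-up results from an imaginary Level 4 test.
steps = [
    DrillStep("Restore database", expected_minutes=120, actual_minutes=95),
    DrillStep("Re-route network", expected_minutes=60, actual_minutes=80),
]
for line in summarise(steps):
    print(line)
```

Steps flagged as over target are natural inputs to the "lessons learnt" and "improvement plans" sections of the report.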
Keeping the Plan Current…
• Plan maintenance
  - Hardware changes
  - Software changes
  - Network changes
  - Policy changes
• Periodic plan maintenance
  - Responsibility of respective team leaders
  - Review procedures quarterly/half-yearly
  - Review overall DR scope yearly
• Major changes
  - Change management
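A periodic review schedule like the one above is easy to track with a small script. The sketch below assumes the quarterly option for procedures and a yearly review for the overall scope. The day counts, item names and dates are approximations chosen for this example.

```python
from datetime import date, timedelta

# Assumed intervals: roughly a quarter for procedures, a year for overall scope.
REVIEW_INTERVALS = {
    "procedures": timedelta(days=91),
    "scope": timedelta(days=365),
}


def reviews_due(last_reviewed: dict[str, date], today: date) -> list[str]:
    """Names of plan items whose review interval has elapsed."""
    return sorted(
        item
        for item, reviewed in last_reviewed.items()
        if today - reviewed >= REVIEW_INTERVALS[item]
    )


# Hypothetical last-review dates.
last = {"procedures": date(2009, 6, 1), "scope": date(2009, 1, 15)}
print(reviews_due(last, date(2009, 11, 3)))  # ['procedures']
```

Changes to hardware, software, network or policy would trigger a review outside this schedule, through the change management process.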
Procedures Creed
• Procedures are needed for every phase of recovery: prevention, recovery, interim processing and, finally, resumption of normal business
• Procedures are only good and valid if they are properly maintained and their ownership is clearly identified
• Procedures must be made available at the recovery center
• Procedures must be tested periodically
• The plan cannot be executed without people!
• Identification and training must be provided
• Key staff contact details must be maintained
• Know your people and their roles
• People are the key to the success of the plan
No People, No Work!
Measure of Success
• Successful testing of a recovery capability is accomplished by developing and using a test plan that exercises the recovery procedures and documentation
• Full support from management and full commitment and participation from staff
• For total success, neither can function without the other
DRP Golden Rules
• The plan must be tested regularly to ensure that it:
  - is workable
  - is up to date
  - covers all critical areas as defined in the SLA
• Make it a corporate policy to perform a DR test at least twice a year
• The DR teams must be trained according to the procedures set
• Procedures must be kept up to date and stored offsite
• Critical data must be kept offsite and be readily retrievable in times of need
• The plans must be reviewed periodically to ensure that they are up to date and that the scope is updated as the business grows
• The backup recovery resource capacity should be reviewed periodically
• Remember: your DR Plan is only as good as the next test!
Towards Resilient Business Infrastructure
[Process figure; text recovered from the diagram. Phases: analysis; design; implementation]
• Risk Analysis: undesired events; probability; safeguard effectiveness; assets value; potential loss; vulnerability; security preventive measures
• Business Impact Analysis: vital processes identification; outage costs; threshold definition; tolerable outage
• Business and Technical Strategy Definition (Recovery Strategies): resources/assets chain for vital processes; current recovery capability; gap identification (needed vs current)
• Enterprise Solution Design: planning of gap elimination step by step (short, medium, long terms); solution design (technology, organization, operation)
• Business Continuity Plan: corporate-wide plan for crisis management
• IT Recovery Plan: I/T plan development; test; maintenance
Thank You