Disaster Recovery Planning. Questions to the Audience.
What is an IT Disaster?
• ‘Disaster’ – the unplanned interruption of normal business processes resulting from the interruption of the IT infrastructure components used to support them.
Common Types (source: Healthcare Information and Management Systems Society, himss.org):
Power outages 28% Hurricanes 6%
Storm Damage 12% Fires 6%
Floods 10% Software Error 5%
Hardware Error 8% Power surge/spike 5%
Physical Attack 7% Earthquake 5%
What is an IT Disaster?
Common Types:
✔ Power outages 28% Hurricanes 6%
✔ Storm Damage 12% ✔ Fires 6%
✔ Floods 10% ✔ Software Error 5%
✔ Hardware Error 8% ✔ Power surge/spike 5%
Physical Attack 7% Earthquake 5%
Business Continuity versus Disaster Recovery
• These are not the same thing!
• Business Continuity (BC): Considers the academic, research and business functioning of the institution as a whole. Includes risk assessment, and plans for functional units and business processes. Potentially wider variety of scenarios to consider.
• Disaster Recovery (DR): IT activities to enable recovery to an acceptable condition after a disaster. BC includes DR. DR requires guidance from BC to direct priorities and set scope.
What is the York DR Plan?
Review 2008 Plan
• Project start: January 2003
• Sponsored by CIO and VP Finance and Administration
• Scope
  • Systems: “key information systems”
  • Scenarios: “localized disaster or failure”
• Intended to be a multi-phase, multi-year project
What is the York DR Plan?
• Engaged functional unit leaders and IT support areas
  • Asked to identify maximum tolerable outage and data loss
  • Surprise: >50% of business processes ranked “critical”
  • Reality check based on observed impacts from lesser-scale outages
  • VP and AVP consultations were the final step to confirm criticality
Risk Management
[Figure: Cost of Incidents and Cost of Countermeasures plotted against Degree of Assurance (Low to High), with the Optimal Cost/Benefit point marked.]
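The trade-off in the Risk Management figure can be made concrete with a toy model. The sketch below assumes two made-up cost curves (purely illustrative, not figures from the York plan): incident cost falls linearly as assurance rises, countermeasure cost grows quadratically, and the optimum is the assurance level with the lowest combined cost.

```python
def incident_cost(assurance: float) -> float:
    """Hypothetical curve: expected incident losses shrink as assurance grows."""
    return 100.0 * (1.0 - assurance)


def countermeasure_cost(assurance: float) -> float:
    """Hypothetical curve: each extra step of assurance costs more than the last."""
    return 100.0 * assurance ** 2


def optimal_assurance(steps: int = 100) -> float:
    """Scan assurance levels from 0 (low) to 1 (high) and return the cheapest one."""
    levels = [i / steps for i in range(steps + 1)]
    return min(levels, key=lambda a: incident_cost(a) + countermeasure_cost(a))


best = optimal_assurance()
print(f"lowest total cost at assurance level {best:.2f}")
```

With real data the curves would come from incident history and quoted countermeasure costs; the point of the sketch is only that the optimum sits between the two extremes.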
What is the York DR Plan?
• DR Threat Assessment
  • Proximity to heavy industry – Oil depot across street
  • Freight train corridor (chemical spill 1980)
  • Near intersection of major highways (400 & 407)
  • York main campus on flight path of two airports
  • Main data centre in basement of old building with UPS but no generator
  • High pedestrian traffic (Science Library and washrooms upstairs) directly overhead
• Worst case scenario chosen:
  • Loss of building containing main data centre
What is the York DR Plan?
• By 2008
  • Secured Telus site for secondary site
  • Identified 4 categories of information systems
    • Recovery Point Objectives (RPO)
    • Recovery Time Objectives (RTO)
    • Strategy defined on style of recovery for each
  • Business owners classified which systems belong in which categories
  • Large infrastructure upgrades identified to meet the RTO/RPOs
  • Planned to annually refresh DR plan
2012 DRP Refresh
• It’s been 4 years
  • Big upgrade on storage and core network
  • Acquisition of second on-campus data centre
  • IT department merger
  • And …
Goals for 2012 Refresh
2012 Goals
• Focus on C1 business applications as of 2012
• IT staff / office space not in scope
• Scenario is the loss of a single data centre (not both)
• Validate the categorization of “information systems”
• Gap Analysis for C1 information systems
• Table-top recovery scenario for supporting infrastructure
Methodology
• Produce the complete UIT-supported application inventory
  • How hard can this be?
  • The one list did not exist
• Categorize Applications and focus on 2012 C1 Applications
• Gap Analysis and Planning
• Tabletop Recovery of supporting infrastructure
DR Categories
Categories and associated RTOs/RPOs
Category | Summary | Recovery Time Objective (RTO) | Recovery Point Objective (RPO)
Category 1 | Vital Communications and Emergency Services | <= 4 hours | <= 15 minutes
Category 2 | Critical Customer / Partner Interfaces and Emergency Systems | <= 48 hours | <= 15 minutes
Category 3 | Critical Customer / Partner Interfaces and Emergency Systems | <= 7 days | <= 24 hours
Category 4 | Critical Internal Departmental Services and Non-Critical Customer Interface | <= 14 days | <= 48 hours
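The category table lends itself to a simple machine-checkable form for gap analysis. The following is a minimal sketch, not the tooling used at York: the RTO/RPO limits are taken from the table above, while the `DRCategory` class, the `gap_analysis` function, and the sample "achievable" figures are hypothetical names and values introduced for illustration.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DRCategory:
    """Target limits for one DR category (values from the category table)."""
    rto: timedelta  # maximum tolerable time to restore the service
    rpo: timedelta  # maximum tolerable window of lost data


CATEGORIES = {
    1: DRCategory(rto=timedelta(hours=4), rpo=timedelta(minutes=15)),
    2: DRCategory(rto=timedelta(hours=48), rpo=timedelta(minutes=15)),
    3: DRCategory(rto=timedelta(days=7), rpo=timedelta(hours=24)),
    4: DRCategory(rto=timedelta(days=14), rpo=timedelta(hours=48)),
}


def gap_analysis(category: int, achievable_rto: timedelta,
                 achievable_rpo: timedelta) -> list[str]:
    """Return a list of gaps between what is achievable and the category target."""
    target = CATEGORIES[category]
    gaps = []
    if achievable_rto > target.rto:
        gaps.append(f"RTO gap: {achievable_rto - target.rto} over target")
    if achievable_rpo > target.rpo:
        gaps.append(f"RPO gap: {achievable_rpo - target.rpo} over target")
    return gaps


# Hypothetical C1 service that can be restored in 6 hours with 10 minutes of data loss.
print(gap_analysis(1, timedelta(hours=6), timedelta(minutes=10)))
```

An empty list means the service meets its category; anything else is an item for the planning step.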
Application Categorization
• CIO/Business owners re-categorized the application list
• Result:
  • “information systems” changed criticality
  • 2008: C1 – 5 services; C2 – None
  • 2012: C1 – 5 different services; C2 – 7 services
C1/C2 Applications
• Gap Analysis
• Table-top recovery scenario
  • “That is still in service, why?”, “That does what? When did that start?”
• Documentation, documentation, documentation
• Update deployment and SOP for services
[Figure: Example Normal Service]
[Figure: Example Recovered Service]
DR of Supporting Infrastructure
• The Business focuses on applications
• Document infrastructure service dependencies
  • Determine the services required by Infrastructure groups to complete a recovery
    • e.g. monitoring, secure access, system inventory, recovery documentation, etc.
  • Some services are considered Category 0 services
    • i.e. storage, network, and power
• Tabletop recovery exercise
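Once service dependencies are documented, a recovery order can be derived from them mechanically. The sketch below assumes a hypothetical dependency map (the service names come from the slide, but which service depends on which is invented for illustration) and uses Python's standard `graphlib.TopologicalSorter` so that every service is listed after the services it needs.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each service -> the services that must be up before it.
# Power, network, and storage are the "Category 0" services named on the slide.
DEPENDS_ON = {
    "power": set(),
    "network": {"power"},
    "storage": {"power", "network"},
    "secure access": {"network"},
    "monitoring": {"network", "storage"},
    "system inventory": {"storage"},
    "recovery documentation": {"storage", "secure access"},
}


def recovery_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return services in an order where dependencies always come first.

    Raises graphlib.CycleError if the documented dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


for step, service in enumerate(recovery_order(DEPENDS_ON), start=1):
    print(f"{step}. {service}")
```

A circular dependency surfacing here (for example, recovery documentation stored on a system that itself needs the documentation to recover) is exactly the kind of finding a tabletop exercise is meant to expose.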
Lessons Learned
• RTOs and RPOs are set by the business, not IT
  • IT helps in getting to the real requirement
  • Services evolve and RTOs change
  • Infrastructure capabilities change
  • Identify key technologies
  • Continual Improvement
• DR is big ... do it in small chunks
• DR is not Backup
  • DR Planning can be used in more than just DR
Next Steps
• Review the DR plan for remaining services
• Asking the DR question up front
• Disaster RTO/RPO versus Operational RTO/RPO
• Bring staff space and equipment into scope
Questions
Chris Russell, Director of Information and Communication Technology Infrastructure, York [email protected]
Rick Smith, Lead Architect, York [email protected]