Samit Roy, President, SciCom Infrastructure Services, Inc
From Disaster Recovery to Business Continuity
September 21, 2006
New Challenges for CIOs
Page 2INFORMATION SYSTEMS
Agenda
• Examples at DeKalb County
• Disaster Recovery to Business Continuity
• Developing a Business Continuity Plan
From simple Disaster Recovery to
Business Continuity
Threats and Risks to Businesses
[Image: Tsunami strike]
• Geopolitical, biological, and technological uncertainties in today’s world
• Similar threats were always present in the past, so why are they an issue today?
Quantifying the Effect of a Disaster
The following parameters are used to quantify the magnitude of the loss to a business caused by a disaster:
• RPO (Recovery Point Objective) is the point in time that marks the end of the period during which data can still be recovered using backups or journals
• RTO (Recovery Time Objective) defines how quickly information systems, services and processes must be operational following an incident, including recovery of applications and data
[Figure: Business Process/Knowledge plotted against Time, marking RPO, Disaster, RTO, and Resume Business]
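As a rough illustration of how these two parameters are applied, the Python sketch below measures a single incident against RPO and RTO targets. The function name, the timestamps, and the target values are all hypothetical and are not taken from the presentation.

```python
from datetime import datetime, timedelta

def assess_incident(last_recoverable, disaster, resumed, rpo_target, rto_target):
    """Compare one incident against RPO/RTO targets (illustrative only)."""
    # Work done between the last recoverable point and the disaster is lost.
    data_loss_window = disaster - last_recoverable
    # Downtime runs from the disaster until business resumes.
    downtime = resumed - disaster
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo_target,
        "rto_met": downtime <= rto_target,
    }

# Hypothetical incident: last backup at 01:00, disaster at 14:30,
# business resumed at 09:00 the following day.
result = assess_incident(
    last_recoverable=datetime(2006, 9, 21, 1, 0),
    disaster=datetime(2006, 9, 21, 14, 30),
    resumed=datetime(2006, 9, 22, 9, 0),
    rpo_target=timedelta(hours=24),
    rto_target=timedelta(hours=12),
)
print(result["rpo_met"], result["rto_met"])
```

In this made-up case the data loss window (13.5 hours) is inside the 24-hour RPO target, while the downtime (18.5 hours) exceeds the 12-hour RTO target.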
Early Justice Processes
[Figure: Information Sharing, Complexity, and Technological Advances plotted against Time; the period "< 1990’s" is labeled "Independent Process, Single System"]
• Justice processes were independent
• Manageable number of cases and convictions
• Fewer agencies to interact with
• Mostly paper-based or mainframe-stored data
How did we cope with disasters?
[Figure: Information Sharing, Complexity, and Technological Advances plotted against Time; the period "< 1990’s" is labeled "Independent Process, Single System"]
Disaster Recovery is a technology recovery often conducted in “reactive mode”
Disaster Recovery
• Focus: Data Center/Paper Copies
• Technology: Mainframe/Mid Range Computing
• Behavior: Casual Reaction, resumption > 3 days
• Limited Business Scenarios
Business Recovery
• Focus: Data Center and Back Office
• Technology: Client Server Computing
• Behavior: Rapid Reactions
Today’s Justice Processes
[Figure: Information Sharing, Complexity, and Technological Advances plotted against Time; the period "> 2000’s" is labeled "Interdependent Processes, Multiple Systems"]
• Today, Government’s responsibility and responsiveness to society have increased
• Justice needs have changed with the increase in population, immigration, socio-economic changes, and changing demographics
• The accuracy, timeliness, and reliability of the Justice System rest on inter-agency dependence and integrated systems
• Any disruption in any part of the Justice System will interfere with the overall justice process and thus affect services to citizens
Let’s take a sample scenario…
[Figure: Sample Integrated Criminal Justice System]
Systems shown:
• Statewide Warrant System
• Statewide Criminal History Records Repository
• National Criminal History Records Repository
• Victim Compensation Fund
• Court Information System
• Justice Information Server
• Sheriff Information System
• Booking Information System
• Prison Information System
• State/County Treasurer System
• Police Information System
• Prosecutor Information System
• Public Defender System
• Dept of Health and Human Services Info System
• Dept of Welfare Info System
• Dept of Education Info System
• Dept of Motor Vehicle Info System
• Medical Licensing Board Info System
• State and County Day Care Licensing
• Sex Offender Registry
• Pre-trial Services Info System
Information flow:
1. Query Subject
2. Arrest Warrant
3. Arrest Report
4. Booking Fingerprint Images
5. Arrest Information Broadcast
6. Prosecution Case Intake Document
7. Charging Document
8. Calendar; Motions; Notification of Hearings
9. Request Pre-Trial Report
10. Subpoenas to Witnesses; Subpoenas to Victims
11. Sentence
Possible Disruptions
[Figure: the Sample Integrated Criminal Justice System diagram, repeated from the previous slide; axis label: Severity]
The following are typical disasters that may disrupt the prosecution of the convict:
• County/State/GCIC network is down; wireless connection is down
• Bomb explosion in the Court House
• Precinct is flooded because of a water main burst, hurricane, etc.
• 911 Center is contaminated by biological agents
• Electrical grid failure
• Employee has lost the file folder containing the “Sentence Report”
• The “Court Information System” is corrupted
• Backup tape was stolen during transport
• Jail is infected with pandemic influenza
• Fire has burned down the Records Retention Center
• “Cyber crime” has broken security and altered criminal records
How do we recover from the disruption?
• Restoring a server from backup will not suffice
• We need to think about the whole Integrated Justice Process
• We need to plan for disaster under every possible scenario
[Figure: Business Continuity cycle: Readiness, Prevention, Response, Recovery, Resumption, Test & Train, Evaluate & Maintain]
What is Business Continuity?
Business continuity goes beyond disaster recovery to ensure the continued availability of essential services, programs, and operations in the event of unexpected interruptions. Business continuity addresses enterprise-level and end-to-end solutions, from design and planning to implementation and management, with a focus on urgency.
Business continuity must be approached holistically, including supporting process management interdependent with availability and security, to manage operational risk effectively. Business Continuity Management is not just a job for your IT team; it is an operational issue.
A team effort is required to develop comprehensive plans for critical operations, including not just computing processes but also operational processes, building systems, suppliers, and other processes. Organizations should also consider long-term operations: What can they do with their displaced workers? How do they communicate with other stakeholders and partners? How can they get the job done without their existing support network?
By taking a holistic approach to business continuity and evaluating the solution at both the IT and business levels, internally and externally, organizations ensure the business is always available, performing and secure.
Developing a Business Continuity Plan for the Justice Organization
Business Continuity Plan - Basics
• Build business continuity awareness, plans and strategies as a part of the enterprise culture
• Business continuity planning is evolutionary. Plan maintenance and the events experienced will necessitate revisions and/or additions to the plans
• There is essentially one Business Continuity Planning methodology. However, consultants and vendors twist it in different ways to sell their uniqueness
• Business Continuity Planning should be driven by the organization itself and not by the vendor
Source: ASIS International
Business Continuity Plan - Steps
Readiness
Objective: Address the preparatory steps required to provide a strong foundation on which to build the BCP.
Tasks:
• Assign Accountability
  • Corporate Policy
  • Ownership of Systems, Processes and Resources
  • Planning Team
  • Communicate BCP
• Perform Risk Assessment
  • Review Types of Risks that can impact business
• Conduct Business Impact Analysis
  • Identify Critical Processes
  • Assess Impact if there is a Crisis
  • Determine RPO and RTO
  • Identify Resources required for recovery
• Agree on Strategic Plans – Agreeable, Attainable, Probable, Verifiable and Cost Effective
• Crisis Management and Response Team Development
Source: ASIS International
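The Business Impact Analysis tasks listed above can be pictured as filling in one record per business process. The following is a minimal Python sketch under stated assumptions: the process names, impact figures, RPO/RTO values, and the rule "critical means RTO at or below a threshold" are all hypothetical and do not come from the presentation or from ASIS.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessImpact:
    name: str
    impact_per_hour: float   # hypothetical cost of one hour of outage
    rto_hours: float         # how quickly the process must be operational again
    rpo_hours: float         # how much recent data may be lost
    resources: list = field(default_factory=list)  # resources required for recovery

def critical_processes(records, max_rto_hours):
    """Select processes whose RTO is at or below the threshold (an assumed rule),
    most urgent first; ties are broken by higher hourly impact."""
    chosen = [r for r in records if r.rto_hours <= max_rto_hours]
    return sorted(chosen, key=lambda r: (r.rto_hours, -r.impact_per_hour))

# Hypothetical records, for illustration only.
records = [
    ProcessImpact("Booking", 500.0, 4, 1, ["Booking Information System"]),
    ProcessImpact("Payroll", 200.0, 72, 24, ["Payroll system"]),
    ProcessImpact("Court calendar", 800.0, 8, 4, ["Court Information System"]),
]

for r in critical_processes(records, max_rto_hours=24):
    print(r.name, r.rto_hours, r.resources)
```

A real analysis would agree the threshold and the impact measure with the business owners rather than fixing them in code.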
Business Continuity Plan - Steps
Prevention
Objective: Address those areas where good planning will allow an organization to avoid or prevent a crisis, or to limit its impact
Tasks:
• Compliance with Corporate Policy
• Mitigation Strategies
• Devise Mitigation Strategies
• Resources needed for Mitigation
• Monitoring Systems and Resources
• Avoidance, Deterrence and Detection
• Based on warnings
• Employee behavior based on motivation
• Security based programs for warnings
Source: ASIS International
Business Continuity Plan - Steps
Response
Objective: Develop the steps required to respond in an effective, appropriate, and timely manner should a crisis occur
Tasks:
• Potential Crisis Recognition
  • Identification of Danger Signals: civil or political instability, hostile labour unions, impending strikes and likely protests
  • Responsibility to Report Potential Crisis
• Notify the Teams
  • Parameters for Notification
  • Custody and Updates to contact information
  • Types of Notification
• Assess the Situation
• Declare a Crisis
• Execute the Plan – Routine emergency incidents, minor/moderate/major business interruptions
• Communications – Audiences, Call Center, Media
Source: ASIS International
Business Continuity Plan - Steps
Response
• Resource Management
• Accounting for all individuals
• Notification of Next-of-Kin
• Crisis Counseling
• Crisis Management Center
• Financial Support
• Immediately functional Payroll Systems, Telephone Systems, Email System, Internet Sites
• Alternate Worksites, Secondary Data Centers
• Offsite Storage
• Transportation – Air, Road, Rail
• Supplies/Service Providers
Source: ASIS International
Business Continuity Plan - Steps
Recovery and Resumption
Objective: Develop policies, procedures and plans to bring the organization out of crisis, recover/resume critical processes and finally return to normal operations
Tasks:
• Damage and Impact Assessment
  • Crisis involving Physical Damage
  • Crisis not involving Physical Damage
• Resumption of Critical and Remaining Processes
  • Process Resumption Prioritization
  • Resumption of Critical Processes
  • Resumption of Remaining Processes
• Return to Normal Operation
Source: ASIS International
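One way to think about process resumption prioritization for interdependent systems is as an ordering problem: a process can only resume after the processes it depends on. The sketch below uses Python's standard `graphlib` module for this. The dependency map and the set of critical processes are hypothetical examples, not the County's actual plan, and the split into two lists assumes critical processes depend only on other critical processes.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each process maps to the processes it needs first.
depends_on = {
    "Network": set(),
    "Justice Information Server": {"Network"},
    "Booking": {"Justice Information Server"},
    "Court calendar": {"Justice Information Server"},
    "Reporting": {"Court calendar"},
}

# Hypothetical choice of critical processes.
critical = {"Network", "Justice Information Server", "Booking", "Court calendar"}

def resumption_order(depends_on, critical):
    """Return (critical, remaining) lists, each in an order that respects dependencies.

    Assumes critical processes depend only on other critical processes.
    """
    order = list(TopologicalSorter(depends_on).static_order())
    first = [p for p in order if p in critical]
    rest = [p for p in order if p not in critical]
    return first, rest

first, rest = resumption_order(depends_on, critical)
print("Resume first:", first)
print("Resume next:", rest)
```

`TopologicalSorter` raises `CycleError` if two processes depend on each other, which is itself a useful finding when reviewing a plan.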
Business Continuity Plan - Steps
Test and Train
Objective: Train and educate team members as well as the general employee population, and validate and embrace the BCP
Tasks:
• Educate and Train
  • Educate and Train teams
  • Educate and Train all personnel
• Test the BCP
  • Benefits of Testing
  • Goals and Expectations
  • Planning and Development
  • Timeline
  • Scope of Testing
  • Test Monitoring
  • Test and Exercise Scenarios
  • Test and Exercise Roles
  • Test and Exercise Participation
  • Test and Exercise Evaluation
  • Ongoing Evaluation of Test Schedules
Source: ASIS International
Business Continuity Plan - Steps
Evaluate and Maintain
Objective: Keep the BCP relevant to the organization using a rigorous maintenance and evaluation program
Tasks:
• Develop BCP Review Schedule
  • Risk Assessment
  • Sector/Industry Trends
  • Regulatory Requirements
  • Event Experience
  • Test/Exercise Results
• Develop BCP Maintenance Schedule: the following are examples of procedures, systems, and processes that may affect the BCP
  • System and application software changes
  • Changes to the organization and its business processes
  • Critical lessons learned in testing
  • Changes to the external environment
Source: ASIS International
The BCP at DeKalb County
Status as of end of Yr 2005
• A Gap Analysis was performed using a set of 25 questions that reflect the industry standard for a comprehensive BCP program
• Result: Based on the study, the BCP practices, processes and infrastructure were estimated to be operating at a maturity level of 35%
• The County operated in an NT domain, with a security rating of 2.0 compared to the Federal guideline of 8.0 on a scale of 0 to 10
• The County operated under individual departmental budgets, creating isolated IS operations, multiple servers and storage devices, a lack of policies, and overall fragmented IT management
• The County ran multiple operating systems (six OS) and older versions of applications with minimal data validation
• Multiple departments depended on a single employee's knowledge of both business processes and applications
• Departments lacked awareness of recovery processes, BCP, and security, and depended on reactive responses to any business disruption
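The presentation does not say how the 25 gap-analysis questions were scored. Purely as an illustration of how a questionnaire can be turned into a maturity percentage, the Python sketch below averages per-question scores. The scoring scale and the sample answers are assumptions and are not the County's data.

```python
# Assumed scoring scale (not from the presentation):
# 1.0 = practice in place, 0.5 = partially in place, 0.0 = absent.
def maturity_percent(scores):
    """Average the per-question scores and express the result as a percentage."""
    if not scores:
        raise ValueError("at least one answered question is required")
    return 100 * sum(scores) / len(scores)

# Hypothetical answers to a 25-question checklist.
answers = [1.0] * 5 + [0.5] * 10 + [0.0] * 10
print(f"{maturity_percent(answers):.0f}%")
```

A weighted average would be a natural variation if some questions matter more than others.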
Actions taken since Jan 2006
• Immediate plans were developed to upgrade computing environments
  • Application upgrades for Court Systems – Banner
  • Application upgrade for PeopleSoft Payroll System
  • Hardware upgrade for Kronos clocks for time recording
  • Design and implementation of Active Directory and migration to Windows 2003
  • Clustering firewall and enhancing security servers
  • Clustering Email Exchange and implementing archiving technology
• Consolidation of Servers
  • AIX (Unix) servers were reduced from 11 to 1 enterprise-class P590 server
  • Introduced failover mechanism for enterprise applications
  • Consolidated storage into DS 8100
  • Windows servers are being reduced from 160 to 15 as follows
    • Implementing SQL Cluster
    • Implementing File Server Cluster
Issues impeding BCP
• Currently the systems are isolated based on departmental needs and legacy
[Figure: four isolated departmental environments, each with its own storage and backup, using different offsite tape storage (C)]
• County IS: AIX and Windows servers, storage, backup
• Police: Windows servers, storage, backup
• Water and Sewer: Windows servers, storage, backup
• GIS and Other: Windows file server, storage, backup
Issues:
• Most of the file servers and SQL Server based systems are each configured on a single machine with no failover and no disaster recovery mechanism
• Multiple storage devices and locations will make it a challenge to move to a consolidated DR mechanism
• Multiple tape devices and tape storage locations make recovery in an emergency a challenge
Consolidation of Computing Resources
• Currently the systems are isolated based on departmental needs and legacy
[Figure: the four isolated environments (County IS: AIX and Windows; Police: Windows; Water and Sewer: Windows; GIS and Other: Windows file server), each with its own storage and backup and different offsite tape storage (C), consolidated into one colocated environment (A) with AIX and Windows servers, storage, and backup]
Business Continuity – multiple levels
• Build the West Exchange site
• Implement the Consistent Backup mechanism
[Figure: two sites, Callaway Bldg (A) and West Exchange (B), each with AIX and Windows servers, storage, and backup, joined by a link labeled "Synchronized"; Offsite Tape Storage (C); Distant Recovery Site (D)]
Next steps towards developing BCP
• A high-powered Planning Committee has been formed
• The “Readiness” phase of the BCP is currently in progress
• Initiatives are under way to consolidate applications from multiple environments
• Cross-training of 311 call agents with 911 call agents