Wendy Tegart, Provincial Director Service Management Jill Robert, IT Strategic Partner e-Health 2013 Challenges & Opportunities TEC Talk Service Interrupted… AHS Experience with IT Major Incidents & Clinical Involvement

Wendy Tegart, Provincial Director Service Management
Jill Robert, IT Strategic Partner

e-Health 2013 Challenges & Opportunities TEC Talk

Service Interrupted… AHS Experience with IT Major Incidents & Clinical Involvement

Service Interrupted… TEC Talk

www.albertahealthservices.ca

Faculty/Presenter Disclosure

• Faculty: Wendy Tegart & Jill Robert
• Relationships with commercial interests:
  – Grants/Research Support: Not applicable
  – Speakers Bureau/Honoraria: Not applicable
  – Consulting Fees: Not applicable
  – Other: Employees of Alberta Health Services

Nothing to Disclose

Agenda

1. Alberta Health Services Overview
2. Major Incident Process
3. Major Incident Roles
4. Communication Approach
5. Clinical Involvement
6. Next Steps
7. Questions

Alberta Health Services (AHS)

Responsible for delivering health services to the 3.8 million people living in Alberta, across 661,848 square kilometers.

Annual Service Volumes (2011-12)

Acute Care
• 2,029,191 Emergency Department Visits
• 376,115 Hospital Discharges
• 2,602,384 Total Hospital Days
• 50,099 Births
• 99 Acute care hospitals and 5 stand-alone psychiatric facilities

Primary Care
• 104,704 Home Care Clients
• 766,146 Health Link calls
• 393,964 EMS Calls/Events

Alberta Health Service Overview

AHS Scale of Effort

• Largest Employer in Alberta, 5th largest in Canada
  ◦ 100,000 employees
  ◦ 7,000 physicians
  ◦ 120,000 network IDs

• Scope of AHS-IT
  ◦ 1,514 production apps (163 critical)
  ◦ 34 data centers
  ◦ 4,721 servers (physical and virtual)
  ◦ 75,000 workstations
  ◦ 48,000 tickets generated monthly
  ◦ 550 concurrent users in ITSM tool
  ◦ 1,300 IT Staff (+ outsourced partners)

Context to Current Realities

• Complexities of Electronic Health Record in Alberta
• Local vs Provincial IT service delivery
• Given the complexities of the AHS IT landscape, aging and varied technical infrastructure and critical service requirements to support patient care...

“Downtimes happen...”

How do we minimize organizational and clinical impact and provide robust support when the technology fails?

Major Incident Process

Super Bowl 2013 – infamous power outage

What is a Major Incident (MI)?

IT has a provincial Incident Management Process to manage all Incidents.

When an Incident is of a certain scale, scope, or impact, a “Major” Incident is launched.

The goal of the Incident process is to return an IT Service to operational status.

Throughout AHS-IT, we employ this common process to ensure that major IT service issues are quickly identified and appropriately responded to.

The purpose of the MI process is to supplement the Incident process with additional resources, escalation, communication and record keeping.

Is this a “Critical” Incident?

Urgency and Impact must both be High to create a critical incident. Critical Incidents must be escalated to the IROC immediately. Critical incidents are:

– a major outage affecting a large number of customers
– an essential service and/or a business unit where there is no available resolution or work around to provide a return to business operations

Must also consider:

– Patient safety may be at risk or reduced effectiveness of patient care
– The safety of AHS staff and personnel
– Impact to confidentiality of data, or reliability of data
– Degradation of a service including data, applications, or infrastructure.
– A Senior Admin from the business is requesting a Major Incident be declared (requires immediate escalation to IROC)
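The escalation rule on this slide can be sketched in code. This is a minimal illustration, not the AHS implementation: the three-level `Level` scale, the `Incident` fields and both function names are assumptions made for the example. Only two things come from the slide: Urgency and Impact must both be High for an incident to be critical, and a Senior Admin request requires immediate escalation to IROC.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    # Assumed three-level scale; the slide itself only names "High".
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    ticket: str
    urgency: Level
    impact: Level
    # Hypothetical flag: a Senior Admin from the business asked for an MI.
    senior_admin_request: bool = False


def is_critical(incident: Incident) -> bool:
    """Critical only when Urgency and Impact are both High."""
    return incident.urgency is Level.HIGH and incident.impact is Level.HIGH


def escalate_to_iroc(incident: Incident) -> bool:
    """Critical incidents and Senior Admin requests go to IROC immediately."""
    return is_critical(incident) or incident.senior_admin_request


if __name__ == "__main__":
    outage = Incident("12345", urgency=Level.HIGH, impact=Level.HIGH)
    degraded = Incident("12346", urgency=Level.HIGH, impact=Level.MEDIUM)
    for incident in (outage, degraded):
        print(incident.ticket, is_critical(incident), escalate_to_iroc(incident))
```

The remaining "must also consider" factors (patient safety, staff safety, data confidentiality, service degradation) are deliberately left out of the sketch, since the slide presents them as matters of judgement rather than as a fixed rule.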

Is this a “Critical” Incident? Urgency

Is this a “Critical” Incident? Impact

Priority

Major Incidents by Month

IT Major Incident Roles

An IT service Incident is typically managed by the IT Service Desk and/or a specific IT Service team. When an MI is initiated, some additional resources brought in include:

IT Incident Response On Call (IROC)
This is a group of IT Directors who share an On Call responsibility for MIs. Once contacted, the IROC is responsible for managing the MI Process so the Service Desk and Service team can concentrate on resolving the Incident.

IT Security & Compliance On Call
On Call IT Security staff to respond to MIs with a security component.

IT Senior Leader On Call
This group of IT senior leaders is available to provide additional guidance and authority if/as required by the particular MI.

Problem Manager
Chairs and facilitates communication bridge meetings. Notifies IT staff of updates.

Major Incident Roles

Clinical Roles

Not all MIs require the engagement of clinical experts, but when required these roles provide context to clinical impact and urgency.

Clinical Informatics
This is a group of physicians and non-physicians.

Clinical Operations Administrator On-call
On Call AHS leaders including Executive Directors and Site Administrators. May provide front line resources to support in downtime and reconciliation efforts.

Senior Leadership On-call
This group of AHS Senior leaders includes Facility Medical Directors and VPs.

Health Information Management
Health Record Management experts with data and record integrity expertise.

Zonal Emergency Operations Centres (ZEOCs)
Tied into Emergency Preparedness.

Bridge Types (conference calls)

Technical Bridge
• Part of the Incident Management process, and is initiated independently of the MI process
• Opened when collaboration by several parties is required during incident resolution activities

Communications Bridge
• Launched by IROC to bring the right stakeholders together to identify the problem and direct its resolution
• Problem manager assists by recording chronology, participants, decisions and results
• Directs communications within IT and the user community

Clinical Bridge
• Usually chaired by a Clinical Informatics physician

Communication Approach

MI Heads Up Notification

Incident Ticket 12345

Start Date and Time

Please be aware an MI has just been declared for <Service>.

Full impact is still being assessed but at this point we have identified the following stakeholders and groups as affected by this issue: <groups and stakeholders>

If your team is directly or indirectly responsible for this Service, please attend the appropriate bridge calls set out below.

Conference Bridge Information

Communication Bridge: <number> Passcode: <number> Start Time: <time>
Technical Bridge: (if applicable/tbd) Passcode: (if applicable/tbd) Start Time: <time>
Clinical Bridge: (if applicable/tbd) Passcode: (if applicable/tbd) Start Time: <time>
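A template like this lends itself to being filled in programmatically. The sketch below shows one possible way to do that and is not how AHS generates its notifications: the `Bridge` class, the `heads_up` function and every value in the usage example are hypothetical. The wording of the message is taken from the template above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Placeholder text used by the template when a bridge is not yet set up.
TBD = "(if applicable/tbd)"


@dataclass
class Bridge:
    name: str
    start_time: str
    number: Optional[str] = None
    passcode: Optional[str] = None


def format_bridge(bridge: Bridge) -> str:
    """Render one bridge line, falling back to the template's tbd text."""
    number = bridge.number or TBD
    passcode = bridge.passcode or TBD
    return (f"{bridge.name}: {number} Passcode: {passcode} "
            f"Start Time: {bridge.start_time}")


def heads_up(ticket: str, start: str, service: str,
             affected: List[str], bridges: List[Bridge]) -> str:
    """Fill in the MI Heads Up Notification template as plain text."""
    lines = [
        "MI Heads Up Notification",
        f"Incident Ticket {ticket}",
        f"Start Date and Time: {start}",
        "",
        f"Please be aware an MI has just been declared for {service}.",
        "",
        "Full impact is still being assessed but at this point we have "
        "identified the following stakeholders and groups as affected by "
        f"this issue: {', '.join(affected)}",
        "",
        "Conference Bridge Information",
    ]
    lines.extend(format_bridge(b) for b in bridges)
    return "\n".join(lines)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(heads_up(
        ticket="12345",
        start="2013-05-27 09:00",
        service="Example Service",
        affected=["Example Group A", "Example Group B"],
        bridges=[
            Bridge("Communication Bridge", "09:15", "555-0100", "0000"),
            Bridge("Technical Bridge", "09:30"),
        ],
    ))
```

Keeping the bridge details in a small data structure means a bridge that has not been opened yet simply renders with the template's "(if applicable/tbd)" text instead of needing a separate message variant.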

Communications to Customers

IT - Service Issue Information

Message may be sent to users of IT Services in relation to unexpected/unplanned service issues.
• Say who this information is intended for / pertains to.
• Speak in terms the customer will understand.
• Briefly and directly tell users what is happening and what impact they will experience.
• Note that IT teams are working to resolve the issue and restore Services.
• Acknowledge the issue/inconvenience and provide contact information for the relevant zone/FHE Service Desk.
• If appropriate, state that an update will be provided within a specific timeframe.

Replace all text in this section with pertinent information. Review the notifications guide if there are questions on when to use this format.

Impact Summary
Clearly state what, from the user's perspective, is not working. Also set out the specific locations affected by this Service Issue.

NOTE: Any exclusions or caveats to what you've stated above regarding this Service Issue. Replace all text in this section with pertinent information. Review the notifications guide if there are questions on when to use this format.

MI Root Cause Code Definitions

Cause Code: Summary

Application/Software Bug: The failure is caused by a problem within the packaged software itself.
Communication: Failure is caused by a missed communication.
Data: Unexpected or corrupted data elements caused the failure.
Environment: The failure is caused by an uncontrolled element of the physical world where redundancy would not have reasonably mitigated the effect.
Equipment: Failure due to age, malfunction or fault in the physical equipment where redundancy would not have reasonably mitigated the effect.
IT Third Party Vendor: Root cause lies with the vendor providing a service.
Process: Missing or undeveloped process caused the failure. There was an oversight in the process; a branch of the process isn’t properly developed or was missed entirely.
Security: An IT Security failure caused the issue.
Training: The failure was caused by lack of understanding, incorrect qualification or insufficient training.
Other: A mistake was made where existing process, if followed correctly, should have avoided the failure.
Unknown: Root cause undetermined.
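A fixed code list like this is usually held as an enumeration so that incident reviews can only record a recognised cause. The following is a minimal sketch under that assumption. The `RootCause` and `tally_root_causes` names are hypothetical, and reading "Communication" and "Equipment" as the code names (rather than "Communication Failure" and "Equipment Failure") is an interpretation of the slide.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class RootCause(Enum):
    # Values mirror the cause codes listed on the slide.
    APPLICATION_SOFTWARE_BUG = "Application/Software Bug"
    COMMUNICATION = "Communication"
    DATA = "Data"
    ENVIRONMENT = "Environment"
    EQUIPMENT = "Equipment"
    IT_THIRD_PARTY_VENDOR = "IT Third Party Vendor"
    PROCESS = "Process"
    SECURITY = "Security"
    TRAINING = "Training"
    OTHER = "Other"
    UNKNOWN = "Unknown"


def tally_root_causes(labels: Iterable[str]) -> Counter:
    """Count reviewed MIs per cause code.

    A label that is not one of the defined codes raises ValueError,
    so typos surface immediately instead of creating new categories.
    """
    return Counter(RootCause(label) for label in labels)


if __name__ == "__main__":
    # Made-up labels, purely for illustration.
    counts = tally_root_causes(["Process", "Equipment", "Process"])
    for cause, n in counts.most_common():
        print(f"{cause.value}: {n}")
```

Counting by code in this way is one route to the kind of trend reporting that feeds the per-incident continuous improvement described under Next Steps.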

Clinical Support During an MI

• Transparency
• Communication
  – Understanding and translating the clinical impact
  – Timely and frequent “clinical speak” communication about the incident and immediate risk mitigation measures
• Support
  – Robust downtime procedures owned by clinical operations
  – Bedside to boardroom engagement and support

Clinical Involvement

More than clinical involvement... it’s about relationships, partnerships and supporting safe patient care!

Next Steps

• Continuous improvement per incident review
• Develop service improvement plans overall driven by business requirements
• Examine different scales of MIs and support requirements
• Leverage the successes of the MI process in other risk areas
• Continually examine clinical business risk tolerance/value and architecture of information systems
• Simplify – application consolidation, migration to a provincial patient care platform and large scale reliability/redundancy

Higdon’s Law
1. Good judgement comes from bad experience.
2. Experience comes from bad judgement.

Insanity: Doing the same thing over and over again and expecting different results. ~ Albert Einstein

Questions?

Comments / Questions